A form of unsupervised learning where the data provides the supervision
In general, withhold some part of the data, and task the network with predicting it
The task defines a proxy loss, and the network is forced to learn what we really care about, e.g. a semantic representation, in order to solve it
Read More