The network must copy inputs to outputs through a “bottleneck” (fewer hidden units than inputs)
Hidden representations become a learned compressed code of the inputs/outputs
Captures systematic structure shared across the full set of patterns
Due to the bottleneck, the network lacks the capacity to overlearn idiosyncratic aspects of particular patterns
For N linear hidden units, hidden representations span the same subspace as the first N principal components (≈PCA)
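The PCA connection above can be checked numerically. Below is a minimal sketch (assuming a NumPy environment; all names and the toy data are illustrative): a linear autoencoder with a 2-unit bottleneck is trained by gradient descent on reconstruction error, and its encoder subspace is compared to the top-2 principal components via principal angles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 5 features with distinct variances, centered
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

# Linear autoencoder with a k-unit bottleneck: X_hat = X @ W_enc @ W_dec
k = 2
W_enc = rng.normal(scale=0.1, size=(5, k))
W_dec = rng.normal(scale=0.1, size=(k, 5))

lr = 0.02
for _ in range(5000):
    H = X @ W_enc          # hidden code (compressed representation)
    X_hat = H @ W_dec      # linear reconstruction
    E = X_hat - X          # reconstruction error
    # Gradients of mean squared reconstruction error
    gW_dec = H.T @ E / len(X)
    gW_enc = X.T @ (E @ W_dec.T) / len(X)
    W_enc -= lr * gW_enc
    W_dec -= lr * gW_dec

# Top-k principal components from the SVD of the centered data
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pcs = Vt[:k].T             # (5, k) basis for the PCA subspace

# Compare subspaces via principal angles: orthonormalize each basis;
# singular values of Q1.T @ Q2 are the cosines of the angles between them
Q1, _ = np.linalg.qr(W_enc)
Q2, _ = np.linalg.qr(pcs)
cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
print(cosines)  # cosines near 1 indicate the subspaces coincide
```

Note that the individual hidden units need not align with individual principal components; only the spanned subspace matches, which is why the comparison uses principal angles rather than unit-by-unit correlations.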