Autoencoders are an unsupervised learning model that aim to learn distributed representations of data.
Typically an autoencoder is a neural network trained to predict its own input data. A large enough network will simply memorize the training set, but there are a few things that can be done to generate useful distributed representations of input data, including:
- constraining the size of the model, forcing it to learn a lower-dimensional representation that can be used to re-construct the original higher-dimensional data points.
- adding artificial noise to the initial data points, and training the autoencoder to predict the data points minus the artificial noise. See denoising autoencoder for more information.