One of the most interesting effects of PCA (Principal Component Analysis) is to decorrelate the input covariance matrix C, by computing the eigenvectors and operating a base change using a matrix V:
The eigenvectors are sorted in descending order considering the corresponding eigenvalue, therefore Cpca is a diagonal matrix where the non-null elements are λ1 >= λ2 >= λ3 >= … >= λn. By selecting the top p eigenvalues, it’s possible to operate a dimensionality reduction by projecting the samples in the new sub-space determined by the p top eigenvectors (it’s possible to use Gram-Schmidt orthonormalization if they don’t have a unitary length). The standard PCA procedure works with a bottom-up approach, obtaining the decorrelation of C as a final effect, however, it’s possible to employ neural networks, imposing this condition as an optimization step. One the most effective model has been proposed by Rubner and Tavan (and it’s named after them). Its generic structure is:
Where we suppose that N (input-dimensionality) << M (output-dimensionality). The output of the network can be computed as:
where V (n × n) is a lower-triangular matrix with all diagonal elements to 0 and W has a shape (n × m). Moreover, it’s necessary to store the y(t) in order to compute y(t+1). This procedure must be repeated until the output vector has been stabilized. In general after k < 10 iterations, the modifications are under a threshold of 0.0001, however, it’s important to check this value in every real application.
The training process is managed with two update rules:
The first one is Hebbian based on the Oja’s rule, while the second is anti-Hebbian because its purpose is to reduce the correlation between output units. In fact, without the normalization factor, the update is in the form dW = -αy(i)y(k), so to reduce the synaptic weight when two output units are correlated (same sign).
If the matrix V is kept upper-triangular, it’s possible to vectorize the process. There are also other variants, like the Földiák network, which adopts a symmetric V matrix, so to add the contribution of all other units to y(i). However, the Rubner-Tavan model seems more similar to the process adopted in a sequential PCA, where the second component is computed as orthogonal to the first and so forth until the last one.
The example code is based on the MNIST dataset provided by Scikit-Learn and adopts a fixed a number of cycles to stabilize the output. The code is also available in this GIST:
Hebbian Learning is one the most famous learning theories, proposed by the Canadian psychologist Donald Hebb in 1949, many years before his results were confirmed through neuroscientific experiments.