Unsupervised learning algorithms

Post by Aadith Vittala.

Pehlevan, C., & Chklovskii, D. B. (2019). Neuroscience-inspired online unsupervised learning algorithms. arXiv:1908.01867 [cs, q-bio].

This paper serves as a review of similarity-based online unsupervised learning algorithms. These algorithms are important because they are biologically plausible, produce non-trivial results, and sometimes perform as well as non-biological algorithms. In this paper, biologically plausible algorithms have three main features: they are local (each synaptic update depends only on the activity of the pre- and post-synaptic neurons), they are online (data vectors are presented one at a time and learning occurs after each presentation), and they are unsupervised (there is no teaching signal carrying error information to the neurons). The simplest example of such an algorithm is Oja's online PCA rule. Here, the system receives \textbf{x}_t at each timestep and computes y_t = \textbf{w}_{t-1} \cdot \textbf{x}_t, which represents the value of the top principal component. The weights are then updated according to

\textbf{w}_t = \textbf{w}_{t-1} + \eta(\textbf{x}_t - \textbf{w}_{t-1} y_t)y_t

This algorithm is both biologically plausible and potentially useful, and the paper aims to find more algorithms like it.
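As a concrete illustration, here is a minimal NumPy sketch of Oja's rule; the data, learning rate, and initialization are illustrative choices, not values from the paper.

import numpy as np

# A minimal sketch of Oja's rule, assuming zero-mean inputs; the data,
# learning rate, and initialization are illustrative, not from the paper.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10)) @ rng.standard_normal((10, 10))
X -= X.mean(axis=0)

eta = 0.01
w = rng.standard_normal(X.shape[1])
w /= np.linalg.norm(w)

for x in X:                     # one data vector per timestep
    y = w @ x                   # y_t = w_{t-1} . x_t
    w += eta * (x - w * y) * y  # Oja update: Hebbian term with decay

# check alignment with the top principal component from an offline SVD
top_pc = np.linalg.svd(X, full_matrices=False)[2][0]
print(abs(w @ top_pc) / np.linalg.norm(w))  # close to 1 if converged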

As a starting point, the authors aim to develop a biologically plausible algorithm that outputs multiple principal components of a given data set. To do this, they use a similarity-matching objective function, where the goal is to minimize the following expression

\min_{\textbf{y}_1 \ldots \textbf{y}_T} \frac{1}{T^2}\sum_{t=1}^T \sum_{t'=1}^T \big(\textbf{x}_t^T \textbf{x}_{t'} - \textbf{y}_t^T \textbf{y}_{t'} \big)^2
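To make the objective concrete, here is a short sketch that evaluates it given input and output matrices; the convention of stacking the data vectors as columns, and the names used, are ours rather than the paper's.

import numpy as np

# Evaluate the similarity-matching objective for inputs X (n x T) and
# outputs Y (k x T), with data vectors stacked as columns.
def similarity_matching_loss(X, Y):
    T = X.shape[1]
    gram_x = X.T @ X  # T x T matrix of input similarities x_t . x_t'
    gram_y = Y.T @ Y  # T x T matrix of output similarities y_t . y_t'
    return np.sum((gram_x - gram_y) ** 2) / T**2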

This objective tries to match the pairwise similarities between input vectors to the pairwise similarities between output vectors. In previous work, the authors showed that the solution to this problem (with \textbf{y} having fewer dimensions than \textbf{x}) is PCA. To solve it in a biologically plausible fashion, they use a variable-substitution trick (inspired by the Hubbard-Stratonovich transformation from physics) to convert the problem into a minimax problem over new variables \textbf{W} and \textbf{M}

\min_{\textbf{W}} \max_{\textbf{M}} \frac{1}{T} \sum_{t=1}^T \big[ 2 {\rm Tr\,}(\textbf{W}^T \textbf{W}) - {\rm Tr\,}(\textbf{M}^T \textbf{M}) +\min_{\textbf{y}_t} (-4 \textbf{x}_t^T \textbf{W}^T \textbf{y}_t + 2 \textbf{y}_t^T\textbf{M}\textbf{y}_t) \big]

This expression leads to an online algorithm where, at each timestep t, you solve for \textbf{y}_t by running the neural dynamics

\dot{\textbf{y}}_t = \textbf{W} \textbf{x}_t - \textbf{M} \textbf{y}_t

to their fixed point, and then update \textbf{W} and \textbf{M} with

W_{ij} \leftarrow W_{ij} + \eta (y_i x_j - W_{ij}) \hspace{20pt} \textrm{(Hebbian)}

M_{ij} \leftarrow M_{ij} + \frac{\eta}{2} (y_i y_j - M_{ij}) \hspace{20pt} \textrm{(``anti-Hebbian'')}

though after discussion, we think the second update would be better described as “Hebbian for inhibitory synapses”. This online algorithm has not yet been proven to converge, but it gives relatively good results when tested. It also admits a simple interpretation: \textbf{W} holds the feedforward synaptic weights mapping the inputs to the neurons, \textbf{y} is the vector of neuronal outputs, and \textbf{M} holds the inhibitory lateral connections between neurons.
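Putting the pieces together, here is a minimal sketch of the full online algorithm; for brevity it jumps straight to the fixed point \textbf{y} = \textbf{M}^{-1}\textbf{W}\textbf{x} rather than integrating the dynamics, and the dimensions, learning rate, and initialization are illustrative assumptions.

import numpy as np

# Sketch of the online similarity-matching network described above;
# data, dimensions, and learning rate are illustrative, not from the paper.
rng = np.random.default_rng(0)
n, k, T = 10, 3, 2000
X = rng.standard_normal((T, n)) @ rng.standard_normal((n, n))
X -= X.mean(axis=0)

eta = 0.01
W = rng.standard_normal((k, n)) / np.sqrt(n)  # feedforward weights
M = np.eye(k)                                 # lateral inhibitory weights

for x in X:
    # jump straight to the fixed point of dy/dt = W x - M y;
    # a biological circuit would instead let the dynamics settle
    y = np.linalg.solve(M, W @ x)
    W += eta * (np.outer(y, x) - W)        # Hebbian update
    M += (eta / 2) * (np.outer(y, y) - M)  # "anti-Hebbian" update

# the learned filters M^{-1} W should span the top-k principal subspace;
# the singular values below are cosines of the principal angles between
# the learned and PCA subspaces (all near 1 if the subspaces match)
F = np.linalg.solve(M, W)
V = np.linalg.svd(X, full_matrices=False)[2][:k]
Q = np.linalg.qr(F.T)[0]
print(np.linalg.svd(V @ Q)[1])

Solving for the fixed point directly is a computational shortcut; in the paper's picture the recurrent circuit settles to the same \textbf{y} through its own dynamics before the synaptic updates are applied.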

The paper goes on to generalize this algorithm to incorporate whitening constraints (via Lagrange multipliers), to handle non-negative outputs, and to perform clustering and manifold tiling. The details of these extensions are covered in the cited papers rather than in this review. Overall, the similarity-matching objective seems to give well-performing, biologically plausible algorithms for a wide range of problems. However, there are a few important caveats: none of these online algorithms has been proven to converge to the correct solution, the inputs were not allowed to be correlated, and there is no theoretical basis for stacking (since multiple layers would be equivalent to a single layer). In addition, during discussion we noted that similarity matching essentially promotes networks that rotate their input into their output (since similarity is measured by the dot product between vectors), so it is not obvious how this technique can perform the more non-linear transformations necessary for complex computation. Nonetheless, this paper, and similarity matching in general, provides important insight into how networks can perform computations while remaining within the confines of biological plausibility.
