# A theory of multineuronal dimensionality, dynamics and measurement

We recently discussed this paper by Gao et al. from the Ganguli lab. They present a theory of neural dimensionality and sufficient conditions for accurate recovery of neural trajectories, providing a much-needed theoretical perspective from which to judge the majority of systems neuroscience studies that rely on dimensionality reduction. Their results also provide a long-overdue mathematical justification for drawing conclusions about entire neural systems based on the activity of a small number of neurons. I felt the paper was well written, and the mathematical arguments used in the proofs were pretty engaging; I don’t remember the last time I enjoyed reading supplementary material quite like this. Here’s a brief summary and some additional thoughts on the paper.

Linear dimensionality reduction techniques are widely used in neuroscience to study how behaviourally relevant variables are represented in neural populations. The general approach goes like this: (i) apply dimensionality reduction, e.g. PCA, to the trial-averaged activity of a population of $M$ neurons to identify a $P$-dimensional subspace ($P < M$) capturing a sufficient fraction of the variance in neural activity, and (ii) examine how neural dynamics evolve within this subspace to (hopefully) gain insights about neural computation. This recipe has largely been successful (ignoring failures that generally go unpublished): the reduced dimensionality of neural datasets is often quite small and the corresponding low-dimensional dynamical portraits are usually interpretable. However, neuroscientists observe only a tiny fraction of the complete neural population. So could the success of dimensionality reduction be an artefact of severe subsampling? This is precisely the question that Gao et al. attempt to answer in their paper.

They first develop a theory that describes how neural dimensionality (defined below) is bounded by the task design and some easy-to-measure properties of neurons. Then they adapt the mathematical theory of random projections to the neuroscience setting and obtain the amount of geometric distortion in the neural trajectories introduced by subsampling or, equivalently, the minimum number of neurons one has to measure in order to achieve an arbitrarily small distortion in a real experiment. Throughout this post, I use the term neural dimensionality in the same sense that the authors use in the paper: the dimension of the smallest affine subspace that contains a large (~80–90%) fraction of the neural trajectories. Note that this notion of dimensionality differs from the intrinsic dimensionality of the neural manifold, which is usually much smaller.

To derive an analytical expression for dimensionality, the authors note that there is an inherent biological limit to how fast the neural trajectory can evolve as a function of the task parameters. Concretely, consider the response of a population of visual neurons to an oriented bar. As you change the orientation from 0 to $\pi$, the activity of the neural population will likely change too. If $\vartheta$ denotes the minimum change in orientation required to induce an appreciable change in the population activity (i.e. the width of the autocorrelation in the population activity pattern), then the population will be able to explore roughly $\pi/\vartheta$ linear dimensions. Of course, the scale of autocorrelation will differ across brain areas (presumably increasing as one goes from the retina to higher visual areas), so the neural dimensionality depends on the properties of the population being sampled, not just on the task design. Similar reasoning applies to other task parameters such as time (yes, they consider time as a task parameter because, after all, neural activity varies in time). If you record for a time period $T$, the dimensionality will be roughly equal to $T/\tau$, where $\tau$ is now the width of the temporal autocorrelation. For the general case of $K$ different task parameters, they prove that, even if you record from millions of neurons, the neural dimensionality $D$ is ultimately bounded by:

$\displaystyle \LARGE D \le C\frac{\prod_{k=1}^{K}{L_k}}{\prod_{k=1}^{K}{\lambda_k}} \qquad \qquad (1)$

where $L_k$ is the range of the $k^{th}$ task parameter, $\lambda_k$ is the corresponding autocorrelation length and $C$ is an $O(1)$ constant which they prove is close to 1. The numerator and denominator depend on task design and smoothness of neural dynamics respectively, so they label the term on the right-hand side neural task complexity (NTC). This terminology was a source of confusion among some of us as it appears to downplay the fundamental role of the neural circuit properties in restricting the dimensionality, but its intended meaning is pretty clear if you read the paper.
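To make equation (1) concrete, here is a minimal Python sketch that evaluates its right-hand side. The function name `neural_task_complexity` and the numbers in the example are my own illustrative assumptions, not values from the paper, and $C$ is simply set to 1.

```python
import math

def neural_task_complexity(ranges, autocorr_lengths, C=1.0):
    """Right-hand side of equation (1): C * prod(L_k) / prod(lambda_k).

    ranges           -- L_k, the range of each task parameter
    autocorr_lengths -- lambda_k, the matching autocorrelation lengths
    C                -- the O(1) constant; 1.0 here is an assumption
    """
    if len(ranges) != len(autocorr_lengths):
        raise ValueError("need one autocorrelation length per task parameter")
    return C * math.prod(ranges) / math.prod(autocorr_lengths)

# Hypothetical experiment with K = 2 task parameters:
# orientation spanning pi with autocorrelation width pi/8,
# and time spanning 1.0 s with autocorrelation width 0.1 s.
ntc = neural_task_complexity([math.pi, 1.0], [math.pi / 8, 0.1])
print(f"NTC bound on dimensionality: {ntc:.1f}")
```

Because the bound is a product over task parameters, each additional parameter multiplies it by that parameter's ratio $L_k/\lambda_k$.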

To derive NTC, the authors assume that the neural response is stationary in the task parameters and that the joint autocorrelation function factorises into a product of the individual task parameters’ autocorrelation functions. They then show that the above bound becomes weak when these assumptions do not hold for the particular population being studied. The proof was also facilitated in part by a clever choice of the definition of dimensionality: the ‘participation ratio’ $={\left (\sum_i \mu_i \right )^2}/{\left (\sum_i \mu_i^2 \right )}$, where $\mu_i$ are the eigenvalues of the neuronal covariance matrix. This replaces the more common but analytically cumbersome measure based on ‘fraction $x$ of variance explained’ $=\min D \ \ s.t. \ \left ( \sum_{i=1}^{D} \mu_i \right )/\left ( \sum_i \mu_i \right ) \geq x$, with the eigenvalues sorted in decreasing order, and they demonstrate that their choice is reasonable.
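The two definitions of dimensionality are easy to compare numerically. The sketch below uses NumPy; the function names and the geometric eigenvalue spectrum are my own illustrative choices and are not taken from the paper.

```python
import numpy as np

def participation_ratio(eigvals):
    """(sum_i mu_i)^2 / (sum_i mu_i^2) for covariance eigenvalues mu_i."""
    mu = np.asarray(eigvals, dtype=float)
    return float(mu.sum() ** 2 / np.sum(mu ** 2))

def variance_explained_dim(eigvals, x=0.9):
    """Smallest D whose top-D eigenvalues explain at least a fraction x of variance."""
    mu = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # decreasing order
    cum = np.cumsum(mu)
    frac = cum / cum[-1]  # last entry is exactly 1.0
    # Index of the first entry >= x, converted to a count of dimensions.
    return int(min(np.searchsorted(frac, x) + 1, mu.size))

# Hypothetical spectrum: eigenvalues that halve at each step.
mu = 0.5 ** np.arange(20)
print("participation ratio:", round(participation_ratio(mu), 2))
print("dims for 90% variance:", variance_explained_dim(mu, x=0.9))
```

For a flat spectrum with $d$ equal eigenvalues the participation ratio is exactly $d$, and it shrinks smoothly as the spectrum becomes more skewed. Unlike the thresholded measure, it needs no choice of $x$.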

Much of the discussion in our journal club centred on whether equation (1) is just circular reasoning, and whether we really gain any new insight from this theory. This view was somewhat understandable because the authors introduce the paper by promising to present a theory that explains the origin of the simplicity betrayed by the low dimensionality of neural recordings… only to show us that it emerges from the specific way in which neural populations respond (smooth dynamics $\approx$ large denominator) to specific tasks (low complexity $\approx$ small numerator). Although this result may seem qualitatively trivial, the strength of their work lies in making our intuitions precise and packaging them in the form of a compact theorem. Moreover, as shown later in the paper, knowing this bound on dimensionality can be practically helpful in determining how many neurons to record. Before discussing that aspect, I’d like to dwell briefly on a potentially interesting corollary and a possible extension of the above theorem.

Based on the above theorem, one can identify three regimes of dimensionality for a recording size of $M$ neurons:
(i) $D\approx M;\ D\ll NTC$
(ii) $D\approx NTC;\ D\ll M$
(iii) $D\ll M;\ D\ll NTC$

The first two regimes are pretty straightforward to interpret. (i) implies that you might not have sampled enough neurons, while (ii) means that the task was not complex enough to elicit richer dynamics. The authors call (iii) the most interesting and say ‘Then, and only then, can one say that the dimensionality of neural state space dynamics is constrained by neural circuit properties above and beyond the constraints imposed by the task and smoothness of dynamics alone’. What could those properties be? Here, it is worth noting that their theory takes the speed of neural dynamics into account, but not the direction. Recurrent connections, for example, might prevent the neural trajectory from wandering in certain directions, thereby constraining the dimensionality. Such constraints may in fact lead to nonstationary and/or unfactorisable neuronal covariance, violating the conditions that are necessary for dimensionality to approach NTC. Although this is not explicitly discussed, they simulate a non-normal network to demonstrate that its dimensionality is reduced by recurrent amplification. So I suspect it should be possible to derive a stronger theorem with a tighter bound on neural dimensionality by incorporating the influence of the strength and structure of connections between neurons.

NTC is a bound on the size of the linear subspace within which neural activity is mostly confined. But even if NTC is small, it is not clear whether we can accurately estimate the neural trajectory within this subspace simply by recording $M$ neurons such that $M\gg NTC$. After all, $M$ is still only a tiny fraction of the total number of neurons in the population $N$. To explore this, the authors use the theory of random projection and show that it is possible to achieve some desired level of fractional error $\epsilon$ in estimating the neural trajectory by ensuring:

$\displaystyle M(\epsilon)=K[O(\log NTC)\ +\ O(\log N)\ +\ O(1)]\ \epsilon^{-2} \qquad \qquad (2)$

where $K$ is the number of task parameters. This means that the demands on the size of the neural recording grow only linearly in the number of task parameters and logarithmically (!!) in both NTC and $N$. Equation (2) holds as long as the recorded sample is statistically homogeneous with the rest of the neurons, a condition that is satisfied for most higher brain areas provided the sampling is unbiased, i.e. the experimenter does not cherry-pick which neurons to record or analyse. The authors encourage us to use their theorems to obtain back-of-the-envelope estimates of recording size and to guide experimental design. This is easier said than done, especially when studying a new brain area or when designing a completely new task. Nevertheless, their work is likely to push the status quo in neuroscience experiments by encouraging experimentalists to move boldly towards more complex tasks without radically revising their approach to neural recordings.
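The qualitative message of equation (2), that distortion falls as more neurons are sampled, can be checked with a toy simulation. This is only a sketch under my own assumptions and not the authors' analysis. A smooth 10-dimensional latent trajectory is mixed into $N$ simulated neurons through random Gaussian weights, which stands in for 'statistically homogeneous' neurons. Distortion is then measured as the worst-case fractional error in pairwise distances along the trajectory after rescaling by $\sqrt{N/M}$. The constants hidden in the $O(\cdot)$ terms are not estimated here.

```python
import numpy as np

def max_distortion(X, M, rng):
    """Worst-case fractional error in pairwise distances between trajectory
    points when only M of the N rows (neurons) of X are observed."""
    N, T = X.shape
    idx = rng.choice(N, size=M, replace=False)   # unbiased random subsample
    Y = X[idx] * np.sqrt(N / M)                  # rescale to preserve expected norms
    i, j = np.triu_indices(T, k=1)               # all pairs of time points
    d_full = np.linalg.norm(X[:, i] - X[:, j], axis=0)
    d_sub = np.linalg.norm(Y[:, i] - Y[:, j], axis=0)
    return float(np.max(np.abs(d_sub / d_full - 1.0)))

rng = np.random.default_rng(0)
N, D, T = 2000, 10, 50                           # hypothetical sizes
t = np.linspace(0.0, np.pi, T)
Z = np.stack([np.cos((k + 1) * t) for k in range(D)])  # smooth latents, shape (D, T)
W = rng.standard_normal((N, D))                  # assumed random mixing weights
X = W @ Z                                        # full population activity, shape (N, T)

sizes = [50, 200, 800]
# Average the worst-case distortion over a few random subsamples per size.
distortion = {M: float(np.mean([max_distortion(X, M, rng) for _ in range(5)]))
              for M in sizes}
for M in sizes:
    print(f"M = {M:4d}  distortion = {distortion[M]:.3f}")
```

If the $\epsilon^{-2}$ scaling holds in this toy setting, quadrupling $M$ should roughly halve the distortion.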