Review of “Optimal policy for multi-alternative decisions”

Paper by Satohiro Tajima, Jan Drugowitsch, Nisheet Patel, and Alexandre Pouget. Nature Neuroscience, August 5th, 2019

Review by Nick Barendregt (CU, Boulder)

Summary

Organisms must develop robust and accurate strategies to make decisions in order to survive in complex environments. Recent studies have largely focused on value-based or perceptual decisions where observers must choose between two alternatives. However, many real-world situations require choosing between multiple options, and it is not clear if the strategies that are optimal for two-alternative tasks can be directly translated to multi-alternative tasks. To address this question, the authors use dynamic programming to find the optimal strategy for an n-alternative task. Using Bellman’s equation, the authors find that the belief thresholds at which a decision process is terminated are time-varying non-linear functions. To understand how observers could implement such a complex stopping rule, the authors then develop a neural network model that approximates the optimal strategy. Using this network model, they then show that several experimental observations that had been thought to be suboptimal, such as violations of the independence of irrelevant alternatives (IIA) principle, can in fact be explained by the non-linearity of the network. The authors conclude by using their network model to generate testable hypotheses about observer behavior in multi-alternative decision tasks.

Optimal Decision Strategy

To find the optimal strategy for an n-alternative decision task, the authors assume the observer accumulates evidence to update their belief, and that the observer commits to a choice whenever their belief becomes strong enough; this can be described mathematically by the belief crossing a threshold. To find these thresholds, the authors assume that the observer sets their thresholds to maximize their reward rate, or the average reward (less the average cost of accumulating evidence) per average trial length. These assumptions allow them to construct a utility, or value, function for the observer. At each timestep, the observer collects a new piece of evidence and uses it to update their belief. With this new belief, the observer calculates the utility associated with two classes of actions. The first class, which has n total actions, is committing to a choice, which has utility equal to the reward for a correct choice times the probability their choice is correct (i.e., their belief in the choice being correct). The second class, which has a single action, is waiting and drawing a new observation, which has utility equal to the average future utility less some cost of obtaining a new observation. The utility function selects the maximum of these n+1 actions for the observer. The decision thresholds are then given by the belief values where the maximal-utility action changes.

Using Bellman’s equations for the utility function, the authors find the decision thresholds are non-linear functions that evolve in time. From the form of these thresholds, the authors surmise that the belief-update process can be projected onto a lower-dimensional space, and that the thresholds collapse as time increases, reflecting the urgency the observer faces to make a choice and proceed to the next trial.
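As a concrete illustration, the backward (Bellman) recursion can be sketched for a two-alternative simplification of the task; the belief grid, cost, reward, and three-point belief-transition model below are illustrative assumptions, not the paper's exact setup. The upper decision threshold computed at each backup step collapses toward 0.5 as the deadline approaches, mirroring the collapsing thresholds described above.

```python
import numpy as np

# Backward Bellman recursion on a belief grid for a two-alternative task.
# All parameters are illustrative assumptions for this sketch.
T, N = 50, 201                 # number of time steps, belief-grid points
reward, cost, sigma = 1.0, 0.01, 0.05
g = np.linspace(0.0, 1.0, N)   # belief that option 1 is correct

V = np.maximum(reward * g, reward * (1 - g))   # utility at the final step
thresholds = []                                # upper threshold, latest first
for t in range(T - 1, -1, -1):
    # Expected future utility: average V over small belief perturbations.
    EV = np.zeros(N)
    for dg, p in [(-sigma, 0.25), (0.0, 0.5), (sigma, 0.25)]:
        EV += p * np.interp(np.clip(g + dg, 0.0, 1.0), g, V)
    wait = EV - cost                                   # wait-and-sample action
    commit = np.maximum(reward * g, reward * (1 - g))  # commit actions
    # Upper threshold: smallest belief >= 0.5 where committing wins.
    thresholds.append(g[(g >= 0.5) & (commit >= wait)].min())
    V = np.maximum(commit, wait)   # Bellman backup: best of n+1 actions
```

Because the value function only grows under repeated backups, the threshold computed far from the deadline (the last entry) sits above the one computed just before the deadline (the first entry), i.e., the bounds collapse over time within a trial.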

Neural Network Model

To see how observers may approximate this non-linear stopping rule, the authors construct a recurrent neural network that implements a sub-optimal version of the decision strategy. This network model has n neurons, one for each option, which track the belief associated with each option. The network also includes divisive normalization, which is widespread in the cortex, and an urgency signal, which increases the gain as time increases. These two features allow the model to closely approximate the optimal stopping rule, and result in a model that has a similar lower-dimensional projection and collapsing thresholds. When comparing their network model to a standard race model, the authors find that adding normalization and urgency improves model performance in both value-based and perceptual tasks, with normalization having a larger impact on performance.
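A minimal simulation in this spirit may help make the model concrete. This is not the authors' exact network; the parameter values, the form of the normalization, and the linear urgency ramp are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(values, dt=0.01, T=2.0, noise=1.0,
                   sigma_h=0.1, gain=2.0, u_slope=1.5, threshold=1.2):
    """Race-style trial with divisive normalization and an urgency signal.

    values: drift (mean evidence) for each of the n options.
    Returns (choice index, response time)."""
    values = np.asarray(values, dtype=float)
    x = np.zeros(len(values))                    # accumulated evidence
    for step in range(int(T / dt)):
        t = step * dt
        x += values * dt + noise * np.sqrt(dt) * rng.standard_normal(len(values))
        pos = np.maximum(x, 0.0)
        urgency = 1.0 + u_slope * t              # gain grows with elapsed time
        # Divisive normalization: each unit divided by pooled activity.
        r = urgency * gain * pos / (sigma_h + pos.sum())
        if r.max() >= threshold:
            return int(np.argmax(r)), t          # commit to the leading option
    return int(np.argmax(r)), T                  # forced choice at the deadline
```

With noise switched off and one clearly superior option, the network commits to that option quickly; with noise, choices and response times become stochastic, as in the race models the authors compare against.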

Results and Predictions

Using their neural network model, the authors are able to reproduce several well-established results, such as Hick’s law for response times, and explain several behavioral and physiological findings in humans that have long been thought to be sub-optimal. First, because of the normalization, the model is able to replicate violations of IIA, the principle which says that in a choice between two high-value options, adding a third option of lower value should not influence the decision process. The normalization also replicates the similarity effect, which says that when choosing between option 1 and option 2, adding a third option similar to option 1 decreases the probability of choosing option 1. The authors conclude that divisive normalization is the key explanation of these behaviors.

After validating their model by reproducing these previously observed results, the authors make predictions about observer behavior in multi-alternative tasks. The main prediction concerns the two types of strategies used for multi-alternative tasks: the “max-vs.-average” strategy and the “max-vs.-next” strategy. The model predicts that the reward distribution across choices should cause observers to smoothly transition between these two strategies, a prediction that could be tested in psychophysics experiments.
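For concreteness, the two stopping rules can be written as simple decision variables on the belief vector. Treating each rule as a gap between belief statistics is a simplification for illustration, not the paper's exact formulation:

```python
import numpy as np

def max_vs_average(beliefs):
    """Evidence for the best option relative to the average of all options."""
    b = np.asarray(beliefs, dtype=float)
    return b.max() - b.mean()

def max_vs_next(beliefs):
    """Evidence for the best option relative to the runner-up only."""
    b = np.sort(np.asarray(beliefs, dtype=float))
    return b[-1] - b[-2]
```

The two variables agree when n = 2 but diverge as more options are added, which is why the reward distribution across options can favor one rule over the other.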

Unsupervised learning algorithms

Post by Aadith Vittala.

Pehlevan, C., & Chklovskii, D. B. (2019). Neuroscience-inspired online unsupervised learning algorithms. arXiv:1908.01867 [cs, q-bio].

This paper serves as a review of similarity-based online unsupervised learning algorithms. These types of algorithms are important because they are biologically-plausible, produce non-trivial results, and sometimes work as well as other non-biological algorithms. In this paper, biologically-plausible algorithms have three main features: they are local (each neuron uses only pre- or postsynaptic information), they are online (data vectors are presented one at a time and learning occurs after each data vector is passed in), and they are unsupervised (there is no teaching signal to tell the neurons information about error). The simplest example of one of these algorithms is the Oja online PCA algorithm. Here, the system receives \textbf{x}_t each timestep and calculates y_t = \textbf{w}_{t-1} \cdot \textbf{x}_t, which represents the value of the top principal component. The weights are modified each timestep according to

\textbf{w}_t = \textbf{w}_{t-1} + \eta(\textbf{x}_t - \textbf{w}_{t-1} y_t)y_t

This algorithm is both biologically-plausible and potentially useful. This paper aims to find more algorithms like the Oja online PCA algorithm.
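For concreteness, here is a minimal sketch of the Oja update on synthetic two-dimensional data; the data model and learning rate are illustrative choices. With a small learning rate, \textbf{w} converges to a unit vector along the top principal component (up to sign).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic inputs whose top principal direction is the first coordinate axis
# (standard deviations 2.0 and 0.5 along the two axes).
X = rng.standard_normal((5000, 2)) * np.array([2.0, 0.5])

w = rng.standard_normal(2)   # random initial weights
eta = 0.005                  # learning rate (illustrative)
for x in X:
    y = w @ x                    # output: projection onto current weights
    w += eta * (x - y * w) * y   # Oja's rule; the -y*w term keeps ||w|| near 1
```

The subtraction of \textbf{w}_{t-1} y_t inside the update is what distinguishes Oja's rule from plain Hebbian learning: it implicitly normalizes the weight vector, so no separate renormalization step is needed.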

As a starting point, they aimed to develop a biologically-plausible algorithm that would output multiple principal components from a given data set. To do this, they chose to use a similarity-matching objective function, where the goal is to minimize the following expression

\min_{\textbf{y}_1 \ldots \textbf{y}_T} \frac{1}{T^2}\sum_{t=1}^T \sum_{t'=1}^T \big(\textbf{x}_t^T \textbf{x}_{t'} - \textbf{y}_t^T \textbf{y}_{t'} \big)^2

This expression essentially tries to match the pairwise similarities between input vectors to the pairwise similarities between output vectors. In previous work, they have shown that the solution to this problem (with \textbf{y} having fewer dimensions than \textbf{x}) is PCA. To solve this in a biologically-plausible fashion, they use a variable substitution trick (inspired by the Hubbard-Stratonovich transformation from physics) to convert this problem to a minimax problem over new variables \textbf{W} and \textbf{M}

\min_{\textbf{W}} \max_{\textbf{M}} \frac{1}{T} \sum_{t=1}^T \big[ 2 {\rm Tr\,}(\textbf{W}^T \textbf{W}) - {\rm Tr\,}(\textbf{M}^T \textbf{M}) +\min_{\textbf{y}_t} (-4 \textbf{x}_t^T \textbf{W}^T \textbf{y}_t + 2 \textbf{y}_t^T\textbf{M}\textbf{y}_t) \big]

This expression leads to an online algorithm where you solve for \textbf{y}_t during each time step t using

\dot{\textbf{y}}_t = \textbf{W} \textbf{x}_t - \textbf{M} \textbf{y}_t

and then update \textbf{W} and \textbf{M} with

W_{ij} = W_{ij} + \eta (y_i x_j - W_{ij}) \hspace{20pt} \textrm{Hebbian}

M_{ij} = M_{ij} + \frac{\eta}{2} (y_i y_j - M_{ij}) \hspace{20pt} \textrm{``anti-Hebbian"}

though after discussion, we think it would be better to call the second weight update “Hebbian for inhibitory synapses”. This online algorithm has not yet been proven to converge, but it gives relatively good results when tested. In addition, it provides a simple interpretation of \textbf{W} as the presynaptic weights mapping all inputs to all neurons, \textbf{y} as the postsynaptic output from all neurons, and \textbf{M} as inhibitory lateral projections between neurons.
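A compact sketch of the full online algorithm follows; the dimensions, input statistics, and learning rate are illustrative assumptions. The equilibrium of the \dot{\textbf{y}}_t dynamics is computed directly as a linear solve, \textbf{y}_t = \textbf{M}^{-1}\textbf{W}\textbf{x}_t, rather than by integrating the neural dynamics.

```python
import numpy as np

rng = np.random.default_rng(2)

d, k, eta = 10, 2, 0.01                    # input dim, output dim, learning rate
W = rng.standard_normal((k, d)) / np.sqrt(d)  # feedforward (Hebbian) weights
M = np.eye(k)                                 # lateral inhibitory weights

# Inputs with most variance in the first two coordinates (top-2 PC subspace).
scales = np.array([3.0, 2.0] + [0.3] * (d - 2))
for _ in range(3000):
    x = scales * rng.standard_normal(d)
    y = np.linalg.solve(M, W @ x)          # equilibrium of dy/dt = Wx - My
    W += eta * (np.outer(y, x) - W)        # Hebbian update for W
    M += (eta / 2) * (np.outer(y, y) - M)  # anti-Hebbian update for M

# The effective filter applied to inputs at equilibrium:
F = np.linalg.solve(M, W)
```

After learning, the rows of \textbf{M}^{-1}\textbf{W} approximately span the top-k principal subspace of the inputs, consistent with the claim that similarity matching performs PCA.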

The paper goes on to generalize this algorithm to accept whitening constraints (via Lagrange multipliers), to work for non-negative outputs, and to work for clustering and manifold tiling. The details for all of these processes are covered in cited papers, but not in this specific paper. Overall, the similarity-matching objective seems to give well-performing biologically-plausible algorithms for a wide range of problems. However, there are a few important caveats: none of these online algorithms have been proven to converge to the correct solution, inputs were not allowed to correlate, and there is no theoretical basis for stacking (since multiple layers would be equivalent to a single layer). In addition, during discussion we noted that similarity matching seems to essentially promote networks that just rotate their input into their output (as similarity measures the geometric dot product between vectors), so it is not obvious how this technique can conduct the more non-linear transformations necessary for complex computation. Nonetheless, this paper, and similarity matching in general, provides important insight into how networks can perform computations while still remaining within the confines of biological plausibility.

What is the dynamical regime of the cortex?

A review of a preprint by Y. Ahmadian and K. D. Miller

What is the dynamical regime of cortical networks? This question has been debated as long as we have been able to measure cortical activity. The question itself can be interpreted in multiple ways, the answers depending on spatial and temporal scales of the activity, behavioral states, and other factors. Moreover, we can characterize dynamics in terms of dimensionality, correlations, oscillatory structure, or other features of neural activity.

In this review/comment, Y. Ahmadian and K. Miller consider the dynamics of single cells in cortical circuits, as characterized by multi-electrode and intracellular recording techniques. Numerous experiments of this type indicate that individual cells receive excitation and inhibition that are approximately balanced. As a result, activity is driven by fluctuations that cause the cell’s membrane potential to occasionally and irregularly cross the firing threshold. Moreover, this balance is not a result of fine tuning between excitatory and inhibitory weights, but is achieved dynamically.

There have been several theoretical approaches to explain the emergence of such balance. Perhaps the most influential of these theories was developed by C. van Vreeswijk and H. Sompolinsky in 1996. This theory of dynamic balance relies on the assumption that the number of excitatory and inhibitory inputs to a cell, K, is large and that these inputs scale like 1/\sqrt{K}. If external inputs to the network are strong, under fairly general conditions activity is irregular, and in a balanced regime: The average difference between the excitatory and inhibitory input to a cell is much smaller than either the excitatory input or inhibitory input itself. Ahmadian and Miller refer to this as the tightly balanced regime.
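The scaling argument can be checked numerically. In the sketch below, each of K inputs has synaptic strength 1/\sqrt{K}; the Poisson input statistics and rates are illustrative assumptions. Mean excitation and inhibition each grow like \sqrt{K}, while the fluctuations of their difference stay O(1).

```python
import numpy as np

rng = np.random.default_rng(3)

def input_stats(K, trials=2000):
    """Mean E and I input, and std of the net (E - I) input, for K synapses."""
    J = 1.0 / np.sqrt(K)                      # synaptic strength scaling
    rE = rng.poisson(5.0, size=(trials, K))   # excitatory presynaptic counts
    rI = rng.poisson(5.0, size=(trials, K))   # inhibitory presynaptic counts
    E = J * rE.sum(axis=1)
    I = J * rI.sum(axis=1)
    return E.mean(), I.mean(), (E - I).std()

for K in (100, 400, 1600):
    mE, mI, net_sd = input_stats(K)
    print(K, round(mE, 1), round(mI, 1), round(net_sd, 2))
```

As K grows, the mean excitatory (and inhibitory) drive diverges, so the dynamics must cancel them against each other; the surviving O(1) fluctuations are what drive irregular threshold crossings in the tightly balanced regime.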

In contrast, excitation and inhibition still cancel approximately in loosely balanced networks. However, in such networks the residual input is comparable to the excitatory input, and cancellation is thus not as tight. This definition is too broad, however, and the authors also assume that the net input (excitation minus inhibition) driving each neuron grows sublinearly as a function of the external input. As shown previously by the authors and others, such a state emerges when the number of inputs to each cell is not too large, and each cell’s firing rate grows superlinearly with input strength. Under these conditions a sufficiently strong input to the network evokes fast inhibition that loosely balances excitation to prevent runaway activity.
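A toy version of this mechanism illustrates the sublinear net input: a single unit with a power-law (superlinear) transfer function and fast proportional feedback inhibition, with all parameter values chosen for illustration rather than taken from the authors' models.

```python
import numpy as np

def net_input(h, k=0.3, n=2.0, w=1.0, g=1.0):
    """Steady-state net input (excitation minus inhibition) for external drive h.

    The rate obeys r = k * (h - w*g*r)_+^n, with inhibition g*r fed back
    through weight w. Solved by bisection on r in [0, h/(w*g)]."""
    lo, hi = 0.0, h / (w * g)
    for _ in range(200):
        r = 0.5 * (lo + hi)
        if k * max(h - w * g * r, 0.0) ** n > r:
            lo = r           # transfer output exceeds r: true rate is higher
        else:
            hi = r
    return h - w * g * r     # residual (net) input after cancellation

for h in (1.0, 4.0, 16.0, 64.0):
    print(h, round(net_input(h), 3))
</antml_ignore>```

Because the rate grows roughly linearly with h in the balanced state while the transfer function is quadratic, the net input scales like \sqrt{h}: quadrupling the external drive far less than quadruples the residual input, which is the sublinear growth that defines loose balance here.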

Loose and tight balance can occur in the same model network, but loose balance occurs at intermediate external inputs, while tight balance emerges at high external input levels. While the transition between the two regimes is not sharp, the network behaves very differently in the two regimes: A tightly balanced network responds linearly to its inputs, while the response of a loosely balanced network can be nonlinear. Moreover, external input can be of the same order as the total input for loosely balanced networks, but must be much larger than the total input (of the same order as excitation and inhibition on their own) for tightly balanced networks.

Which of these two regimes describes the state of the cortex? Tightness of balance is difficult to measure directly, as one cannot isolate excitatory and inhibitory inputs to the same cell simultaneously. However, the authors present a number of strong, indirect arguments in favor of loose balance, basing their argument on several experimental findings: 1) Recordings suggest that the ratio of the mean to the standard deviation of the excitatory input is not sufficiently large to necessitate precise cancellation by inhibition. This would put the network in the loosely balanced regime. Moreover, excitatory currents alone are not too strong, comparable to the voltage difference between the mean membrane potential and threshold. 2) Cooling and silencing studies suggest that external input, e.g. from thalamus, to local cortical networks is comparable to the net input. This is consistent with loose balance, as tight balance is characterized by strong external inputs. 3) Perhaps most importantly, cortical computations are nonlinear. Sublinear response summation, and surround suppression, for instance, can be implemented by loosely balanced networks. However, classical tightly balanced networks exhibit linear responses, and thus cannot implement these computations. 4) Tightly balanced networks are uncorrelated, and do not exhibit the stimulus-modulated correlations observed in cortical networks.

These observations deserve a few comments: 1) The transition from tight to loose balance is gradual. It is therefore not exactly clear when, for instance, the mean excitatory input is sufficiently strong to require tight cancellation. As the authors suggest, some cortical areas may therefore lean more towards tight balance, while others lean more towards loose balance. 2) It is unclear whether cooling reduces inputs to the cortical areas in question. 3 and 4) Classical tightly balanced networks are unstructured and are driven by uncorrelated inputs. Changes to these assumptions can result in networks that do exhibit a richer dynamical repertoire, including spatiotemporally structured and correlated activity, as well as nonlinear computations.

Why does this debate matter? The dynamical regime of the cortex describes how a population of neurons transforms its inputs, and thus the computations that a network can perform. The questions of which computations the cortex performs, and how it does so, are therefore closely related to questions about its dynamics. However, at present our answers are somewhat limited. Most current theories ignore several features of cortical networks that may impact their dynamics: There is a great diversity of cell types, particularly inhibitory cells, each with its own dynamical and connectivity properties. It is likely that this diversity of cells shapes the dynamical state of the network in a way that we do not yet fully understand. Moreover, the distribution of synaptic weights, and spatial synaptic interactions across the dendritic trees, are not accurately captured in most models. It is possible that these, and other, details are irrelevant, and current theories of balance are robust. However, this is not yet completely clear. Thus, while the authors make a strong case that the cortex is loosely balanced, a definitive answer to this question lies in the future.

Thanks go to Robert Rosenbaum for his input on this post.