## NeuroTheory Journal Club blog

### Featured

We are a cross-departmental, student-run group whose aim is to bring together the Houston computational neuroscience community (BCM/Rice/UH/UTHealth). We meet weekly to discuss papers. Every other week focuses on our NeuroNex center project to infer graphical models for interactions between neurons and the world. In the other weeks we cover general topics in computational neuroscience, including cellular, systems, and cognitive neuroscience, statistics, and machine learning.

#### Meeting Time & Place: Friday @ 9:00am-10:00am, in BCM room S553.

Contact: KiJung [dot] Yoon [at] bcm [dot] edu

# Inferring structural connectivity using Ising couplings in models of neuronal networks

Uncovering the structure of cortical networks is a fundamental goal of neuroscience. Understanding how neuronal circuits are organized could help us understand, for instance, whether certain cell types connect preferentially to others. The patterns of connections could help explain the observed patterns of activity [1]. However, probing the patterns of synaptic connectivities directly using electrophysiological methods is difficult and expensive [2,3]. With the advent of new experimental techniques we can now record the concurrent activity of hundreds and even thousands of neurons in the cortex. It would be much simpler if we could infer the structure of cortical circuits directly from such recordings.

Inferring connectivity from activity is not a new idea: cross-correlation functions measure the average impact of one cell’s spike on the activity of another directly from their concurrently measured spike trains. The idea that cross-correlations can be used to infer synaptic interactions between cells goes back to at least 1970 [4]. However, this early work also recognized several difficulties with the approach. Cross-correlations also reflect common inputs to the cells and global patterns of activity in the observed population. Disentangling synaptic interactions from these other effects is difficult, especially if only a fraction of a population is observed.

Assuming that the activity of an entire population $P$ has been observed, one approach to disentangling direct from indirect interactions between two cells, $A$ and $B$, is to use partial correlations: the correlations between the residuals of the two cells’ activities remaining after regression on the activities of the other cells, $P - \{A,B\}$. In other words, partial correlations are the correlations that remain between two neurons when their correlations with all other cells are removed. Again, this is an old idea [5], and other approaches have been proposed to tackle the problem: fitting Ising models, generalized linear models, and other types of models have all been used to try to uncover synaptic interactions.
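As a minimal sketch of the idea (not the procedure of any paper cited here), partial correlations can be read off the inverse of the covariance matrix using the standard identity $\rho_{ij\cdot\text{rest}} = -\Theta_{ij}/\sqrt{\Theta_{ii}\Theta_{jj}}$, where $\Theta$ is the precision matrix. The function name `partial_correlations` and the toy chain A → B → C below are illustrative assumptions.

```python
import numpy as np

def partial_correlations(activity):
    """Partial correlation matrix from activity of shape (n_samples, n_cells).

    Uses the identity rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj),
    where Theta is the inverse of the sample covariance matrix.
    """
    cov = np.cov(activity, rowvar=False)
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Toy chain A -> B -> C: A and C interact only through B.
rng = np.random.default_rng(0)
n_samples = 20000
a = rng.standard_normal(n_samples)
b = a + rng.standard_normal(n_samples)
c = b + rng.standard_normal(n_samples)
activity = np.column_stack([a, b, c])

corr = np.corrcoef(activity, rowvar=False)
pcorr = partial_correlations(activity)

# The plain correlation between A and C is substantial,
# while their partial correlation given B is close to zero.
print("corr(A, C)     =", round(corr[0, 2], 3))
print("pcorr(A, C | B) =", round(pcorr[0, 2], 3))
```

In this toy case the indirect A-C dependence disappears once B is regressed out, which is exactly the property that makes partial correlations attractive when the whole population is observed.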

In a recent journal club we discussed a new addition by Kadirvelu et al. [6] to this fairly extensive body of literature. The authors asked how well thresholded partial correlations and thresholded weights obtained from fitting an Ising model can represent synaptic connectivity. They first simulated networks of 11 to 120 Izhikevich neurons under varying conditions, changing the firing rates, connectivity structure, etc., of the network. They then tried to recover the connectivity using the two methods, and compared the results to the ground truth used in the simulations. Synaptic weights were deemed unimportant; instead, binary matrices with 0s and 1s signifying the absence or presence of an interaction, respectively, were compared. As partial correlations do not reveal the direction of an interaction, the ground truth matrices were symmetrized before the comparison. The performance of each method was quantified by the area under the ROC curve obtained by varying the threshold. Low thresholds give more false positives, and high thresholds more false negatives. Thus as the threshold is lowered from high to low, the fraction of falsely identified synaptic connections (false positives, FP) and the fraction of correctly identified connections (true positives, TP) both increase. The curve traced out by the false and true positive rates in FP-TP space is the ROC curve.
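The threshold sweep described above can be sketched as follows. This is an illustration under our own assumptions, not the authors' code: the helper names `roc_from_scores` and `auc`, and the 4-cell toy matrices, are hypothetical. Only the upper triangle is used, since both matrices are symmetric.

```python
import numpy as np

def roc_from_scores(scores, truth):
    """False and true positive rates as the threshold is lowered.

    scores: symmetric matrix of inferred interaction strengths.
    truth:  symmetric binary adjacency matrix (1 = connection present).
    """
    iu = np.triu_indices_from(truth, k=1)      # each cell pair counted once
    s = np.abs(scores[iu])
    t = truth[iu].astype(bool)
    # Start above every score (nothing detected), then lower the threshold.
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(s))[::-1]))
    tpr = np.array([(s[t] >= th).mean() for th in thresholds])
    fpr = np.array([(s[~t] >= th).mean() for th in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy ground truth: a chain 0-1-2-3 (already symmetrized).
truth = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])

# Hypothetical inferred strengths; the absent pair (1, 3) scores
# higher than the true connection (2, 3), so recovery is imperfect.
upper = np.zeros((4, 4))
upper[0, 1], upper[0, 2], upper[0, 3] = 0.9, 0.2, 0.1
upper[1, 2], upper[1, 3], upper[2, 3] = 0.8, 0.7, 0.6
scores = upper + upper.T

fpr, tpr = roc_from_scores(scores, truth)
area = auc(fpr, tpr)
print("AUC =", round(area, 3))  # 8 of 9 (true, absent) pairs are ranked correctly
```

An AUC of 1 means some threshold separates true from absent connections perfectly, while 0.5 corresponds to chance-level ranking.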

The main conclusion of the paper is that the performance of the methods depends on the level of correlations: at low correlations, fitting an Ising model works better, and at high correlations the partial correlation method works better. Other observations were not unexpected: increasing the firing rates improved inference (as the number of “interactions”, i.e. spikes, increases), and inference became harder as the number of neurons increased.

One has to ask what testing these models using simulations can tell us: These settings are highly idealized, and miss many of the features one would encounter with real data. One of the main issues is that latent inputs are not accounted for. In this particular case, all correlations were due to synaptic interactions between model cells, and all cells were observed. Global fluctuations can also induce strong correlations [7], completely overshadowing the effects of direct interactions [1]. There are many other subtleties: the direct inversion of the correlation matrix to obtain partial correlations is problematic, and typically some regularization is required [8]. Moreover, thresholding of inferred interaction weights to try to distinguish real interactions from fluctuations is known to give inconsistent estimators of interactions.
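To illustrate why direct inversion is problematic, here is a small sketch in which there are fewer samples than cells, so the sample covariance is singular. The fix shown, shrinkage toward a scaled identity, is a generic stand-in that we assume for illustration; it is not the regularization scheme of [8], and the function name `shrunk_precision` and the value of `alpha` are arbitrary choices.

```python
import numpy as np

def shrunk_precision(activity, alpha=0.2):
    """Inverse of a covariance estimate shrunk toward a scaled identity.

    alpha = 0 gives the plain sample covariance (possibly singular);
    alpha = 1 gives a diagonal matrix with the average variance.
    """
    cov = np.cov(activity, rowvar=False)
    n_cells = cov.shape[0]
    target = np.eye(n_cells) * np.trace(cov) / n_cells
    shrunk = (1.0 - alpha) * cov + alpha * target
    return np.linalg.inv(shrunk)

rng = np.random.default_rng(1)
activity = rng.standard_normal((20, 50))   # 20 time bins, 50 cells

cov = np.cov(activity, rowvar=False)
rank = np.linalg.matrix_rank(cov)
print("cells:", cov.shape[0], "rank of sample covariance:", rank)

# The shrunk estimate is positive definite, so its inverse exists.
precision = shrunk_precision(activity, alpha=0.2)
print("all entries finite:", bool(np.all(np.isfinite(precision))))
```

The amount of shrinkage trades bias against variance, and in practice it would have to be chosen by something like cross-validation rather than fixed by hand.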

So is the inference of interactions a futile exercise? With the present data, inferring synaptic interactions is likely to be unsuccessful in all but the simplest settings. However, robustly inferring the strength of interactions is still worthwhile, even if these only measure statistical dependencies, rather than structural connections. Changes in such effective connectivity may reflect computations or mental states, and are hypothesized to change under working memory load [9]. Moreover, the effective connectivity could be modulated much more quickly than synaptic connectivity. However, as to which method is best at robustly uncovering such effective connectivity, the article we discussed is silent.

1. Rosenbaum, R., Smith, M. A., Kohn, A., Rubin, J. E., & Doiron, B. (2016). The spatial structure of correlated neuronal variability. Nature Neuroscience, 20(1), 107–114.

2. Jiang, X., Shen, S., Cadwell, C. R., Berens, P., Sinz, F., Ecker, A. S., et al. (2015). Principles of connectivity among morphologically defined cell types in adult neocortex. Science, 350(6264).

3. Oswald, A.-M. M., & Reyes, A. D. (2008). Maturation of intrinsic and synaptic properties of layer 2/3 pyramidal neurons in mouse auditory cortex. Journal of Neurophysiology, 99(6), 2998–3008.

4. Moore, G. P., Segundo, J. P., Perkel, D. H., & Levitan, H. (1970). Statistical signs of synaptic interaction in neurons. Biophysical Journal, 10(9), 876–900.

5. Brillinger, D. R., Bryant, H. L., & Segundo, J. P. (1976). Identification of synaptic interactions. Biological Cybernetics, 22(4), 213–228.

6. Kadirvelu, B., Hayashi, Y., & Nasuto, S. J. (2017). Inferring structural connectivity using Ising couplings in models of neuronal networks. Scientific Reports, 7(1), 8156.

7. Ecker, A. S., Denfield, G. H., Bethge, M., & Tolias, A. S. (2015). On the structure of population activity under fluctuations in attentional state.

8. Yatsenko, D., Josić, K., Ecker, A. S., Froudarakis, E., Cotton, R. J., & Tolias, A. S. (2015). Improved estimation and interpretation of correlations in neural circuits. PLoS Computational Biology, 11(3), e1004083.

9. Pinotsis, D. A., Buschman, T. J., & Miller, E. K. (n.d.). Working Memory Load Modulates Neuronal Coupling.

# Bayesian Efficient Coding

On 15 Sep 2017, we discussed Bayesian Efficient Coding by Il Memming Park and Jonathan Pillow.

As the title suggests, the authors aim to synthesize Bayesian inference with efficient coding. The Bayesian brain hypothesis states that the brain computes posterior probabilities based on its model of the world (prior) and its sensory measurements (likelihood). Efficient coding assumes that the brain distributes its resources to optimize some objective, typically to maximize information. The authors note that efficient coding that maximizes mutual information is a special case of their more general framework, and ask whether optimizing other objectives based on the Bayesian posterior might better explain data.

Denoting stimulus $x$, measurements $y$, and model parameters $\theta$, they use the following ingredients for their theory: a prior $p(x)$, a likelihood $p(y|x)$, an encoding capacity constraint $C(\theta)$, and a loss functional $L(\cdot)$. They assume that the brain is able to construct the true posterior $p(x|y,\theta)$. The goal is to find a model that optimizes the expected loss

$\bar{L}(\theta)=\mathbb{E}_{p(y|\theta)}\left[L(p(x|y,\theta))\right]$

under the constraint $C(\theta)\leq c$.
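To see how mutual information fits into this framework, one can take the loss to be the entropy of the posterior (our notation $H$ and $I$ for entropy and mutual information; a sketch of the standard identity, not a quotation from the paper):

$L(p(x|y,\theta)) = -\int p(x|y,\theta)\,\log p(x|y,\theta)\,dx$

The expected loss is then the conditional entropy of the stimulus given the measurements,

$\bar{L}(\theta)=\mathbb{E}_{p(y|\theta)}\left[L(p(x|y,\theta))\right] = H(x|y;\theta) = H(x) - I(x;y|\theta)$

Since the prior entropy $H(x)$ does not depend on $\theta$, minimizing $\bar{L}(\theta)$ under the constraint $C(\theta)\leq c$ is the same as maximizing the mutual information $I(x;y|\theta)$ under that constraint.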

The loss functional is the key. The authors consider two things the loss might depend on: the posterior $L(p(x|y))$, or the ground truth $L(x,p(x|y))$. They needed to make the loss explicitly dependent on the posterior in order to optimize for mutual information. It was unclear whether they also considered a loss depending on both, which seems critical. We communicated with them and they said they’d clarify this in the next version.

They state that there is no clear a priori reason to maximize mutual information (or equivalently to minimize the average posterior entropy, since the prior is fixed). They give a nice example of a multiple choice test for which encodings that maximize information will achieve fewer correct answers than encodings that maximize percent correct for the MAP estimates. The ‘best’ answer depends on how one defines ‘best’.

After a few more interesting Gaussian examples, they revisit the famous Laughlin (1981) result on efficient coding in the blowfly. This was hailed as a triumph for efficient coding theory in predicting the nonlinear input-output photoreceptor curve derived directly from the measured prior over luminance. But here the authors found that a different loss function on the posterior gave a better fit. Interestingly, though, that loss function was based on a point estimate,

$L(x,p(x|y))=\mathbb{E}_{p(x|y)}\left[\left|x-\hat{x}(y)\right|^p\right]$

where the point estimate $\hat{x}(y)$ is the Bayesian optimum for this cost function and the exponent $p$ is a parameter. The limit $p\to 0$ gives the familiar entropy, $p=2$ is the conventional squared error, and the best fit to the data was $p=1/2$, a “square root loss.” While it’s hard to provide any normative explanation of why this or any other choice is best (since the loss is basically the definition of ‘best’, and you’d have to relate the theoretical loss to some real consequences in the world), it is very interesting that the efficient coding solution explains data worse than their other Bayesian efficient coding losses.
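To get a feel for how the exponent changes the optimal point estimate, here is a small numerical sketch. It is our own illustration, not the authors' code: the function `bayes_estimate`, the grid search, and the skewed toy posterior are all assumptions. It relies only on the standard facts that $p=2$ gives the posterior mean and $p=1$ the posterior median.

```python
import numpy as np

def bayes_estimate(x, posterior, p, n_candidates=2001):
    """Grid search for the estimate minimizing E_posterior[|x - xhat|^p].

    x:         grid of stimulus values.
    posterior: probabilities on that grid (sums to 1).
    """
    candidates = np.linspace(x.min(), x.max(), n_candidates)
    # Row k holds |x - candidates[k]|^p; its dot product with the
    # posterior is the expected loss of that candidate.
    losses = (np.abs(x[None, :] - candidates[:, None]) ** p) @ posterior
    best = np.argmin(losses)
    return float(candidates[best]), float(losses[best])

# A right-skewed toy posterior on a grid (hypothetical, for illustration).
x = np.linspace(0.0, 10.0, 1001)
posterior = x * np.exp(-x)
posterior /= posterior.sum()

post_mean = float(x @ posterior)
post_median = float(x[np.searchsorted(np.cumsum(posterior), 0.5)])

est_sq, _ = bayes_estimate(x, posterior, p=2.0)    # squared error
est_abs, _ = bayes_estimate(x, posterior, p=1.0)   # absolute error
est_half, _ = bayes_estimate(x, posterior, p=0.5)  # "square root loss"

print("mean  :", round(post_mean, 3), " p=2 estimate:", round(est_sq, 3))
print("median:", round(post_median, 3), " p=1 estimate:", round(est_abs, 3))
print("p=1/2 estimate:", round(est_half, 3))
```

For a skewed posterior the three estimates differ, which is one way to see that the choice of loss, and not only the posterior itself, determines what counts as the ‘best’ answer.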

Besides the minor confusion about whether their loss does/should include the ground truth $x$, and some minor disagreement about how much others have done things along this line (Ganguli and Simoncelli, Wei and Stocker, whom they do cite), my biggest question is whether the cost really should depend on the posterior as opposed to a point estimate. I’m a fan of Bayesianism, but ultimately one must take a single action, not a distribution. I discussed this with Jonathan over email, and he maintained that it’s important to distinguish an action from a point estimate of the stimulus: there’s a difference between the width of the river and whether to try to jump over it. I countered that one could refer actions back to the stimulus: the river is jumpable, or unjumpable (essentially a Gibsonian affordance). In a world of latent variables, any point estimate based on a posterior is a compromise based on the loss function.

So when should you keep around a posterior, rather than a point estimate? It may be that the appropriate loss function changes with context, and so the best point estimate would change too. While one could certainly treat this as a bigger computation that produces a context-dependent point estimate, it may be more parsimonious to just represent information about the posterior directly.