Rats optimally accumulate and discount evidence in a dynamic environment

This link will take you to the presentation slides used in our journal club. The presented paper’s preprint is available here.

Normative models of evidence accumulation have proven useful in understanding behavioral and neural data in perceptual decision making tasks.  They allow us to understand how humans and animals use the available information to decide between alternatives. However, even for relatively simple tasks, normative models can be quite complex. A lot of recent work therefore aims to uncover when animals can learn the structure of a task, and use the available evidence in a way that is consistent with a normative model, and what computations they perform to do so. The paper under discussion presents the results of an experimental study in which rats perform a two-alternative forced choice task requiring the integration of sensory evidence. Importantly, the correct choice is not constant in time.

The stimulus consisted of two trains of auditory clicks, presented simultaneously, one to each of the rat’s ears. The clicks were generated by an inhomogeneous Poisson process with two possible instantaneous rates, r_1 and r_2. The state of the environment was defined as the assignment of a rate to a specific ear, and the experimenters never assigned the same rate to both ears. Thus at any instant in time, the environment was either in state S^1, meaning that the higher click rate was presented to the right ear and the lower rate to the left ear, or in state S^2, for the opposite assignment. The environment evolved as a telegraph process, alternating between the two states S^1 and S^2 with hazard rate h. Once prompted, a rat had to choose between two reward ports. If it entered the correct port (the one on the side of the higher click rate at that time), it received a reward.
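To make the stimulus description concrete, here is a minimal simulation sketch of one trial. It is not the authors' code: the function names, the random initial state, and the example values (38 Hz, 2 Hz, a 2 s trial) are assumptions chosen for illustration. Dwell times in each state are drawn from an exponential distribution with rate h, which is one standard way to realize a telegraph process.

```python
import random

def poisson_times(rate, t_start, t_end, rng):
    """Event times of a homogeneous Poisson process on [t_start, t_end)."""
    times = []
    t = t_start
    while rate > 0:
        t += rng.expovariate(rate)  # exponential inter-click interval
        if t >= t_end:
            break
        times.append(t)
    return times

def simulate_trial(r_high, r_low, h, duration, rng):
    """Simulate one trial; returns (initial_state, switch_times, right, left).

    State 1: higher rate on the right ear. State 2: higher rate on the left.
    The initial state is drawn at random (an assumption of this sketch).
    """
    state = rng.choice([1, 2])
    initial_state = state
    switch_times, right, left = [], [], []
    t = 0.0
    while t < duration:
        # Time spent in the current state; with h = 0 the state never changes.
        dwell = rng.expovariate(h) if h > 0 else float("inf")
        t_end = min(t + dwell, duration)
        r_right, r_left = (r_high, r_low) if state == 1 else (r_low, r_high)
        right.extend(poisson_times(r_right, t, t_end, rng))
        left.extend(poisson_times(r_left, t, t_end, rng))
        if t_end < duration:
            switch_times.append(t_end)
            state = 3 - state  # toggle between states 1 and 2
        t = t_end
    return initial_state, switch_times, right, left

# Illustrative values only.
initial_state, switch_times, right, left = simulate_trial(
    38.0, 2.0, 0.5, 2.0, random.Random(0))
```

Within each stretch of constant state the clicks form a homogeneous Poisson process, so the overall click train is inhomogeneous only through the state switches.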

The study followed the experimental setup of Brunton et al. 2013 and Erlich et al. 2015, and was inspired by the Bayes-optimal inference algorithms described in Veliz-Cuba et al. 2016 and Glaze et al. 2015 (a mathematical model from Brunton et al. 2013 is also revisited). The stated aim was to:

probe whether rodents can optimally discount evidence by adapting the timescale over which they accumulate it.

The authors reported the following main results:
  1. Optimal timescale for evidence discounting depends on both:
    1. environment volatility (the hazard rate)
    2. noise in sensory processing (modeled as the probability of mislocalizing a click)
  2. Rats accumulate evidence almost optimally, if both variables above are considered.
  3. Rats adapt their integration timescale to the volatility of the environment.
  4. The authors’ model makes quantitative predictions about timing of changes of mind.
  5. Overall, the paper establishes a quantitative behavioral framework to study adaptive evidence accumulation.
The first result above is derived mathematically. The optimal evidence accumulation equation is,
\displaystyle \frac{da}{dt} = \delta_{R,t}-\delta_{L,t}-\frac{2h}{\kappa}\sinh(\kappa a)
where,
a is the accumulated evidence: the log posterior-odds ratio between the two states, measured in units of clicks (so that \kappa a is the log-odds itself)
\delta_{R,t}, \delta_{L,t} are the right and left auditory click trains (sums of delta functions)
h is the hazard rate, or volatility of the environment
\kappa is the click reliability parameter. It indicates how much evidence a single click provides.
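The accumulation equation can be integrated event by event. The sketch below is an illustration, not the authors' implementation: each click moves a by plus or minus one, and between clicks the nonlinear term alone acts. For that stretch there is a closed form: writing y = \kappa a, the equation dy/dt = -2h sinh(y) implies that tanh(y/2) decays exponentially at rate 2h. The function names and the starting value a(0) = 0 are assumptions.

```python
import math

def decay_between_clicks(a, dt, h, kappa):
    """Evolve a under da/dt = -(2h/kappa) * sinh(kappa * a) for a time dt.

    Uses the closed form tanh(y/2)(t) = tanh(y/2)(0) * exp(-2 h t), y = kappa * a.
    """
    if h == 0 or dt == 0 or a == 0:
        return a
    x = math.tanh(kappa * a / 2.0) * math.exp(-2.0 * h * dt)
    if abs(x) >= 1.0:  # saturated in floating point: decay not resolvable
        return a
    return 2.0 * math.atanh(x) / kappa

def optimal_accumulator(right, left, h, kappa, duration):
    """Value of a at the end of the trial, starting from a = 0.

    Click times are assumed to lie in [0, duration].
    """
    events = sorted([(t, 1.0) for t in right] + [(t, -1.0) for t in left])
    a, t = 0.0, 0.0
    for t_click, jump in events:
        a = decay_between_clicks(a, t_click - t, h, kappa)
        a += jump  # each click shifts a by one unit of evidence
        t = t_click
    return decay_between_clicks(a, duration - t, h, kappa)
```

A positive final value of a favors S^1 (higher rate on the right), a negative value favors S^2. With h = 0 the discounting term vanishes and the model reduces to a perfect click counter.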

The standard formula for the click reliability parameter is (assuming r_1>r_2), 
\displaystyle \kappa = \log\frac{r_1}{r_2}
Sensory noise is modeled by a probability, n, of a click being mislocalized, changing click reliability into: 
 \displaystyle \kappa=\log\frac{r_1\cdot(1-n)+r_2\cdot n}{r_2\cdot(1-n)+r_1\cdot n} 

Thus, sensory noise has the effect of reducing the distance between the two click rates (the numerator and denominator of the above fraction become the effective click rates experienced by the rat), thereby increasing the difficulty of the task. In the supplementary material, the section titled “Sensory noise parameterization details” analyzes other types of sensory noise, such as the possibility of missing some clicks.
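As a quick numerical check of the two formulas for \kappa, the following sketch computes the click reliability with and without mislocalization noise. The function name and the example rates are illustrative assumptions.

```python
import math

def click_reliability(r1, r2, n=0.0):
    """Click reliability kappa for rates r1 > r2 and mislocalization probability n."""
    effective_high = r1 * (1.0 - n) + r2 * n  # effective rate on the high-rate side
    effective_low = r2 * (1.0 - n) + r1 * n   # effective rate on the low-rate side
    return math.log(effective_high / effective_low)

kappa_no_noise = click_reliability(38.0, 2.0)          # equals log(r1 / r2)
kappa_with_noise = click_reliability(38.0, 2.0, 0.1)   # smaller than without noise
kappa_chance = click_reliability(38.0, 2.0, 0.5)       # clicks carry no information
```

At n = 0 the noisy formula reduces to the standard one, and at n = 0.5 the two effective rates coincide, so a single click carries no evidence.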

To obtain the second result, the authors performed a sequence of steps. First, they noted that the optimal inference model is well approximated by a linear model of the form,

\displaystyle \frac{da}{dt} = \delta_{R,t}-\delta_{L,t}-\lambda \cdot a

The discounting rate, \lambda, is found by numerical optimization; it is the \lambda that maximizes accuracy of the observer’s choice, for given r_1,r_2,h,n and trial duration. Note that for fixed task parameters, the discounting rate depends on the sensory noise n. Using the linear model allows a straightforward interpretation of the parameter \lambda, as the discounting rate of the accumulated evidence. The authors define the inverse of the discounting rate, 1/\lambda, as the integration timescale.
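A small sketch of the linear model and of a grid search for \lambda follows. It is only an illustration of the idea: the authors' optimization procedure is not described in this post, and the function names, the trial format (right click times, left click times, final state), and the tie-breaking rule are assumptions. Because the model is linear, the final value of a is simply the sum of the clicks, each discounted by exp(-\lambda × time elapsed since the click).

```python
import math

def linear_accumulator(right, left, lam, duration):
    """a at the end of the trial for da/dt = clicks - lam * a, with a(0) = 0."""
    a = sum(math.exp(-lam * (duration - t)) for t in right)
    a -= sum(math.exp(-lam * (duration - t)) for t in left)
    return a

def accuracy(trials, lam, duration):
    """Fraction of trials where the sign of a matches the final state.

    Each trial is (right_times, left_times, final_state), final_state in {1, 2}.
    Ties (a == 0) are counted as a choice of state 2, an arbitrary convention.
    """
    correct = 0
    for right, left, final_state in trials:
        a = linear_accumulator(right, left, lam, duration)
        choice = 1 if a > 0 else 2
        correct += (choice == final_state)
    return correct / len(trials)

def best_lambda(trials, candidates, duration):
    """Candidate discounting rate with the highest accuracy on these trials."""
    return max(candidates, key=lambda lam: accuracy(trials, lam, duration))

# Hand-built example: early left clicks, late right clicks, final state S^1.
example = [([0.9, 0.95], [0.0, 0.01, 0.02], 1)]
chosen = best_lambda(example, [0.0, 5.0], 1.0)
```

In the hand-built example, a perfect integrator (\lambda = 0) is misled by the three early left clicks, while a discounting observer weights the two recent right clicks more heavily and picks the correct side. In practice the search would run over many simulated trials generated with the task parameters of interest.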

The second step of the analysis consisted of computing reverse correlation kernels for both the behaving rats, and the best linear model, from the same stimulus set. The reverse kernel curves were computed as follows. First, the click trains from trial i, in evidence space, were smoothed with a causal Gaussian filter k(t):
\displaystyle r_i(t)=\delta_{R,t}\star k(t) - \delta_{L,t}\star k(t).
Then, subtracting the expected click rate, given the true state of the environment at each point in time, yielded the normalized variable:
\displaystyle e_i(t)=r_i(t)-\langle r(t)\mid S_i(t)\rangle
Finally, the excess click rate, which is the y-value of the reverse correlation curves, was computed by averaging the previous quantity over trials:
\displaystyle \text{excess-rate}(t\mid\text{choice})=\langle e(t)\mid\text{choice}\rangle
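The three steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' analysis code: the causal filter is taken to be a half-Gaussian with unit area, the conditional expectation of r(t) given the state is estimated empirically across trials at each time point, and the trial format (a dictionary with keys right, left, initial_state, switch_times, choice) is hypothetical.

```python
import math

def causal_gaussian(s, sigma):
    """Half-Gaussian kernel: zero for s < 0, unit area on s >= 0."""
    if s < 0:
        return 0.0
    return 2.0 / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * (s / sigma) ** 2)

def state_at(t, initial_state, switch_times):
    """Environment state (1 or 2) at time t."""
    n_switches = sum(1 for s in switch_times if s <= t)
    return initial_state if n_switches % 2 == 0 else 3 - initial_state

def smoothed_difference(right, left, times, sigma):
    """r_i(t): smoothed right-minus-left click rate at each time point."""
    return [sum(causal_gaussian(t - c, sigma) for c in right)
            - sum(causal_gaussian(t - c, sigma) for c in left)
            for t in times]

def reverse_correlation(trials, times, sigma):
    """Excess click rate per choice ('R' or 'L'); returns (kernels, counts)."""
    r = [smoothed_difference(tr["right"], tr["left"], times, sigma) for tr in trials]
    s = [[state_at(t, tr["initial_state"], tr["switch_times"]) for t in times]
         for tr in trials]
    counts = {"R": 0, "L": 0}
    for tr in trials:
        counts[tr["choice"]] += 1
    kernels = {"R": [0.0] * len(times), "L": [0.0] * len(times)}
    for j in range(len(times)):
        # Empirical mean of r at this time point, separately for each state.
        sums, ns = {1: 0.0, 2: 0.0}, {1: 0, 2: 0}
        for i in range(len(trials)):
            sums[s[i][j]] += r[i][j]
            ns[s[i][j]] += 1
        for i, tr in enumerate(trials):
            state = s[i][j]
            e = r[i][j] - sums[state] / ns[state]  # e_i(t)
            kernels[tr["choice"]][j] += e / counts[tr["choice"]]
    return kernels, counts
```

Because the state-conditioned mean is removed before averaging by choice, the resulting curves reflect fluctuations in the clicks beyond what the state alone predicts, which is what makes them informative about how evidence is weighted over time.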

As a third step, the authors verified that fitting an exponential, \displaystyle ae^{bt} (here a is a fitted amplitude, not the accumulated evidence), to a reverse correlation curve obtained from the linear model allowed them to recover the true discounting rate used to generate it. That is, after the fit, they confirmed that b is close to \lambda. This justified applying the same procedure to the reverse correlation curves obtained from rat behavior. The authors found that the discounting rates recovered from the rats’ curves and from the linear model’s curves are close to each other, provided that the sensory noise value reported in Brunton et al. 2013 is factored into the linear model. No quantitative measure of ‘closeness’ is provided (see figure 4B in the paper).
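The logic of this check can be illustrated with a toy fit. In the linear model, a click at time t contributes exp(-\lambda (T - t)) to the final evidence at time T, which is proportional to exp(\lambda t), so an exponential fitted to an idealized kernel should return b = \lambda. The sketch below uses a log-linear least-squares fit, which is a simplifying assumption (the authors' fitting method is not described in this post) and requires strictly positive values.

```python
import math

def fit_exponential(times, values):
    """Fit values ~ amp * exp(b * t) by least squares on log(values).

    Returns (amp, b). All values must be strictly positive.
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    b = s_ty / s_tt
    amp = math.exp(y_mean - b * t_mean)
    return amp, b

# Idealized kernel of a linear model with discounting rate lam over a 1 s trial.
lam, T = 2.0, 1.0
times = [0.1 * k for k in range(10)]
kernel = [math.exp(-lam * (T - t)) for t in times]
amp, b = fit_exponential(times, kernel)  # b recovers lam up to rounding
```

Real reverse correlation curves are noisy and may cross zero, in which case a nonlinear least-squares fit of the exponential would be needed instead of the log transform.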

In addition to the analysis described above, the authors fit (via maximum likelihood estimation) a more detailed evidence accumulation model to each rat, in order to investigate differences in sensory noise and integration timescale between individuals. The model from Brunton et al. 2013, Hanks et al. 2015 and Erlich et al. 2015 was revisited, with the absorbing decision boundaries removed. The equations are,
\displaystyle da = (\delta_{R,t}\cdot\eta_R\cdot C - \delta_{L,t}\cdot\eta_L\cdot C)dt -\lambda\cdot adt+ \sigma_a dW,
\displaystyle \frac{dC}{dt} = \frac{1-C}{\tau_\phi}+(\phi-1)\cdot C\cdot(\delta_{R,t}+\delta_{L,t}),
where the additional variables are described below:
\eta_R, \eta_L are multiplicative Gaussian sensory noise terms applied to clicks (more precisely, to the jump in evidence at each click)
C is an additional adaptation process filtering the clicks
\sigma_a is the amplitude of the constant additive Gaussian noise (\sigma_a^2 is its variance per unit time)
dW is the increment of a Wiener process
\phi is the adaptation strength
\tau_{\phi} is the adaptation time constant
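For intuition, here is a forward-simulation sketch of these two equations. It is not the authors' fitting code, and several choices are assumptions of this sketch: \eta is drawn from a Gaussian with mean 1 and standard deviation sigma_s, each click's jump uses the value of C just before the click, the click then multiplies C by \phi, and C starts at 1. Between clicks the evidence follows an Ornstein-Uhlenbeck process, which is advanced with its exact transition rather than small time steps.

```python
import math
import random

def simulate_detailed_model(right, left, duration, lam, sigma_a, sigma_s,
                            phi, tau_phi, rng):
    """One noisy realization of a at the end of the trial (a(0) = 0, C(0) = 1)."""
    events = sorted([(t, 1.0) for t in right] + [(t, -1.0) for t in left])
    a, C, t = 0.0, 1.0, 0.0

    def advance(a, C, dt):
        # C relaxes toward 1 with time constant tau_phi.
        C = 1.0 + (C - 1.0) * math.exp(-dt / tau_phi)
        # Exact transition of da = -lam * a dt + sigma_a dW over dt.
        if lam == 0:
            var = sigma_a ** 2 * dt
        else:
            var = sigma_a ** 2 * (1.0 - math.exp(-2.0 * lam * dt)) / (2.0 * lam)
        a = a * math.exp(-lam * dt) + rng.gauss(0.0, math.sqrt(max(var, 0.0)))
        return a, C

    for t_click, sign in events:
        a, C = advance(a, C, t_click - t)
        eta = rng.gauss(1.0, sigma_s)  # multiplicative sensory noise (assumed mean 1)
        a += sign * eta * C            # jump scaled by the adaptation variable
        C *= phi                       # (phi - 1) * C jump in C at each click
        t = t_click
    a, C = advance(a, C, duration - t)
    return a

# Without noise and without adaptation the model reduces to the linear model.
a_end = simulate_detailed_model([0.0], [], 1.0, 2.0, 0.0, 0.0, 1.0, 0.1,
                                random.Random(0))
```

With \phi < 1, clicks arriving in quick succession have a reduced impact (depression); with \phi > 1 they are amplified (facilitation).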
The upshot of this second model analysis was that:
  1. The best fit discounting rate parameter, \lambda, is compared, for each rat, to the values of \lambda obtained on another cohort, in a static environment case, in Brunton et al. 2013. A clear separation between the two cohorts is apparent, indicating that rats in the dynamic environment tend to have much shorter integration time scales.
  2. With the previous linear model, the relationship between the best \lambda and the theoretical level of sensory noise, n, was explored numerically. Here, the sensory noise of each rat is estimated from the model parameters (see the section “Calculating noise level from model parameters” in the paper). The pairs (\lambda, n) for each rat lie slightly off the theoretical curve from the linear model (Fig 5C in paper). The authors attribute this to additional constraints imposed by the more detailed model. Given the other parameters from the detailed model, the authors conclude that the rats still use the best possible discounting rate for their level of sensory noise.

The third result was only established in a preliminary fashion in the paper, insofar as the authors tested only 3 rats from their cohort. Each of these three rats underwent three consecutive experimental phases (each phase lasting at least 25 daily trial sessions), with the environmental hazard rate taking on the values 0.5 Hz, 0 Hz and 0.5 Hz, respectively. In other words, each rat underwent a phase in a dynamic environment, followed by a phase in a static environment, followed by another phase in a dynamic environment. The reverse correlation curves display a dramatic change in shape between the dynamic and static phases. As expected from an adaptive decision maker, the curves are fairly flat in the static environment phase (indicating equal weighting of the evidence throughout the trial), but show a decaying weight on old evidence in the dynamic environment phases.

The authors do not present any analysis of the fourth result, but point out that it is a potential application of their model.

In conclusion, we believe that the dynamic clicks task experiment described in this paper is key for the study of adaptive evidence accumulation. The reported evidence that some rats are able to change their evidence discounting rate according to the environment’s volatility is convincing. On the theoretical side, we wonder whether additional models could produce similar reverse correlation curves, and this could represent a route for further research projects.

References

Brunton, B. W., Botvinick, M. M., and Brody, C. D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science, 340(6128):95–98.

Erlich, J. C., Brunton, B. W., Duan, C. A., Hanks, T. D., and Brody, C. D. (2015). Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. eLife, 4:e05457.

Glaze, C. M., Kable, J. W., and Gold, J. I. (2015). Normative evidence accumulation in unpredictable environments. eLife, 4:e08825.

Veliz-Cuba, A., Kilpatrick, Z. P., and Josic, K. (2016). Stochastic models of evidence accumulation in changing environments. SIAM Review.
This entry was posted in NeuroTheory by Adrian Radillo.

About Adrian Radillo

PhD student in mathematical neuroscience, under the supervision of Kresimir Josic, at the University of Houston. Aside from my research, I am fond of epistemological and ethical questions regarding human-machine interaction.
