Learning essentially induces structure at a longer timescale, such that individual trials that are conditionally i.i.d. given the true parameter θ,

    p(x_1, ..., x_T | θ) = ∏_t p(x_t | θ),

are no longer *conditionally* independent given only the estimated parameter θ̂:

    p(x_1, ..., x_T | θ̂) ≠ ∏_t p(x_t | θ̂).
A loss function allowing learning must now account for this longer timescale. Consequently, one can again (in principle) choose a point estimate that optimizes this new, broader loss, and this point estimate will necessarily involve evidence from multiple time points. So technically you don’t *need* a posterior even during learning.
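This dependence across trials can be made concrete with a minimal numeric sketch (a hypothetical Gaussian example, not from the manuscript; the symbol θ and all numbers are illustrative). Given the true θ, trials are uncorrelated; once θ is unknown and must be inferred from the data, the trials become coupled:

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_trials = 100_000, 2

# Hierarchical toy model: theta ~ N(0, 1); given theta, trials x_t ~ N(theta, 1), i.i.d.
theta = rng.normal(0.0, 1.0, size=n_runs)
x = theta[:, None] + rng.normal(0.0, 1.0, size=(n_runs, n_trials))

# Conditioned on the true theta, trials are independent: residuals are uncorrelated.
corr_given_theta = np.corrcoef((x - theta[:, None]).T)[0, 1]

# Without access to theta (only the data, or an estimate computed from it),
# trials are coupled: Cov(x_1, x_2) = Var(theta) = 1, so corr = 1 / (1 + 1) = 0.5.
corr_marginal = np.corrcoef(x.T)[0, 1]
print(corr_given_theta, corr_marginal)  # ~0 and ~0.5
```

This is the longer-timescale structure that any loss function supporting learning has to account for.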

But this is a much more complicated scheme than just online learning based on posteriors! Parsimony favors a posterior.

…reconsider my model of the world. It may be sufficient to keep another point estimate of "certainty" to do so. But going down this road requires keeping more and more information about the posterior.

To reply very quickly about the loss function: our intent was to say that the loss is a function of the posterior, but not ONLY a function of the posterior. It can depend on the posterior and on other quantities as well (e.g., the true stimulus, as in the example cited).

One other comment: the Bayesian efficient coding (BEC) framework we proposed is more general than Barlow's classical efficient coding, which it includes as a special case, and also more general than settings based on estimation error in a point estimate (e.g., MSE), which it likewise includes as special cases. It further extends to cases where the loss depends on a decision or action (assuming Bayesian decisions or actions, i.e., those computed via an integral over the posterior).
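To illustrate how MSE falls out as a special case of a posterior-dependent loss, here is a minimal sketch (a hypothetical conjugate-Gaussian example with made-up numbers, not taken from the paper). The Bayesian action minimizing expected squared error, an integral over the posterior, is exactly the posterior mean:

```python
import numpy as np

# Conjugate Gaussian sketch: prior theta ~ N(0, 1),
# one observation x | theta ~ N(theta, sigma2). Numbers are illustrative.
sigma2 = 0.5
x_obs = 1.2

# Posterior is N(mu_post, v_post) by precision weighting.
v_post = 1.0 / (1.0 / 1.0 + 1.0 / sigma2)
mu_post = v_post * (x_obs / sigma2)

# Expected posterior squared-error loss for a candidate action a:
# E[(theta - a)^2 | x] = v_post + (a - mu_post)^2  (closed form for a Gaussian posterior).
grid = np.linspace(-2.0, 2.0, 4001)
exp_loss = v_post + (grid - mu_post) ** 2

# The minimizing action coincides with the posterior mean,
# recovering MSE-based coding as a special case of a posterior-based loss.
best_action = grid[np.argmin(exp_loss)]
print(best_action, mu_post)
```

The same setup extends to other losses: swapping the squared-error integrand for any other loss function yields the corresponding Bayesian decision, still computed as an integral over the posterior.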

(This is NOT to say the brain always keeps around a full posterior over the stimulus; BEC is just a framework for normatively optimal coding that (in our view) synthesizes and generalizes a bunch of previous work in this area.)

Thanks again for the comments, we will do our best to clarify these points in the revision!
