Active inference, curiosity and insight.
 This has been my intuition for a while; you can learn abstract rules via active probing of the environment. This paper supports such intuitions with extensive scholarship.
 “The basic theme of this article is that one can cast learning, inference, and decision making as processes that resolve uncertainty about the world.”
 References Schmidhuber 1991
 “A learner should choose a policy that also maximizes the learner’s predictive power. This makes the world both interesting and exploitable.” (Still and Precup 2012)
 “Our approach rests on the free energy principle, which asserts that any sentient creature must minimize the entropy of its sensory exchanges with the world.” Ok, that might be generalizing things too far..
 Levels of uncertainty:
 Perceptual inference, the causes of sensory outcomes under a particular policy
 Uncertainty about policies or about future states of the world, outcomes, and the probabilistic contingencies that bind them.
 For the last element (probabilistic contingencies between the world and outcomes), they employ Bayesian model selection / Bayesian model reduction
 This selection can operate not only on the data, but also exclusively on the initial model itself.
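To make the model reduction idea concrete, here is a minimal sketch. It assumes the contingencies are categorical with Dirichlet priors (my assumption for illustration), and the function names `log_beta` and `log_evidence_ratio` are hypothetical, not taken from the paper. The point it shows: the evidence for a reduced prior can be scored from the full prior and full posterior alone, without revisiting the data.

```python
import math

def log_beta(a):
    """Log of the multivariate beta function B(a) for Dirichlet counts a."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(sum(a))

def log_evidence_ratio(a_prior, a_post, a_reduced):
    """ln P_reduced(o) - ln P_full(o) for a Dirichlet-categorical model.

    Uses only the full prior, the full posterior and the reduced prior.
    Assumes every entry of the reduced posterior stays positive.
    """
    # The reduced posterior follows from the full posterior and the two priors.
    a_post_reduced = [p + r - f for p, r, f in zip(a_post, a_reduced, a_prior)]
    return (log_beta(a_post_reduced) - log_beta(a_reduced)
            - log_beta(a_post) + log_beta(a_prior))

# Toy numbers (made up): flat prior, eight observations of outcome 0.
a_prior = [1.0, 1.0]
a_post = [9.0, 1.0]        # prior plus counts [8, 0]
a_reduced = [4.0, 1.0]     # alternative prior that favours outcome 0

ratio = log_evidence_ratio(a_prior, a_post, a_reduced)
print(ratio)  # positive here, so the reduced prior has higher evidence
```

A positive value favours the reduced model; a negative one says keep the full prior.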
 “We use simulations of abstract rule learning to show that context-sensitive contingencies, which are manifest in a high-dimensional space of latent or hidden states, can be learned with straightforward variational principles (i.e. minimization of free energy).”
 Assume that initial states and state transitions are known.
 Perception or inference about hidden states (i.e. state estimation) corresponds to inverting a generative model given a sequence of outcomes, while learning involves updating the parameters of the model.
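A toy sketch of that split between inference and learning, under my own simplifying assumptions: a single static hidden state, a known prior, and a likelihood table `A[o][s] = P(o | s)` whose Dirichlet counts get updated. The names `infer_state` and `learn_likelihood` are hypothetical and this is not the paper's scheme, just the general idea.

```python
def infer_state(prior, A, outcomes):
    """State estimation: posterior over a static hidden state.

    Inverts the generative model by applying Bayes' rule once per outcome.
    A[o][s] is P(outcome o | state s).
    """
    post = list(prior)
    for o in outcomes:
        post = [p * A[o][s] for s, p in enumerate(post)]
        z = sum(post)
        post = [p / z for p in post]
    return post

def learn_likelihood(counts, post, outcomes):
    """Learning: add posterior-weighted outcome counts to Dirichlet counts."""
    new = [row[:] for row in counts]
    for o in outcomes:
        for s, q in enumerate(post):
            new[o][s] += q
    return new

# Made-up example: two states, two outcomes.
A = [[0.8, 0.2],
     [0.2, 0.8]]
prior = [0.5, 0.5]
outcomes = [0, 0, 1]

post = infer_state(prior, A, outcomes)       # approximately [0.8, 0.2]
counts = learn_likelihood([[1.0, 1.0], [1.0, 1.0]], post, outcomes)
print(post, counts)
```

Inference changes beliefs about the state within a trial; learning changes the counts that parameterise the model across trials.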
 The actual task is quite simple: central fixation leads to a color cue. The cue + peripheral color determine which way to saccade.
 Gestalt: Good intuitions, but I’m left with the impression that the authors overexplain and / or make the description more complicated than it needs to be.
 The actual number of parameters to be inferred is rather small: 3 states in 4 (?) dimensions, and these parameters are not hard to learn by minimizing the variational free energy:
 $F = D[Q(x) \,\|\, P(x)] - E_{Q}[\ln P(o_{t} \mid x)]$, where $D$ is the Kullback-Leibler divergence.
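A quick numerical check of the free energy, read as KL divergence minus expected log likelihood, for a discrete hidden state. The numbers and the function name `free_energy` are made up for illustration. The sketch shows that F is smallest at the exact posterior, where it equals the negative log evidence.

```python
import math

def free_energy(q, prior, likelihood):
    """F = KL[Q(x) || P(x)] - E_Q[ln P(o | x)] for a discrete state x.

    likelihood[i] is P(o | x = i) for the outcome actually observed.
    """
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood) if qi > 0)
    return kl - accuracy

prior = [0.5, 0.3, 0.2]
likelihood = [0.9, 0.05, 0.05]

# Exact posterior by Bayes' rule, for comparison.
joint = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(joint)
posterior = [j / evidence for j in joint]

f_prior = free_energy(prior, prior, likelihood)      # Q set to the prior
f_post = free_energy(posterior, prior, likelihood)   # Q set to the posterior
print(f_prior, f_post, -math.log(evidence))
```

Any other choice of Q gives a larger F, which is why minimizing F over Q amounts to approximate Bayesian inference.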
 Mean field approximation: $Q(x)$ is fully factored (not here). many more notes
