PMID29074582 A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
 Vicarious supplementary materials on their RCN (recursive cortical network).
 Factor the scene into shape and appearance, which CNNs and DCNNs do not do; they conflate the two (ish? what about the style networks?).
 They call this the coloring-book approach: extract shape, then attach appearance.
 Hierarchy of feature layers $F_{f r c}$ (binary) and pooling layer $H_{f r c}$ (multinomial), where f is feature, r is row, c is column (e.g. over image space).
 Each layer is exclusively conditional on the layer above it, and all features in a layer are conditionally independent given the layer above.
 Each pool variable $H_{f r c}$ is multinomial; each value is associated with a feature, plus one ‘off’ value.
 These features form a ‘pool’, which can/does have translation invariance.
 If any of the pool variables is set to enable $F$, then that feature is set (OR operation). Many pools can contain a given feature.
 One can think of members of a pool as different alternatives of similar features.
 Pools can be connected laterally, so each is dependent on the activity of its neighbors. This can be used to enforce edge continuity.
 Each bottom-level feature corresponds to an edge, which defines ‘in’ and ‘out’ to define shape, $Y$.
 These variables $Y$ are also interconnected, and form a conditional random field, a ‘Potts model’. $Y$ is generated by Gibbs sampling given the F-H hierarchy above it.
 Below $Y$, the per-pixel model $X$ specifies texture with some conditional radial dependence.
 The model amounts to a probabilistic model for which exact inference is intractable, hence you must do approximate inference: a bottom-up pass estimates the category (with lateral connections turned off), and a top-down pass estimates the object mask. Multiple passes can be done for multiple objects.
 The model has a hard time moving from RGB pixels to edge ‘in’ and ‘out’; they use an edge-detection preprocessing stage, e.g. Gabor filters.
 Training follows a very intuitive, hierarchical feature-building heuristic, where if some object or collection of lower-level features is not present, it’s added to the feature-pool tree.
 This includes some winner-take-all heuristic for sparsification.
 They also greedily learn some sort of feature ‘dictionary’ from individual unlabeled images.
 Lateral connections are learned similarly, with a quasi-Hebbian heuristic.
 Neuroscience inspiration: see refs 9, 98 for message-passing-based Bayesian inference.
 Overall, a very heuristic, detail-centric, iteratively generated model and set of algorithms. You get the sense that this was really the work of Dileep George or only a few people; that it was generated by successively patching and improving the model/algo to make up for observed failures and problems.
 As such, it offers little long-term vision for what is possible, or how perception and cognition occur.
 Instead, proof is shown that, well, engineering works, and the space of possible solutions (including relatively simple elements like dictionaries and WTA) is large and fecund.
 Unclear how this will scale to even more complex real-world problems, where one would desire a solution that does not have to have each level carefully engineered.
 Modern DCNNs, at least, do not seem to have this property: the structure is learned from the (alas, labeled) data.
 This extends to the fact that yes, their purpose-built system achieves state-of-the-art performance on the designated CAPTCHA tasks.
 Check: B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015). doi:10.1126/science.aab3050
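The pool/feature relationship noted above for RCN (multinomial pools with an extra ‘off’ state, features switched on by an OR over pools) can be made concrete with a toy sampler. This is only a minimal sketch of that one step, not the paper's implementation: the pool names, member features, `p_off` value, and the helpers `sample_pool` and `active_features` are all made up for illustration, and lateral connections are ignored.

```python
import random

OFF = None  # the extra "off" state of a pool variable (assumed encoding)

def sample_pool(members, p_off, rng):
    """Sample one multinomial pool variable: a single member feature, or OFF."""
    if rng.random() < p_off:
        return OFF
    return rng.choice(members)

def active_features(pool_states):
    """OR over pools: a feature is on if at least one pool selected it."""
    return {state for state in pool_states if state is not OFF}

# Hypothetical pools; members are shifted copies of one edge feature
# (translation invariance), and "edge@(0,1)" is shared by both pools.
pools = {
    "pool_a": ["edge@(0,0)", "edge@(0,1)", "edge@(1,0)"],
    "pool_b": ["edge@(0,1)", "edge@(0,2)"],
}

rng = random.Random(0)
states = {name: sample_pool(members, 0.2, rng) for name, members in pools.items()}
features = active_features(states.values())
print(states, features)
```

Because each pool picks at most one member, the number of active features never exceeds the number of pools, and a shared feature is on whichever of its pools selects it.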

PMID28777724 Active inference, curiosity and insight.
Karl J. Friston, Marco Lin, Christopher D. Frith, Giovanni Pezzulo,
 This has been my intuition for a while; you can learn abstract rules via active probing of the environment. This paper supports such intuitions with extensive scholarship.
 “The basic theme of this article is that one can cast learning, inference, and decision making as processes that resolve uncertainty about the world.”
 References Schmidhuber 1991
 “A learner should choose a policy that also maximizes the learner’s predictive power. This makes the world both interesting and exploitable.” (Still and Precup 2012)
 “Our approach rests on the free energy principle, which asserts that any sentient creature must minimize the entropy of its sensory exchanges with the world.” OK, that might be generalizing things too far.
 Levels of uncertainty:
 Perceptual inference, the causes of sensory outcomes under a particular policy
 Uncertainty about policies or about future states of the world, outcomes, and the probabilistic contingencies that bind them.
 For the last element (probabilistic contingencies between the world and outcomes), they employ Bayesian model selection / Bayesian model reduction
 Can occur not only on the data, but exclusively on the initial model itself.
 “We use simulations of abstract rule learning to show that context-sensitive contingencies, which are manifest in a high-dimensional space of latent or hidden states, can be learned with straightforward variational principles (i.e. minimization of free energy).”
 Assume that initial states and state transitions are known.
 Perception or inference about hidden states (i.e. state estimation) corresponds to inverting a generative model given a sequence of outcomes, while learning involves updating the parameters of the model.
 The actual task is quite simple: central fixation leads to a color cue. The cue + peripheral color determines which way to saccade.
 Gestalt: Good intuitions, but I’m left with the impression that the authors overexplain and / or make the description more complicated than it needs to be.
 The actual number of parameters to be inferred is rather small (3 states in 4 (?) dimensions), and these parameters are not hard to learn by minimizing the variational free energy:
 $F = D[Q(x) \| P(x)] - E_Q[\ln P(o_t \mid x)]$, where $D$ is the Kullback-Leibler divergence.
 Mean-field approximation: $Q(x)$ is fully factored (not here). Many more notes.
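To make the free-energy expression above concrete, here is a small numerical sketch for a single discrete hidden state and one observed outcome. The prior and likelihood values are invented for illustration (three hidden states, loosely echoing the "3 states" in the note) and the helper `free_energy` is hypothetical; nothing here is taken from the paper's simulations. It relies only on the standard identity that $F$ equals $-\ln P(o)$ when $Q$ is the exact posterior, and is larger for any other $Q$.

```python
import math

def free_energy(q, prior, likelihood_o):
    """F = KL[Q(x) || P(x)] - E_Q[ln P(o | x)] for discrete x and a fixed outcome o."""
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood_o) if qi > 0)
    return kl - accuracy

# Made-up numbers: prior P(x) over 3 hidden states, and P(o | x) for the observed o.
prior = [0.5, 0.3, 0.2]
likelihood_o = [0.9, 0.1, 0.1]

# Exact Bayesian posterior, for comparison.
evidence = sum(p * l for p, l in zip(prior, likelihood_o))
posterior = [p * l / evidence for p, l in zip(prior, likelihood_o)]

f_prior = free_energy(prior, prior, likelihood_o)          # Q = prior (no inference done)
f_posterior = free_energy(posterior, prior, likelihood_o)  # Q = exact posterior

print(f_prior, f_posterior, -math.log(evidence))
```

With $Q$ set to the prior the complexity (KL) term is zero and only the accuracy term contributes; moving $Q$ to the posterior pays some complexity but lowers $F$ overall, down to the negative log evidence.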
