m8ta
You are not authenticated, login.
text: sort by
tags: modified
type: chronology
{1497}
hide / / print
ref: -2017 tags: human level concept learning through probabalistic program induction date: 01-20-2020 15:45 gmt revision:0 [head]

PMID-26659050 Human level concept learning through probabalistic program induction

  • Preface:
    • How do people learn new concepts from just one or a few examples?
    • And how do people learn such abstract, rich, and flexible representations?
    • How can learning succeed from such sparse dataset also produce such rich representations?
    • For any theory of learning, fitting a more complicated model requires more data, not less, to achieve some measure of good generalization, usually in the difference between new and old examples.
  • Learning proceeds bu constructing programs that best explain the observations under a Bayesian criterion, and the model 'learns to learn' by developing hierarchical priors that allow previous experience with related concepts to ease learning of new concepts.
  • These priors represent learned inductive bias that abstracts the key regularities and dimensions of variation holding actoss both types of concepts and across instances.
  • BPL can construct new programs by reusing pieced of existing ones, capturing the causal and compositional properties of real-world generative processes operating on multiple scales.
  • Posterior inference requires searching the large combinatorial space of programs that could have generated a raw image.
    • Our strategy uses fast bottom-up methods (31) to propose a range of candidate parses.
    • That is, they reduce the character to a set of lines (series of line segments), then simply the intersection of those lines, and run a series of parses to estimate the generation of those lines, with heuristic criteria to encourage continuity (e.g. no sharp angles, penalty for abruptly changing direction, etc).
    • The most promising candidates are refined by using continuous optimization and local search, forming a discrete approximation to the posterior distribution P(program, parameters | image).

{1453}
hide / / print
ref: -2019 tags: lillicrap google brain backpropagation through time temporal credit assignment date: 03-14-2019 20:24 gmt revision:2 [1] [0] [head]

PMID-22325196 Backpropagation through time and the brain

  • Timothy Lillicrap and Adam Santoro
  • Backpropagation through time: the 'canonical' expansion of backprop to assign credit in recurrent neural networks used in machine learning.
    • E.g. variable rol-outs, where the error is propagated many times through the recurrent weight matrix, W TW^T .
    • This leads to the exploding or vanishing gradient problem.
  • TCA = temporal credit assignment. What lead to this reward or error? How to affect memory to encourage or avoid this?
  • One approach is to simply truncate the error: truncated backpropagation through time (TBPTT). But this of course limits the horizon of learning.
  • The brain may do BPTT via replay in both the hippocampus and cortex Nat. Neuroscience 2007, thereby alleviating the need to retain long time histories of neuron activations (needed for derivative and credit assignment).
  • Less known method of TCA uses RTRL Real-time recurrent learning forward mode differentiation -- δh t/δθ\delta h_t / \delta \theta is computed and maintained online, often with synaptic weight updates being applied at each time step in which there is non-zero error. See A learning algorithm for continually running fully recurrent neural networks.
    • Big problem: A network with NN recurrent units requires O(N 3)O(N^3) storage and O(N 4)O(N^4) computation at each time-step.
    • Can be solved with Unbiased Online Recurrent optimization, which stores approximate but unbiased gradient estimates to reduce comp / storage.
  • Attention seems like a much better way of approaching the TCA problem: past events are stored externally, and the network learns a differentiable attention-alignment module for selecting these events.
    • Memory can be finite size, extending, or self-compressing.
    • Highlight the utility/necessity of content-addressable memory.
    • Attentional gating can eliminate the exploding / vanishing / corrupting gradient problems -- the gradient paths are skip-connections.
  • Biologically plausible: partial reactivation of CA3 memories induces re-activation of neocortical neurons responsible for initial encoding PMID-15685217 The organization of recent and remote memories. 2005

  • I remain reserved about the utility of thinking in terms of gradients when describing how the brain learns. Correlations, yes; causation, absolutely; credit assignment, for sure. Yet propagating gradients as a means for changing netwrok weights seems at best a part of the puzzle. So much of behavior and internal cognitive life involves explicit, conscious computation of cause and credit.
  • This leaves me much more sanguine about the use of external memory to guide behavior ... but differentiable attention? Hmm.