m8ta
{1453}
ref: -2019 tags: lillicrap google brain backpropagation through time temporal credit assignment date: 03-14-2019 20:24 gmt revision:2 [1] [0] [head]

Backpropagation through time and the brain (Lillicrap & Santoro, Curr Opin Neurobiol 2019)

  • Timothy Lillicrap and Adam Santoro
  • Backpropagation through time: the 'canonical' extension of backprop for assigning credit in recurrent neural networks used in machine learning.
    • E.g. variable roll-outs, where the error is propagated many times through the transpose of the recurrent weight matrix, $W^T$.
    • This leads to the exploding or vanishing gradient problem.
  • TCA = temporal credit assignment. What led to this reward or error? How should memory be affected to encourage or avoid this?
  • One approach is to simply truncate the error: truncated backpropagation through time (TBPTT). But this of course limits the horizon of learning; see the first sketch after this list.
  • The brain may do BPTT via replay in both the hippocampus and cortex (Nat. Neuroscience 2007), thereby alleviating the need to retain long time histories of neuron activations (needed for the derivatives used in credit assignment).
  • A less well-known method of TCA uses RTRL (real-time recurrent learning), i.e. forward-mode differentiation: $\partial h_t / \partial \theta$ is computed and maintained online, often with synaptic weight updates applied at each time step in which there is non-zero error. See "A learning algorithm for continually running fully recurrent neural networks" (Williams & Zipser 1989).
    • Big problem: a network with $N$ recurrent units requires $O(N^3)$ storage and $O(N^4)$ computation at each time step; the second sketch after this list shows where these costs come from.
    • Can be mitigated with Unbiased Online Recurrent Optimization (UORO), which stores approximate but unbiased gradient estimates to reduce computation and storage.
  • Attention seems like a much better way of approaching the TCA problem: past events are stored externally, and the network learns a differentiable attention-alignment module for selecting these events.
    • Memory can be finite size, extending, or self-compressing.
    • Highlight the utility/necessity of content-addressable memory.
    • Attentional gating can eliminate the exploding / vanishing / corrupting gradient problems -- the gradient paths are skip-connections (see the sketch at the end of this entry).
  • Biologically plausible: partial reactivation of CA3 memories induces re-activation of neocortical neurons responsible for initial encoding PMID-15685217 The organization of recent and remote memories. 2005
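
A minimal numpy sketch (mine, not the paper's) of the two points above: backprop through a linear recurrence multiplies the error by $W^T$ once per time step, so gradient norms grow or shrink geometrically with the spectral radius of $W$, and TBPTT simply drops contributions older than a fixed horizon. Network size and spectral radii are arbitrary illustration choices.

```python
import numpy as np

def bptt_gradient_norms(W, T, truncate=None):
    """Norms of the gradient contribution reaching k steps back from the loss.
    In a linear RNN this is (W^T)^k applied to the final error vector;
    TBPTT zeroes out contributions older than `truncate` steps."""
    rng = np.random.default_rng(0)
    g = rng.standard_normal(W.shape[0])      # error vector at the final step
    norms = []
    for k in range(1, T + 1):
        if truncate is not None and k > truncate:
            norms.append(0.0)                # TBPTT: credit beyond the horizon is discarded
            continue
        g = W.T @ g                          # one step of backprop through the recurrence
        norms.append(float(np.linalg.norm(g)))
    return norms

N = 50
rng = np.random.default_rng(1)
for rho in (0.9, 1.1):                       # spectral radius < 1 vanishes, > 1 explodes
    W = rho * rng.standard_normal((N, N)) / np.sqrt(N)
    print(rho, [f"{x:.2e}" for x in bptt_gradient_norms(W, 30)[::10]])
```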
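And a toy of RTRL's forward-mode bookkeeping for $h_t = \tanh(W h_{t-1})$ (again my sketch, not from the paper): the influence matrix $\partial h_t / \partial W$ holds $N \times N^2 = O(N^3)$ numbers, and propagating it forward costs $O(N^4)$ per step because of the W @ J product.

```python
import numpy as np

def rtrl_step(W, h_prev, J_prev):
    """One RTRL update for h = tanh(W @ h_prev).
    J = dh/dW flattened to shape (N, N*N): O(N^3) storage,
    and the W @ J_prev product below is O(N^4) compute per step."""
    N = W.shape[0]
    h = np.tanh(W @ h_prev)
    D = 1.0 - h ** 2                        # tanh' at the pre-activations
    imm = np.zeros((N, N * N))              # immediate term: d(pre_i)/dW_ij = h_prev_j
    for i in range(N):
        imm[i, i * N:(i + 1) * N] = h_prev
    J = D[:, None] * (imm + W @ J_prev)     # recursive influence-matrix update
    return h, J

N = 20
rng = np.random.default_rng(0)
W = rng.standard_normal((N, N)) / np.sqrt(N)
h, J = 0.1 * rng.standard_normal(N), np.zeros((N, N * N))
for t in range(10):
    h, J = rtrl_step(W, h, J)
    # an online update would be dL/dW = (dL/dh) @ J, applied immediately at step t
```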

  • I remain reserved about the utility of thinking in terms of gradients when describing how the brain learns. Correlations, yes; causation, absolutely; credit assignment, for sure. Yet propagating gradients as a means for changing network weights seems at best a part of the puzzle. So much of behavior and internal cognitive life involves explicit, conscious computation of cause and credit.
  • This leaves me much more sanguine about the use of external memory to guide behavior ... but differentiable attention? Hmm.
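
A minimal content-addressable attention read, to make the skip-connection point concrete (the names and shapes are my own illustration, not the paper's): the read vector is a softmax-weighted combination of stored past events, so the gradient from a loss back to an event stored T steps ago flows through a single weighted edge rather than a product of T Jacobians.

```python
import numpy as np

def attention_read(query, keys, values, beta=1.0):
    """Content-addressable read over an external memory of past events.
    Gradients flow from the output to each stored value through one
    softmax weight -- a skip-connection, with no product over time steps."""
    scores = beta * keys @ query            # similarity of query to each stored key
    w = np.exp(scores - scores.max())
    w /= w.sum()                            # softmax attention weights
    return w @ values, w                    # convex combination of past events

T, d = 100, 16
rng = np.random.default_rng(0)
keys = rng.standard_normal((T, d))          # one key per stored event
values = rng.standard_normal((T, d))
query = keys[3] + 0.1 * rng.standard_normal(d)
read, w = attention_read(query, keys, values, beta=2.0)
print(w.argmax())                           # recovers event 3 by content, not by address
```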

{1452}
ref: -2012 tags: DiCarlo Visual object recognition inferior temporal cortex dorsal ventral stream V1 date: 03-13-2019 22:24 gmt revision:1 [0] [head]

PMID-22325196 How does the brain solve visual object recognition?

  • James DiCarlo, Davide Zoccolan, Nicole C Rust.
  • Infero-temporal cortex is organized into behaviorally relevant categories, not necessarily retinotopically, as demonstrated with TMS studies in humans and lesion studies in other primates.
    • Synaptic transmission takes 1-2 ms; dendritic propagation, ?; axonal propagation ~1 ms (e.g. pyramidal antidromic activation latency 1.2-1.3 ms) -- so each layer can use several synapses for computation.
  • Results from the ventral stream computation can be well described by a firing rate code binned at ~50 ms; such a code can reliably describe and predict behavior.
    • Though: this does not rule out codes with finer temporal resolution.
    • Though anyway: this may be an inferential issue, as behavior operates at this timescale.
  • IT neurons' responses are sparse, but still contain information about position and size.
    • They are not narrowly tuned detectors, not grandmother cells; they are selective and complex but not narrow.
    • Indeed, IT neurons with the highest shape selectivities are the least tolerant to changes in position, scale, contrast, and visual clutter (Zoccolan et al 2007).
    • Position information avoids the need to re-bind attributes with perceptual categories -- no need for synchrony binding.
  • Decoded IT population activity of ~100 neurons exceeds artificial vision systems (Pinto et al 2010).
  • As in {1448}, there is a ~ 30x expansion of the number of neurons (axons?) in V1 vs the optic tract; serves to allow controlled sparsity.
  • Dispute in the field over primarily hierarchical & feed-forward vs. highly structured feedback being essential for performance (and learning?) of the system.
    • One could hypothesize that feedback signals help lower levels perform inference with noisy inputs; feedback from higher layers is prevalent and anatomically manifest (and must be important -- all that membrane is not wasted..).
    • DiCarlo questions if the re-entrant intra-area and inter-area communication is necessary for building object representations.
      • This could be tested with optogenetic approaches; since the publication, it may have been..
      • Feedback-type active perception may be evinced in binocular rivalry, or in visual illusions;
      • Yet 150ms immediate object recognition probably does not require it.
  • Authors propose thinking about neurons/local circuits as having 'job descriptions', a metaphor that couples neuroscience to human organization: who is providing feedback to the workers? Who is providing feedback as to job function? (Hinton 1995).
  • Propose local subspace untangling; when this is stacked and tiled, it is sufficient for object perception.
    • Indeed, modern deep convolutional networks behave this way; yet they still can't match human performance (perhaps not sparse enough, not enough representational capability).
    • Cite Hinton & Salakhutdinov 2006.
  • The AND-OR or conv-pooling architecture was proposed by Hubel and Wiesel back in 1962! In this paper's formulation, they call it a normalized non-linear model, NLN; a toy version appears after this list.
  1. Nonlinearities tend to flatten object manifolds; even with random weights, NLN models tend to produce easier-to-decode object identities, depending on the strength of normalization. See also {714}.
  2. NLNs are tuned / become tuned to the statistics of real images. But they do not get into discrimination / perception thereof..
  3. NLNs learn temporally: inputs that occur temporally adjacent lead to similar responses.
    1. But: saccades? Humans saccade 100 million times per year!
      1. This could be seen as a continuity prior: the world is unlikely to change between saccades, so one can infer the identity and positions of objects on the retina, which, say, can be used to tune different retinotopic IT neurons.
    2. See Li & DiCarlo -- manipulation of image statistics changing visual responses.
  • Regarding (3) above, perhaps attention is a modifier / learning gate?
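
A toy of the NLN stage described above (my sketch of the generic recipe, not the paper's exact model): linear filtering (simple-cell 'AND'), rectification, divisive normalization across channels, and local max pooling (complex-cell 'OR'). The random filters are stand-ins for oriented Gabors.

```python
import numpy as np

def nln_layer(image, filters, pool=2, eps=1e-3):
    """One normalized non-linear (NLN) stage: filter -> rectify ->
    divisively normalize across channels -> max-pool locally."""
    k = filters.shape[1]
    H, W = image.shape
    maps = []
    for f in filters:                         # 'valid' convolution, loop form for clarity
        r = np.zeros((H - k + 1, W - k + 1))
        for i in range(r.shape[0]):
            for j in range(r.shape[1]):
                r[i, j] = np.sum(image[i:i + k, j:j + k] * f)
        maps.append(np.maximum(r, 0.0))       # rectification (simple-cell 'AND')
    out = np.stack(maps)                      # (channels, H', W')
    out = out / (eps + out.sum(axis=0, keepdims=True))   # divisive normalization
    C, Hp, Wp = out.shape
    Hp, Wp = Hp - Hp % pool, Wp - Wp % pool
    out = out[:, :Hp, :Wp].reshape(C, Hp // pool, pool, Wp // pool, pool)
    return out.max(axis=(2, 4))               # local max pooling (complex-cell 'OR')

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
filters = rng.standard_normal((4, 5, 5))      # stand-ins for oriented filters
print(nln_layer(image, filters).shape)        # -> (4, 6, 6)
```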

{926}
ref: Nicolelis-1998.11 tags: spatiotemporal spiking nicolelis somatosensory tactile S1 3b microwire array rate temporal coding code date: 12-28-2011 20:42 gmt revision:3 [2] [1] [0] [head]

PMID-10196571[0] Simultaneous encoding of tactile information by three primate cortical areas

  • owl monkeys.
  • used microwire arrays to decode the location of tactile stimuli; location was encoded through the population, not within single units.
  • areas 3b, 2 & SII.
  • used LVQ (learning vector quantization), backprop, and LDA to predict / classify touch trials; all yielded about the same ~60% accuracy (chance level 33%). A decoding sketch follows this list.
  • Interesting: "the spatiotemporal character of neuronal responses in the SII cortex was shown to contain the requisite information for the encoding of stimulus location using temporally patterned spike sequences, whereas the simultaneously recorded neuronal responses in areas 3b and 2 contained the requisite information for rate coding."
    • They support this result by varying bin widths and looking at the % of correctly classified trials: in SII, increasing bin width slightly but significantly decreases prediction accuracy.
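
A hedged sketch of this style of analysis (synthetic Poisson spike counts with made-up tuning; the real study used recorded spike trains and also LVQ and backprop classifiers): bin each unit's spikes at several widths, then cross-validate a linear discriminant on the binned counts against a 3-location chance level of 33%.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units, n_locations = 300, 24, 3               # 3 touch sites -> 33% chance
labels = rng.integers(0, n_locations, n_trials)
tuning = 1.0 + 0.4 * rng.random((n_locations, n_units))   # per-unit rate tuning (fake)

def binned_counts(bin_ms, window_ms=600):
    """Poisson spike counts per unit per bin; finer bins = more features.
    Stands in for peri-stimulus histograms of the recorded ensembles."""
    n_bins = window_ms // bin_ms
    lam = np.repeat(tuning[labels][:, :, None], n_bins, axis=2) * (bin_ms / 100.0)
    return rng.poisson(lam).reshape(n_trials, -1)

for bin_ms in (25, 50, 100, 200):
    X = binned_counts(bin_ms)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    print(f"bin {bin_ms:>3} ms: accuracy {acc:.2f} (chance 0.33)")
```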

____References____

[0] Nicolelis MA, Ghazanfar AA, Stambaugh CR, Oliveira LM, Laubach M, Chapin JK, Nelson RJ, Kaas JH. Simultaneous encoding of tactile information by three primate cortical areas. Nat Neurosci 1(7): 621-30 (1998 Nov).