[0] Schmidt EM, McIntosh JS, Durelli L, Bak MJ, Fine control of operantly conditioned firing patterns of cortical neurons. Exp Neurol 61:2, 349-69 (1978 Sep 1)
[1] Serruya MD, Hatsopoulos NG, Paninski L, Fellows MR, Donoghue JP, Instant neural control of a movement signal. Nature 416:6877, 141-2 (2002 Mar 14)
[2] Fetz EE, Operant conditioning of cortical unit activity. Science 163:870, 955-8 (1969 Feb 28)
[3] Fetz EE, Finocchio DV, Operant conditioning of specific patterns of neural and muscular activity. Science 174:7, 431-5 (1971 Oct 22)
[4] Fetz EE, Finocchio DV, Operant conditioning of isolated activity in specific muscles and precentral cells. Brain Res 40:1, 19-23 (1972 May 12)
[5] Fetz EE, Baker MA, Operantly conditioned patterns on precentral unit activity and correlated responses in adjacent cells and contralateral muscles. J Neurophysiol 36:2, 179-204 (1973 Mar)

ref: -2018 tags: Michael Levin youtube talk NIPS 2018 regeneration bioelectricity organism patterning flatworm date: 04-09-2019 18:50 gmt revision:1 [0] [head]

What Bodies Think About: Bioelectric Computation Outside the Nervous System - NeurIPS 2018

  • Short notes from watching the video, mostly interesting factoids: (This is a somewhat more coordinated narrative in the video. Am resisting ending each of these statements with an exclamation point).
  • Human children up to 7-11 years old can regenerate their fingertips.
  • Human embryos, when split in half early, develop into two normal humans; mouse embryos, when squished together, make one normal mouse.
  • Butterflies retain memories from their caterpillar stage, despite their brains liquefying during metamorphosis.
  • Flatworms are immortal, and can both grow and contract, as the environment requires.
    • They can also regenerate a whole body from segments, and know to make one head, tail, gut etc.
  • Single cell organisms, e.g. Lacrymaria, can have complex (and fast!) foraging / hunting plans -- without a brain or anything like it.
  • Axolotl can regenerate many parts of their body (appendages etc), including parts of the nervous system.
  • Frog embryos can self-organize an experimenter jumbled body plan, despite the initial organization having never been experienced in evolution.
  • Salamanders, when their tail is grafted into a foot/leg position, remodel the transplant into a leg and foot.
  • Neurotransmitters are ancient; fungi, which diverged from other forms of life about 1.5 billion years ago, still use the same set of inter-cell transmitters, e.g. serotonin, which is why modulatory substances from them have high affinity & a strong effect on humans.
  • Levin, collaborators and other developmental biologists have been using voltage indicators in embryos ... this is not just for neurons.
  • Can make different species head shapes in flatworms by exposing them to ion-channel modulating drugs. This despite the fact that the respective head shapes are from species that have been evolving separately for 150 million years.
  • Indeed, you can reprogram (with light gated ion channels, drugs, etc) to body shapes not seen in nature or not explored by evolution.
    • That said, this was experimental, not by design; Levin himself remarks that the biology that generates these body plans is not known.
  • Flatworms can store memory in bioelectric networks.
  • Frogs don't normally regenerate their limbs. But, with a drug cocktail targeting bioelectric signaling, they can regenerate semi-functional legs, complete with nerves, muscle, bones, and cartilage. The legs are functional (enough).
  • Manipulations of bioelectric signaling can reverse very serious genetic problems, e.g. deletion of Notch, to the point that tadpoles regain some ability for memory creation & recall.

  • I wonder how so much information can go through the apparently scalar channel of membrane voltage. It seems you'd get symbol interference, and that many more signals would be required to pattern organs.
  • That said, calcium is used a great many places in the cell for all sorts of signaling tasks, over many different timescales as well, and it doesn't seem to be plagued by interference.
    • First question from the audience was how cells differentiate organismal patterning signals and behavioral signals, e.g. muscle contraction.

ref: -2017 tags: V1 V4 visual cortex granger causality date: 03-20-2019 06:00 gmt revision:0 [head]

PMID-28739915 Interactions between feedback and lateral connections in the primary visual cortex

  • Liang H, Gong X, Chen M, Yan Y, Li W, Gilbert CD.
  • Extracellular ephys on V1 and V4 neurons in macaque monkeys trained on a fixation and saccade task.
  • Contour task: monkeys had to select the patch of lines, chosen to stimulate the recorded receptive fields, which had a continuous contour in it (again chosen to elicit a response in the recorded V1 / V4 neurons).
    • Variable length of the contour: 1, 3, 5, 7 bars. First part of analysis: only 7-bar trials.
  • Granger causality (GC) in V1 horizontal connectivity decreased significantly in the 0-30Hz band after taking into account V4 activity. Hence, V4 explains some of the causal activity in V1.
    • This result holds for contour-contour pairs (e.g. cells both tuned to the contours in V1), contour-background pairs, and background-background pairs.
    • Yet there was a greater change in the contour-BG and BG-contour cells when V4 was taken into account (Granger causality is directional, like KL divergence).
      • This result passes the shuffle test, where trial identities were shuffled.
      • True also when LFP is measured.
      • That said .. even though GC is sensitive to temporal features, might be nice to control with a distant area.
      • See supplementary figures (of which there are a lot) for the controls.
  • Summarily: Feedback from V4 strengthens V1 lateral connections.
  • Then they looked at trials with a variable number of contour bars.
  • V4 seems to have a greater GC influence on background cells relative to contour cells.
  • Using conditional GC, lateral interactions in V1 contribute more to contour integration than V4.
  • Greater GC in correct trials than incorrect trials.

  • Note: differences in firing rate can affect estimation of GC. Hence, some advise using thinning of the spike trains to yield parity.
  • Note: refs for horizontal connections in V1 [7-10, 37]
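
The "taking V4 into account" step above is conditional Granger causality. A minimal sketch, assuming plain lagged linear regression on continuous signals (the paper works with spikes and LFP; the function names, lag order, and toy data below are all illustrative assumptions, not the authors' pipeline):

```python
import numpy as np

def lagged(x, order):
    """Stack lags 1..order of x as columns, aligned with x[order:]."""
    n = len(x)
    return np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])

def residual_var(target, predictors, order):
    """Residual variance when predicting target[t] from the lagged predictors."""
    y = target[order:]
    X = np.column_stack([lagged(p, order) for p in predictors] + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

def granger(src, dst, order=2, cond=()):
    """GC src -> dst, optionally conditioned on extra signals: ln(var_reduced / var_full)."""
    reduced = residual_var(dst, [dst, *cond], order)
    full = residual_var(dst, [dst, *cond, src], order)
    return np.log(reduced / full)

# Toy data (hypothetical): a common driver c, standing in for V4, feeds two
# "V1" signals a and b at different delays; there is no direct a -> b coupling.
rng = np.random.default_rng(0)
n = 5000
c = rng.standard_normal(n)
a = np.zeros(n)
b = np.zeros(n)
a[1:] = c[:-1] + 0.1 * rng.standard_normal(n - 1)
b[2:] = c[:-2] + 0.1 * rng.standard_normal(n - 2)

gc_plain = granger(a, b)            # large: a's past predicts b
gc_cond = granger(a, b, cond=(c,))  # near zero once c is accounted for
```

In this toy case the apparent a -> b influence is entirely explained by the common driver, which is the same logic as the drop in V1-V1 GC after conditioning on V4.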

ref: -2014 tags: gold nanowires intracellular recording korea date: 03-18-2019 23:02 gmt revision:1 [0] [head]

PMID-25112683 Subcellular Neural Probes from Single-Crystal Gold Nanowires

  • Korean authors... Mijeong Kang, Seungmoon Jung, Huanan Zhang, Taejoon Kang, Hosuk Kang, Youngdong Yoo, Jin-Pyo Hong, Jae-Pyoung Ahn, Juhyoun Kwak, Daejong Jeon, Nicholas A. Kotov, and Bongsoo Kim
  • 100nm single-crystal Au.
  • Able to get SUA despite size.
  • Springy, despite properties of bulk Au.
  • Nanowires fabricated on a sapphire substrate and picked up by a fine sharp W probe, then varnished with nail polish.

ref: -2011 tags: titanium micromachining chlorine argon plasma etch oxide nitride penetrating probes Kevin Otto date: 03-18-2019 22:57 gmt revision:1 [0] [head]

PMID-21360044 Robust penetrating microelectrodes for neural interfaces realized by titanium micromachining

  • Patrick T. McCarthy, Kevin J. Otto, Masaru P. Rao
  • Used Cl / Ar plasma to deep etch titanium film, 0.001 in / 25 um thick. Fine Metals Corp Ashland VA.
  • Discuss various insulation (oxide / nitride) failure modes, lithography issues.

ref: -0 tags: credit assignment distributed feedback alignment penn state MNIST fashion backprop date: 03-16-2019 02:21 gmt revision:1 [0] [head]

Conducting credit assignment by aligning local distributed representations

  • Alexander G. Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles
  • Propose two related algorithms: Local Representation Alignment (LRA)-diff and LRA-fdbk.
    • LRA-diff is basically a modified form of backprop.
    • LRA-fdbk is a modified version of feedback alignment. {1432} {1423}
  • Test on MNIST (easy -- many digits can be discriminated with one pixel!) and fashion-MNIST (harder -- humans only get about 85% right!)
  • Use a Cauchy or log-penalty loss at each layer, which is somewhat unique and interesting: $L(z,y) = \sum_{i=1}^n \log(1 + (y_i - z_i)^2)$.
    • This is hence a saturating loss.
  1. Normal multi-layer-perceptron feedforward network. Pre-activation $h^\ell$ and post-activation $z^\ell$ are stored.
  2. Update the weights to minimize loss. This gradient calculation is identical to backprop, only they constrain the update to have a norm no bigger than $c_1$. Z and Y are actual and desired output of the layer, as commented. Gradient includes the derivative of the nonlinear activation function.
  3. Generate update for the pre-nonlinearity $h^{\ell-1}$ to minimize the loss in the layer above. This again is very similar to backprop; it's the chain rule -- but the derivatives are vectors, of course, so those should be element-wise multiplications, not outer products (I think).
    1. Note $h$ is updated -- derivatives of two nonlinearities.
  4. Feedback-alignment version, with random matrix $E_{\ell}$ (elements drawn from a Gaussian distribution, $\sigma = 1$ ish).
    1. Only one nonlinearity derivative here -- bug?
  5. Move the pre and post activations in the specified gradient direction.
    1. Those $\bar{h}^{\ell-1}$ variables are temporary holding -- but note that both lower and higher layers are updated.
  6. Do this K times, K = 1-50.
  • In practice K=1, with the LRA-fdbk algorithm, for the majority of the paper -- it works much better than LRA-diff (interesting .. bug?). Hence, this basically reduces to feedback alignment.
  • Demonstrate that LRA works much better with small initial weights, but basically because they tweak the algorithm to do this.
    • Need to see a positive control for this to be conclusive.
    • Again, why is FA so different from LRA-fdbk? Suspicious. Positive controls.
  • Attempted a network with Local Winner Take All (LWTA), which is a hard nonlinearity that LRA was able to account for & train through.
  • Also used Bernoulli neurons, and were able to successfully train. Unlike drop-out, these were stochastic at test time, and things still worked OK.
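
To make the per-layer loss concrete, here is a minimal sketch of the log-penalty loss, its gradient with respect to the layer output, and a norm-limited update. The function names and the test values are illustrative assumptions; only the loss formula and the "norm no bigger than c_1" constraint come from the notes above.

```python
import numpy as np

def log_penalty(z, y):
    """Cauchy / log-penalty loss: sum_i log(1 + (y_i - z_i)^2)."""
    return np.sum(np.log1p((y - z) ** 2))

def log_penalty_grad(z, y):
    """Gradient of the loss w.r.t. z: -2 (y - z) / (1 + (y - z)^2)."""
    d = y - z
    return -2.0 * d / (1.0 + d ** 2)

def clip_norm(update, c):
    """Rescale an update so its Euclidean norm is at most c."""
    norm = np.linalg.norm(update)
    return update if norm <= c else update * (c / norm)

y = np.zeros(3)
z = np.array([0.1, 1.0, 100.0])   # small, moderate, and huge errors
g = log_penalty_grad(z, y)
```

The gradient magnitude peaks at an error of 1 and shrinks for larger errors, so a badly wrong unit pulls on the weights less than a moderately wrong one; that is the sense in which the loss saturates.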

Lit review.
  • Logistic sigmoid can slow down learning, due to its non-zero mean (Glorot & Bengio 2010).
  • Recirculation algorithm (or generalized recirculation) is a precursor for target propagation.
  • Target propagation is all about the inverse of the forward propagation: if we had access to the inverse of the network of forward propagations, we could compute which input values at the lower levels of the network would result in better values at the top that would please the global cost.
    • This is a very different way of looking at it -- almost backwards!
    • And indeed, it's not really all that different from contrastive divergence. (even though CD doesn't work well with non-Bernoulli units)
  • Contrastive Hebbian learning also has two phases, one to fantasize, and one to try to make the fantasies look more like the input data.
  • Decoupled neural interfaces (Jaderberg et al 2016): learn a predictive model of error gradients (and inputs) instead of trying to use local information to estimate updated weights.

  • Yeah, call me a critic, but I'm not clear on the contribution of this paper; it smells precocious and over-sold.
    • Even the title. I was hoping for something more 'local' than per-layer computation. BP does that already!
  • They primarily report supportive tests, not discriminative or stressing tests; how does the algorithm fail?
    • Certainly a lot of work went into it..
  • I still don't see how the computation of a target through a random matrix, then using delta/loss/error between that target and the feedforward activation to update weights, is much different than propagating the errors directly through a random feedback matrix. E.g. subtract then multiply, or multiply then subtract?
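
On the last question, a one-line check (my own, not from the paper): for a fixed linear feedback matrix $E$ the two orders are identical, so any difference between the schemes has to come from a nonlinearity $\phi$ in the feedback path or from where the loss is applied. Whether LRA-fdbk actually places a nonlinearity there is something to verify against the paper.

```latex
% Linear feedback: order of subtraction and multiplication does not matter.
E y - E z = E (y - z)
% With a nonlinearity \phi after the feedback matrix, it generally does:
\phi(E y) - \phi(E z) \neq \phi\bigl(E (y - z)\bigr)
```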

ref: -2011 tags: Andrew Ng high level unsupervised autoencoders date: 03-15-2019 06:09 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Building High-level Features Using Large Scale Unsupervised Learning

  • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
  • Input data 10M random 200x200 frames from youtube. Each video contributes only one frame.
  • Used local receptive fields, to reduce the communication requirements. 1000 computers, 16 cores each, 3 days.
  • "Strongly influenced by" Olshausen & Field {1448} -- but this is limited to a shallow architecture.
  • Lee et al 2008 show that stacked RBMs can model simple functions of the cortex.
  • Lee et al 2009 show that convolutional DBN trained on faces can learn a face detector.
  • Their architecture: sparse deep autoencoder with
    • Local receptive fields: each feature of the autoencoder can connect to only a small region of the lower layer (i.e. non-convolutional)
      • Purely linear layer.
      • More biologically plausible & allows the learning of more invariances other than translational invariances (Le et al 2010).
      • No weight sharing means the network is extra large == 1 billion weights.
        • Still, the human visual cortex is about a million times larger in neurons and synapses.
    • L2 pooling (Hyvarinen et al 2009) which allows the learning of invariant features.
      • E.g. this is the square root of the sum of the squares of its inputs. Square root nonlinearity.
    • Local contrast normalization -- subtractive and divisive (Jarrett et al 2009)
  • Encoding weights $W_1$ and decoding weights $W_2$ are adjusted to minimize the reconstruction error, penalized by 0.1 * the sparse pooling layer activation. Latter term encourages the network to find invariances.
  • $\min_{W_1, W_2} \sum_{i=1}^m \left( \| W_2 W_1^T x^{(i)} - x^{(i)} \|_2^2 + \lambda \sum_{j=1}^k \sqrt{\epsilon + H_j (W_1^T x^{(i)})^2} \right)$
    • $H_j$ are the weights to the j-th pooling element, $\lambda = 0.1$; m examples; k pooling units.
    • This is also known as reconstruction Topographic Independent Component Analysis.
    • Weights are updated through asynchronous SGD.
    • Minibatch size 100.
    • Note deeper autoencoders don't fare consistently better.
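
The objective above can be written out directly. This is a minimal sketch for one layer, assuming dense matrices and my own shape conventions (the paper uses local receptive fields and distributed asynchronous SGD, neither of which is modeled here); the function name is hypothetical.

```python
import numpy as np

def tica_objective(W1, W2, H, X, lam=0.1, eps=1e-6):
    """Reconstruction error plus sparsity penalty on L2-pooled features.

    Assumed shapes: X (d, m) with examples as columns; W1, W2 (d, n)
    encoding / decoding weights; H (k, n) non-negative pooling weights.
    """
    A = W1.T @ X                           # (n, m) linear features W1^T x
    recon_err = np.sum((W2 @ A - X) ** 2)  # ||W2 W1^T x - x||^2 summed over examples
    pooled = np.sqrt(eps + H @ (A ** 2))   # (k, m) L2 pooling: sqrt of weighted sum of squares
    return recon_err + lam * np.sum(pooled)

# Tiny check: identity encoder/decoder reconstructs perfectly, so only the
# pooling penalty remains; one pooling unit over both features gives ~|x|.
X = np.array([[3.0], [4.0]])
I = np.eye(2)
H = np.array([[1.0, 1.0]])
val = tica_objective(I, I, H, X)
```

With perfect reconstruction the value is lambda times the pooled activation, here about 0.1 * 5.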

ref: -2018 tags: biologically inspired deep learning feedback alignment direct difference target propagation date: 03-15-2019 05:51 gmt revision:5 [4] [3] [2] [1] [0] [head]

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

  • Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap
  • As is known, many algorithms work well on MNIST, but fail on more complicated tasks, like CIFAR and ImageNet.
  • In their experiments, backprop still fares better than any of the biologically inspired / biologically plausible learning rules. This includes:
    • Feedback alignment {1432} {1423}
    • Vanilla target propagation
      • Problem: with convergent networks, layer inverses (top-down) will map all items of the same class to one target vector in each layer, which is very limiting.
      • Hence this algorithm was not directly investigated.
    • Difference target propagation (2015)
      • Uses the per-layer target as $\hat{h}_l = g(\hat{h}_{l+1}; \lambda_{l+1}) + [h_l - g(h_{l+1}; \lambda_{l+1})]$
      • Or: $\hat{h}_l = h_l + g(\hat{h}_{l+1}; \lambda_{l+1}) - g(h_{l+1}; \lambda_{l+1})$ where $\lambda_{l}$ are the parameters for the inverse model; $g()$ is the sum and nonlinearity.
      • That is, the target is modified ala delta rule by the difference between inverse-propagated higher layer target and inverse-propagated higher level activity.
        • Why? $h_{l}$ should approach $\hat{h}_{l}$ as $h_{l+1}$ approaches $\hat{h}_{l+1}$.
        • Otherwise, the parameters in lower layers continue to be updated even when low loss is reached in the upper layers. (from original paper).
      • The last-to-penultimate layer weights are trained via backprop to prevent template impoverishment as noted above.
    • Simplified difference target propagation
      • They substitute a biologically plausible learning rule for the penultimate layer,
      • $\hat{h}_{L-1} = h_{L-1} + g(\hat{h}_L; \lambda_L) - g(h_L; \lambda_L)$ where there are $L$ layers.
      • It's the same rule as the other layers.
      • Hence subject to impoverishment problem with low-entropy labels.
    • Auxiliary output simplified difference target propagation
      • Add a vector $z$ to the last layer activation, which carries information about the input vector.
      • $z$ is just a set of random features from the activation $h_{L-1}$.
  • Used both fully connected and locally-connected (e.g. convolution without weight sharing) MLP.
  • It's not so great:
  • Target propagation seems like a weak learner, worse than feedback alignment; not only is the feedback limited, but it does not take advantage of the statistics of the input.
    • Hence, some of these schemes may work better when combined with unsupervised learning rules.
    • Still, in the original paper they use difference-target propagation with autoencoders, and get reasonable stroke features..
  • Their general result that networks and learning rules need to be tested on more difficult tasks rings true, and might well be the main point of this otherwise meh paper.
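
A minimal sketch of the difference-target-propagation rule quoted above, to show why the correction term matters. The inverse model `g` here is a hypothetical random tanh layer with made-up weights `V`; it is deliberately not a true inverse of anything.

```python
import numpy as np

def dtp_target(h_l, h_next, h_hat_next, g):
    """Difference target propagation: h_hat_l = h_l + g(h_hat_next) - g(h_next)."""
    return h_l + g(h_hat_next) - g(h_next)

rng = np.random.default_rng(0)
V = 0.5 * rng.standard_normal((4, 3))   # hypothetical inverse-model weights

def g(h):
    """Imperfect learned inverse from layer l+1 (3 units) to layer l (4 units)."""
    return np.tanh(V @ h)

h_l = rng.standard_normal(4)      # activity in layer l
h_next = rng.standard_normal(3)   # activity in layer l+1

# If the upper layer is already at its target, the lower target equals the
# lower activity, so no further update is requested.
same = dtp_target(h_l, h_next, h_next, g)

# Vanilla target propagation would use g(h_hat_next) alone, which differs
# from h_l whenever g is an imperfect inverse.
vanilla = g(h_next)
```

This is the property noted in the "Why?" bullet: the difference term cancels the inverse model's error at the current operating point.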

ref: -2019 tags: lillicrap google brain backpropagation through time temporal credit assignment date: 03-14-2019 20:24 gmt revision:2 [1] [0] [head]

PMID-22325196 Backpropagation through time and the brain

  • Timothy Lillicrap and Adam Santoro
  • Backpropagation through time: the 'canonical' expansion of backprop to assign credit in recurrent neural networks used in machine learning.
    • E.g. variable roll-outs, where the error is propagated many times through the recurrent weight matrix, $W^T$.
    • This leads to the exploding or vanishing gradient problem.
  • TCA = temporal credit assignment. What led to this reward or error? How to affect memory to encourage or avoid this?
  • One approach is to simply truncate the error: truncated backpropagation through time (TBPTT). But this of course limits the horizon of learning.
  • The brain may do BPTT via replay in both the hippocampus and cortex Nat. Neuroscience 2007, thereby alleviating the need to retain long time histories of neuron activations (needed for derivative and credit assignment).
  • Less known method of TCA uses RTRL (real-time recurrent learning), a forward-mode differentiation -- $\delta h_t / \delta \theta$ is computed and maintained online, often with synaptic weight updates being applied at each time step in which there is non-zero error. See A learning algorithm for continually running fully recurrent neural networks.
    • Big problem: A network with $N$ recurrent units requires $O(N^3)$ storage and $O(N^4)$ computation at each time-step.
    • Can be solved with Unbiased Online Recurrent optimization, which stores approximate but unbiased gradient estimates to reduce comp / storage.
  • Attention seems like a much better way of approaching the TCA problem: past events are stored externally, and the network learns a differentiable attention-alignment module for selecting these events.
    • Memory can be finite size, extending, or self-compressing.
    • Highlight the utility/necessity of content-addressable memory.
    • Attentional gating can eliminate the exploding / vanishing / corrupting gradient problems -- the gradient paths are skip-connections.
  • Biologically plausible: partial reactivation of CA3 memories induces re-activation of neocortical neurons responsible for initial encoding PMID-15685217 The organization of recent and remote memories. 2005
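
The exploding / vanishing gradient problem mentioned above follows directly from repeated multiplication by $W^T$. A minimal sketch, assuming a linear recurrence (nonlinearity derivatives omitted) and a scaled orthogonal weight matrix chosen so the result is exact; the names and sizes are illustrative.

```python
import numpy as np

def backprop_norms(W, steps, delta0):
    """Norm of an error vector after each multiplication by W^T."""
    delta = delta0.copy()
    norms = []
    for _ in range(steps):
        delta = W.T @ delta
        norms.append(np.linalg.norm(delta))
    return np.array(norms)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
delta0 = rng.standard_normal(8)
delta0 /= np.linalg.norm(delta0)                  # unit-norm error at the last time step

vanish = backprop_norms(0.9 * Q, 50, delta0)   # shrinks as 0.9**t
explode = backprop_norms(1.1 * Q, 50, delta0)  # grows as 1.1**t
```

With an orthogonal matrix scaled by s, the error norm after t steps is exactly s**t, so 50 steps already spans more than four orders of magnitude between the two cases.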

  • I remain reserved about the utility of thinking in terms of gradients when describing how the brain learns. Correlations, yes; causation, absolutely; credit assignment, for sure. Yet propagating gradients as a means for changing network weights seems at best a part of the puzzle. So much of behavior and internal cognitive life involves explicit, conscious computation of cause and credit.
  • This leaves me much more sanguine about the use of external memory to guide behavior ... but differentiable attention? Hmm.

ref: -2012 tags: DiCarlo Visual object recognition inferior temporal cortex dorsal ventral stream V1 date: 03-13-2019 22:24 gmt revision:1 [0] [head]

PMID-22325196 How Does the Brain Solve Visual Object Recognition

  • James DiCarlo, Davide Zoccolan, Nicole C Rust.
  • Infero-temporal cortex is organized into behaviorally relevant categories, not necessarily retinotopically, as demonstrated with TMS studies in humans, and lesion studies in other primates.
    • Synaptic transmission takes 1-2ms; dendritic propagation ?, axonal propagation ~1ms (e.g. pyramidal antidromic activation latency 1.2-1.3ms), so each layer can use several synapses for computation.
  • Results from the ventral stream computation can be well described by a firing rate code binned at ~ 50ms. Such a code can reliably describe and predict behavior.
    • Though: this does not rule out codes with finer temporal resolution.
    • Though anyway: it may be an inferential issue, as behavior operates at this timescale.
  • IT neurons' responses are sparse, but still contain information about position and size.
    • They are not narrowly tuned detectors, not grandmother cells; they are selective and complex but not narrow.
    • Indeed, IT neurons with the highest shape selectivities are the least tolerant to changes in position, scale, contrast, and visual clutter. (Zoccolan et al 2007)
    • Position information avoids the need to re-bind attributes with perceptual categories -- no need for synchrony binding.
  • Decoded IT population activity of ~100 neurons exceeds artificial vision systems (Pinto et al 2010).
  • As in {1448}, there is a ~ 30x expansion of the number of neurons (axons?) in V1 vs the optic tract; serves to allow controlled sparsity.
  • Dispute in the field over primarily hierarchical & feed-forward vs. highly structured feedback being essential for performance (and learning?) of the system.
    • One could hypothesize that feedback signals help lower levels perform inference with noisy inputs; or feedback from higher layers, which is prevalent and manifest (and must be important; all that membrane is not wasted..)
    • DiCarlo questions if the re-entrant intra-area and inter-area communication is necessary for building object representations.
      • This could be tested with optogenetic approaches; since the publication, it may have been..
      • Feedback-type active perception may be evinced in binocular rivalry, or in visual illusions;
      • Yet 150ms immediate object recognition probably does not require it.
  • Authors propose thinking about neurons/local circuits as having 'job descriptions', a metaphor that couples neuroscience to human organization: who is providing feedback to the workers? Who is providing feedback as to job function? (Hinton 1995).
  • Propose local subspace untangling; when this is stacked and tiled, this is sufficient for object perception.
    • Indeed, modern deep convolutional networks behave this way; yet they still can't match human performance (perhaps not sparse enough, not enough representational capability)
    • Cite Hinton & Salakhutdinov 2006.
  • The AND-OR or conv-pooling architecture was proposed by Hubel and Wiesel back in 1962! In their paper's formulation, they call it a Normalized non-linear model, NLN.
  1. Nonlinearities tend to flatten object manifolds; even with random weights, NLN models tend to produce easier to decode object identities, based on strength of normalization. See also {714}.
  2. NLNs are tuned / become tuned to the statistics of real images. But they do not get into discrimination / perception thereof..
  3. NLNs learn temporally: inputs that occur temporally adjacent lead to similar responses.
    1. But: saccades? Humans saccade 100 million times per year!
      1. This could be seen as a continuity prior: the world is unlikely to change between saccades, so one can infer the identity and positions of objects on the retina, which, say, can be used to tune different retinotopic IT neurons..
    2. See Li & DiCarlo -- manipulation of image statistics changing visual responses.
  • Regarding (3) above, perhaps attention is a modifier / learning gate?
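
For intuition about the NLN idea, here is a minimal sketch of one stage, assuming the generic form "linear filter, threshold nonlinearity, divisive normalization across the population". The exact ordering and functional form in the paper may differ; `nln_stage`, `theta`, and `sigma` are hypothetical names and parameters.

```python
import numpy as np

def nln_stage(x, W, theta=0.0, sigma=1.0):
    """One assumed NLN stage: filter, rectify, then divisively normalize."""
    drive = W @ x                            # linear filtering (the "AND"-like step)
    rect = np.maximum(drive - theta, 0.0)    # threshold nonlinearity
    return rect / (sigma + np.linalg.norm(rect))  # population divisive normalization

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))   # random weights, as in point 1 above
x = rng.standard_normal(8)

r1 = nln_stage(x, W)
r10 = nln_stage(10.0 * x, W)       # same pattern at 10x the contrast
```

The normalization keeps the population response bounded, so a 10x change in input strength produces far less than a 10x change in output while the pattern across units is preserved.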

ref: Schmidt-1978.09 tags: Schmidt BMI original operant conditioning cortex HOT pyramidal information antidromic date: 03-12-2019 23:35 gmt revision:11 [10] [9] [8] [7] [6] [5] [head]

PMID-101388[0] Fine control of operantly conditioned firing patterns of cortical neurons.

  • Hand-arm area of M1, 11 or 12 chronic recording electrodes, 3 monkeys.
    • But, they only used one unit at a time in the conditioning task.
  • Observed conditioning in 77% of single units and 65% of combined units (multiunits?).
  • Trained to move a handle to a position indicated by 8 annular cursor lights.
    • Cursor was updated at 50 Hz -- this was just a series of lights! talk about simple feedback...
    • Investigated different smoothing: too fast, FR does not stay in target; too slow, cursor acquires target too slowly.
      • My gamma function is very similar to their lowpass filter used for smoothing the firing rates.
    • 4 or 8 target random tracking task
    • Time-out of 8 seconds
    • Run of 40 trials
      • The conditioning reached a significant level of performance after 2.2 runs of 40 trials (in well-trained monkeys); typically, they did 18 runs/day (720 trials)
  • Recordings:
    • Scalar mapping of unit firing rate to cursor position.
    • Filtered 600 Hz - 6 kHz
    • Each accepted spike triggered a generator that produced a pulse of constant amplitude and width -> this was fed into a lowpass filter (1.5 to 2.5 & 3.5Hz cutoff), and a gain stage, then an ADC, then (presumably) the PDP.
      • can determine if these units were in the pyramidal tract by measuring antidromic delay.
    • recorded one neuron for 108 days!!
      • Neuronal activity is still being recorded from one monkey 24 months after chronic implantation of the microelectrodes.
    • Average period in which conditioning was attempted was 3.12 days.
  • Successful conditioning was always associated with specific repeatable limb movements
    • "However, what appears to be conditioned in these experiments is a movement, and the neuron under study is correlated with that movement." YES.
    • The monkeys clearly learned to make (increasingly refined) movement to modulate the firing activity of the recorded units.
    • The monkey learned to turn off certain units with specific limb positions; the monkey used exaggerated movements for these purposes.
      • e.g. finger and shoulder movements, isometric contraction in one case.
  • Trained some monkeys for > 15 months; animals got better at the task over time.
  • PDP-12 computer.
  • Information measure: 0 bits for missed targets, 2 for a 4 target task, 3 for 8 target task; information rate = total number of bits / time to acquire targets.
    • 3.85 bits/sec peak with 4 targets, 500ms hold time
    • With this, monkeys were able to exert fine control of firing rate.
    • Damn! compare to Paninski! [1]
  • 4.29 bits/sec when the same task was performed with a manipulandum & wrist movement
  • they were able to condition 77% of individual neurons and 65% of combined units.
  • Implanted a pyramidal tract electrode in one monkey; both cells recorded at that time were pyramidal tract neurons, antidromic latencies of 1.2 - 1.3ms.
    • Failures had no relation to overt movements of the monkey.
  • Fetz and Baker [2,3,4,5] found that 65% of precentral neurons could be conditioned for increased or decreased firing rates.
    • and it only took 6.5 minutes, on average, for the units to change firing rates!
  • Summarized in [1].
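
The information measure described above is simple enough to write down. A minimal sketch, assuming each acquired target contributes log2(number of targets) bits and each miss contributes 0; the function name and the example run (36 hits out of 40 trials in 20 s) are hypothetical, not data from the paper.

```python
import math

def information_rate(outcomes, n_targets, total_time_s):
    """Bits per second: log2(n_targets) bits per acquired target, 0 per miss."""
    bits_per_hit = math.log2(n_targets)   # 2 bits for 4 targets, 3 bits for 8
    total_bits = bits_per_hit * sum(1 for hit in outcomes if hit)
    return total_bits / total_time_s

# Hypothetical run of 40 trials on the 4-target task.
run = [True] * 36 + [False] * 4
rate = information_rate(run, n_targets=4, total_time_s=20.0)
```

Note that the measure rewards speed as much as accuracy: halving the time to acquire targets doubles the rate.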


ref: -0 tags: sparse coding reference list olshausen field date: 03-11-2019 21:59 gmt revision:3 [2] [1] [0] [head]

This was compiled from searching papers which referenced Olshausen and Field 1996 PMID-8637596 Emergence of simple-cell receptive field properties by learning a sparse code for natural images.

ref: -2018 tags: sparse representation auditory cortex excitation inhibition balance date: 03-11-2019 20:47 gmt revision:1 [0] [head]

PMID-30307493 Sparse Representation in Awake Auditory Cortex: Cell-type Dependence, Synaptic Mechanisms, Developmental Emergence, and Modulation.

  • Sparse representation arises during development in an experience-dependent manner, accompanied by differential changes of excitatory input strength and a transition from unimodal to bimodal distribution of E/I ratios.

ref: -2015 tags: conjugate light electron tomography mouse visual cortex fluorescent label UNC cryoembedding date: 03-11-2019 19:37 gmt revision:1 [0] [head]

PMID-25855189 Mapping Synapses by Conjugate Light-Electron Array Tomography

  • Use aligned interleaved immunofluorescence imaging followed by array EM (FESEM). 70nm thick sections.
  • For IHC, tissue must be dehydrated & embedded in a resin.
  • However, the dehydration disrupts cell membranes and ultrastructural details viewed via EM ...
  • Hence, EM microscopy uses osmium tetroxide to cross-link the lipids.
  • ... Yet that also disrupts / refolds the proteins, making IHC fail.
  • Solution is to dehydrate & embed at cryo temp, -70C, where the lipids do not dissolve. They used Lowicryl HM-20.
  • We show that cryoembedding provides markedly improved ultrastructure while still permitting multiplexed immunohistochemistry.

ref: -2012 tags: octopamine STDP locust LTP LTD olfactory bulb date: 03-11-2019 18:59 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-22278062 Conditional modulation of spike-timing-dependent plasticity for olfactory learning.

  • Looked at the synapses from the mushroom body (Kenyon cells, sparse code) to the beta-lobe (bLN) in locusts.
  • Used in-vivo dendrite patch, sharp micropipette.
  • Using controlled extracellular stimulation of the mushroom body as the plasticity induction protocol at the KC -> bLN synapse, they were able to get potentiation and depression in accord with STDP.
  • This STDP became pure depression in the presence of octopamine.
  • See also / supersedes: Synaptic Learning Rules and Sparse Coding in a Model Sensory System. Luca A. Finelli, Seth Haney, Maxim Bazhenov, Mark Stopfer, Terrence J. Sejnowski 2008
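
A toy model of the finding, assuming a standard exponential STDP window and modeling the octopamine effect as depression regardless of spike order. The amplitudes, time constant, and the function name `stdp_dw` are illustrative assumptions, not values fitted in the paper.

```python
import numpy as np

def stdp_dw(dt_ms, a_plus=1.0, a_minus=0.5, tau_ms=20.0, octopamine=False):
    """Weight change for spike-time difference dt_ms = t_post - t_pre.

    Without octopamine: potentiation for pre-before-post (dt > 0),
    depression otherwise. With octopamine: depression for either order.
    """
    dt = np.asarray(dt_ms, dtype=float)
    kernel = np.exp(-np.abs(dt) / tau_ms)   # effect decays with |dt|
    if octopamine:
        return -a_minus * kernel
    return np.where(dt > 0, a_plus, -a_minus) * kernel

dts = np.array([-10.0, 10.0])
baseline = stdp_dw(dts)                     # depression, then potentiation
with_oa = stdp_dw(dts, octopamine=True)     # depression for both orders
```

In this picture octopamine acts as a gate that converts a timing-dependent rule into a uniformly depressing one for recently active synapses.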

ref: -2004 tags: Olshausen sparse coding review date: 03-08-2019 07:02 gmt revision:0 [head]

PMID-15321069 Sparse coding of sensory inputs

  • Classic review, Olshausen and Field. 15 years old now!
  • Note the sparsity here is in neuronal activation, not synaptic activity (though one should follow the other).
  • References Lewicki's auditory studies, Efficient coding of natural sounds 2002; properties of early auditory neurons are well suited for producing a sparse independent code.
    • Studies have found near binary encoding of stimuli in rat auditory cortex -- e.g. one spike per noise.
  • Suggests that overcomplete representations (e.g. where there are more 'second layer' neurons than inputs or pixels) are useful for flattening manifolds in the input space, making feature extraction easier.
    • But then you have an under-determined problem, where presumably sparsity metrics step in to restrict the actual coding space. Authors mention that this could lead to degeneracy.
    • Example is the early visual cortex, where axons to higher layers exceed those from the LGN by a factor of 25, which, they say, may be a compromise between over-representation and degeneracy.
  • Sparse coding is a necessity from an energy standpoint -- only one in 50 neurons can be active at any given time.
  • Sparsity increases when the classical receptive field stimulus in V1 is expanded with a real-world-statistics surround (Gallant 2002).
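
A minimal sketch of how a sparsity penalty resolves the under-determined overcomplete case noted above. With three units for two inputs, many codes reconstruct the input equally well; an L1 penalty picks the one using the fewest active units. The L1 penalty, the ISTA solver, and the names sparse_code and soft_threshold are assumptions chosen for illustration, not the specific method of the review.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sparse_code(x, D, lam=0.1, n_iter=500):
    """Minimize 0.5*||x - D a||^2 + lam*||a||_1 by ISTA.

    D is a list of rows (n_inputs x n_units); x has length n_inputs.
    """
    n_in, n_units = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant,
    # so this step size is safe (if conservative).
    step = 1.0 / sum(d * d for row in D for d in row)
    a = [0.0] * n_units
    for _ in range(n_iter):
        resid = [x[i] - sum(D[i][j] * a[j] for j in range(n_units))
                 for i in range(n_in)]
        grad = [sum(D[i][j] * resid[i] for i in range(n_in))
                for j in range(n_units)]
        a = [soft_threshold(a[j] + step * grad[j], step * lam)
             for j in range(n_units)]
    return a

# Overcomplete dictionary: two axis-aligned units plus one diagonal unit.
s = 2 ** -0.5
D = [[1.0, 0.0, s],
     [0.0, 1.0, s]]
code = sparse_code([1.0, 1.0], D)
```

For the diagonal input, the penalty drives the two axis-aligned units to zero and leaves only the diagonal unit active, even though the axis-aligned pair could also reconstruct the input.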

ref: -2006 tags: Mark Bear reward visual cortex cholinergic date: 03-06-2019 04:54 gmt revision:1 [0] [head]

PMID-16543459 Reward timing in the primary visual cortex

  • Used 192-IgG-Saporin (saporin immunotoxin) to selectively lesion cholinergic fibers locally in V1 following a visual stimulus -> licking reward delay behavior.
  • Visual stimulus is full-field light, delivered to either the left or right eye.
    • This is scarcely a challenging task; perhaps they or others have followed up?
  • These examples illustrate that both cue 1-dominant and cue 2-dominant neurons recorded from intact animals express NRTs that appropriately reflect the new policy. Conversely, although cue 1- and cue 2-dominant neurons recorded from 192-IgG-saporin-infused animals are capable of displaying all forms of reward timing activity, they do not update their NRTs but rather persist in reporting the now outdated policy.
    • NRT = neural report of time.
  • This needs to be controlled with recordings from other cortical areas.
  • Acquisition of a reward-based response is simultaneously interesting and boring -- what about the normal, discriminative and perceptual function of the cortex?
  • See also follow-up work PMID-23439124 A cholinergic mechanism for reward timing within primary visual cortex.

ref: -2017 tags: vicarious dileep george captcha message passing inference heuristic network date: 03-06-2019 04:31 gmt revision:2 [1] [0] [head]

PMID-29074582 A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

  • Vicarious supplementary materials on their RCN (recursive cortical network).
  • Factor scene into shape and appearance, which CNN or DCNN do not do -- they conflate (ish? what about the style networks?)
    • They call this the coloring book approach -- extract shape then attach appearance.
  • Hierarchy of feature layers F_{frc} (binary) and pooling layers H_{frc} (multinomial), where f is feature, r is row, c is column (e.g. over image space).
  • Each layer is exclusively conditional on the layer above it, and all features in a layer are conditionally independent given the layer above.
  • Each pool variable H_{frc} is multinomial; each of its values is associated with a feature, plus one 'off' value.
    • These features form a ‘pool’, which can/does have translation invariance.
  • If any of the pool variables is set to enable F, then that feature is set (OR operation). Many pools can contain a given feature.
  • One can think of members of a pool as different alternatives of similar features.
  • Pools can be connected laterally, so each is dependent on the activity of its neighbors. This can be used to enforce edge continuity.
  • Each bottom-level feature corresponds to an edge, which defines ‘in’ and ‘out’ to define shape, Y.
  • These variables Y are also interconnected, and form a conditional random field, a ‘Potts model’. Y is generated by Gibbs sampling given the F-H hierarchy above it.
  • Below Y, the per-pixel model X specifies texture with some conditional radial dependence.
  • The model amounts to a probabilistic model for which exact inference is impossible -- hence you must do approximate inference, where a bottom-up pass estimates the category (with lateral connections turned off), and a top-down pass estimates the object mask. Multiple passes can be done for multiple objects.
  • Model has a hard time moving from RGB pixels to edge ‘in’ and ‘out’; they use an edge-detection pre-processing stage, e.g. Gabor filters.
  • Training follows a very intuitive, hierarchical feature building heuristic, where if some object or collection of lower level features is not present, it’s added to the feature-pool tree.
    • This includes some winner-take-all heuristic for sparsification.
    • Also greedily learn some sort of feature ‘dictionary’ from individual unlabeled images.
  • Lateral connections are learned similarly, with a quasi-Hebbian heuristic.
  • Neuroscience inspiration: see refs 9, 98 for message-passing based Bayesian inference.
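
A minimal sketch of the pool / feature relationship described in the notes above: each active pool variable takes exactly one of its member features as its value (inactive pools take the extra 'off' value), and a feature is on if any pool selects it. The pool and feature names, the uniform choice among members, and the function names are hypothetical; lateral connections between pools are omitted.

```python
import random

OFF = None  # the extra 'off' value of each multinomial pool variable

def sample_pools(pools, active_pools, rng):
    """pools maps pool name -> list of member feature names.

    Each active pool picks exactly one member (uniformly, by assumption);
    every other pool is OFF.
    """
    return {name: (rng.choice(members) if name in active_pools else OFF)
            for name, members in pools.items()}

def features_from_pools(pool_states):
    """A feature is on if any pool selected it (OR over pools)."""
    return {state for state in pool_states.values() if state is not OFF}

# Hypothetical pools; 'edge_b' belongs to two pools.
pools = {
    "p0": ["edge_a", "edge_b"],
    "p1": ["edge_b", "edge_c"],
    "p2": ["edge_d"],
}
states = sample_pools(pools, {"p0", "p1"}, random.Random(0))
active_features = features_from_pools(states)
```

Because 'edge_b' is a member of both p0 and p1, it can be switched on by either pool, which is the OR operation in the notes; members of one pool act as alternative versions of a similar feature.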

  • Overall, a very heuristic, detail-centric, iteratively generated model and set of algorithms. You get the sense that this was really the work of Dileep George or only a few people; that it was generated by successively patching and improving the model/algo to make up for observed failures and problems.
    • As such, it offers little long-term vision for what is possible, or how perception and cognition occur.
    • Instead, proof is shown that, well, engineering works, and the space of possible solutions -- including relatively simple elements like dictionaries and WTA -- is large and fecund.
      • Unclear how this will scale to even more complex real-world problems, where one would desire a solution that does not have to have each level carefully engineered.
      • Modern DCNN, at least, do not seem to have this property -- the structure is learned from the (alas, labeled) data.
  • This extends to the fact that yes, their purpose-built system achieves state-of-the-art performance on the designated CAPTCHA tasks.
  • Check: B. M. Lake, R. Salakhutdinov, J. B. Tenenbaum, Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015). doi:10.1126/science.aab3050

ref: -2018 tags: cortex layer martinotti interneuron somatostatin S1 V1 morphology cell type morphological recovery patch seq date: 03-06-2019 02:51 gmt revision:3 [2] [1] [0] [head]

Neocortical layer 4 in adult mouse differs in major cell types and circuit organization between primary sensory areas

  • Using whole-cell recordings with morphological recovery, we identified one major excitatory and seven inhibitory types of neurons in L4 of adult mouse visual cortex (V1).
  • Nearly all excitatory neurons were pyramidal and almost all Somatostatin-positive (SOM+) neurons were Martinotti cells.
  • In contrast, in somatosensory cortex (S1), excitatory cells were mostly stellate and SOM+ cells were non-Martinotti.
  • These morphologically distinct SOM+ interneurons correspond to different transcriptomic cell types and are differentially integrated into the local circuit with only S1 cells receiving local excitatory input.
  • Our results challenge the classical view of a canonical microcircuit repeated through the neocortex.
  • Instead we propose that cell-type specific circuit motifs, such as the Martinotti/pyramidal pair, are optionally used across the cortex as building blocks to assemble cortical circuits.
  • Note preponderance of axons.
  • Classifications:
    • Pyr pyramidal cells
    • BC Basket cells
    • MC Martinotti cells
    • BPC bipolar cells
    • NFC neurogliaform cells
    • SC shrub cells
    • DBC double bouquet cells
    • HEC horizontally elongated cells.
  • Using Patch-seq

ref: -2012 tags: parvalbumin interneurons V1 perceptual discrimination mice date: 03-06-2019 01:46 gmt revision:0 [head]

PMID-22878719 Activation of specific interneurons improves V1 feature selectivity and visual perception

  • Lee SH, Kwan AC, Zhang S, Phoumthipphavong V, Flannery JG, Masmanidis SC, Taniguchi H, Huang ZJ, Zhang F, Boyden ES, Deisseroth K, Dan Y.
  • Optogenetic activation of PV+ interneurons improves neuronal feature selectivity and perceptual discrimination (!!!)

ref: -2016 tags: MAPseq Zador connectome mRNA plasmic library barcodes Peikon date: 03-06-2019 00:51 gmt revision:1 [0] [head]

PMID-27545715 High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

  • Justus M. Kebschull, Pedro Garcia da Silva, Ashlan P. Reid, Ian D. Peikon, Dinu F. Albeanu, Anthony M. Zador
  • Another tool for the toolbox, but I still can't help but like microscopy: while the number of labels in MAPseq is far higher, the information per read-out is much lower; an imaged slice holds a lot of information, including dendritic / axonal morphology, which sequencing doesn't get. Natch, you'd want to use both, or FISSEQ + ExM.