ref: -2019 tags: neuromorphic optical computing date: 06-19-2019 14:47 gmt revision:1 [0] [head]

Large-Scale Optical Neural Networks based on Photoelectric Multiplication

  • Critical idea: use coherent homodyne detection, and quantum photoelectric multiplication for the MACs.
    • That is, E-fields from coherent light multiply rather than add within a (square-law) photodiode detector.
    • Other lit suggests rather limited SNR for this effect -- 11 dB.
  • Hence need EO modulators and OE detectors followed by nonlinearity etc.
  • Pure theory; suggests that you can compute with as few as tens of photons per MAC -- or fewer! Near Landauer's limit.
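
A minimal sketch of how balanced homodyne detection can yield a product, assuming ideal, noiseless detectors whose photocurrent is proportional to |E|^2 and a lossless 50/50 beamsplitter. The function names are hypothetical and this is not the paper's implementation; it only illustrates that subtracting two intensity readings leaves the cross term.

```python
def homodyne_product(x, w):
    """Return Re(x * conj(w)) from two intensity (|E|^2) readings."""
    # A 50/50 beamsplitter produces the sum and difference fields.
    plus = (x + w) / 2 ** 0.5
    minus = (x - w) / 2 ** 0.5
    # Each photodiode reports intensity, not field amplitude.
    i_plus = abs(plus) ** 2
    i_minus = abs(minus) ** 2
    # Subtracting the photocurrents cancels |x|^2 and |w|^2,
    # leaving only the cross term 2 * Re(x * conj(w)).
    return (i_plus - i_minus) / 2


def homodyne_dot(xs, ws):
    """Accumulate per-time-step products, as an integrating detector would."""
    return sum(homodyne_product(x, w) for x, w in zip(xs, ws))
```

For real-valued amplitudes, `homodyne_dot(xs, ws)` reduces to an ordinary dot product, which is the multiply-accumulate a network layer needs.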

ref: -2016 tags: fluorescent proteins photobleaching quantum yield piston GFP date: 06-19-2019 14:33 gmt revision:0 [head]

PMID-27240257 Quantitative assessment of fluorescent proteins.

  • Cranfill PJ, Sell BR, Baird MA, Allen JR, Lavagnino Z, de Gruiter HM, Kremers GJ, Davidson MW, Ustione A, Piston DW
  • Model bleaching as $\log(F) = -\alpha \log(P) + c$ or $k_{bleach} = b I^{\alpha}$, where F is the fluorescence intensity, P is the illumination power, and b and c are constants.
    • Most fluorescent proteins have $\alpha > 1$, which means superlinear photobleaching -- doubling the power more than doubles the bleaching rate.
  • Catalog the degree to which each protein tends to form aggregates by tagging to the ER and measuring ER morphology. Fairly thorough -- 10k cells each FP.
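
A rough sketch of how the exponent in the power law $k_{bleach} = b I^{\alpha}$ could be estimated: a straight-line least-squares fit in log-log space. The data below are synthetic and made up for illustration, and `fit_power_law` is a hypothetical helper, not the paper's analysis code.

```python
import math


def fit_power_law(powers, rates):
    """Least-squares fit of log(rate) = alpha * log(power) + log(b)."""
    xs = [math.log(p) for p in powers]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope in log-log space
    b = math.exp(mean_y - alpha * mean_x)  # intercept, back in linear units
    return alpha, b


# Synthetic example: rates generated with alpha = 1.3 and b = 0.5.
powers = [1.0, 2.0, 4.0, 8.0]
rates = [0.5 * p ** 1.3 for p in powers]
alpha, b = fit_power_law(powers, rates)
```

With noise-free synthetic input the fit recovers the generating exponent; with real measurements the slope would carry an uncertainty that this sketch does not estimate.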

ref: -2017 tags: neuromorphic optical computing nanophotonics date: 06-17-2019 14:46 gmt revision:5 [4] [3] [2] [1] [0] [head]

Progress in neuromorphic photonics

  • Similar idea as what I had -- use lasers as the optical nonlinearity.
    • They add to this the idea of WDM and 'MRR' (micro-ring resonator) weight bank -- they don't talk about the ability to change the weights, just specify them with some precision.
  • Definitely makes the case that III-V semiconductor integrated photonic systems have the capability, in MMACs/mm^2/pJ, to exceed silicon.

See also :

ref: -2013 tags: microscopy space bandwidth product imaging resolution UCSF date: 06-17-2019 14:45 gmt revision:0 [head]

How much information does your microscope transmit?

  • Typical objectives 1x - 5x, about 200 Mpix!

ref: -0 tags: nanophotonics interferometry neural network mach zehnder interferometer optics date: 06-13-2019 21:55 gmt revision:3 [2] [1] [0] [head]

Deep Learning with Coherent Nanophotonic Circuits

  • Used a series of Mach-Zehnder interferometers with thermo-optic phase-shift elements to realize the unitary component of individual layer weight-matrix computation.
    • Weight matrix was decomposed via SVD into $U \Sigma V^*$: the unitary matrices $U$ and $V^*$ (4x4, special unitary group SU(4)) were formed by the interferometer mesh, and the diagonal matrix $\Sigma$ via amplitude modulators. See figure above / original paper.
    • Note that interferometric matrix multiplication can (theoretically) be zero energy with an optical system (modulo loss).
      • In practice, you need to run the phase-modulator heaters.
  • Nonlinearity was implemented electronically after the photodetector (i.e. they had only one photonic circuit; to get multiple layers, they fed activations repeatedly through it. This was a demonstration!)
  • Fed FFT'd / banded recordings of consonants through the network to get vowel recognition close to that of the simulation.
    • Claim that noise was from imperfect phase setting in the MZI + lower resolution photodiode read-out.
  • They note that the network can more easily (??) be trained via the finite difference algorithm (e.g. test out an incremental change per weight / parameter) since running the network forward is so (relatively) low-energy and fast.
    • Well, that's not totally true -- you need to update multiple weights at once in a large / deep network to descend any high-dimensional valleys.
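
To make the finite-difference idea concrete, here is a minimal sketch. A single linear layer stands in for the optical forward pass, and the data, learning rate, and step size are all hypothetical choices for illustration. Each step costs one extra loss evaluation per weight, which is why it only looks attractive when the forward pass is cheap.

```python
def forward(weights, x):
    """Stand-in for the optical forward pass: one linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def loss(weights, data):
    """Summed squared error over (input, target) pairs."""
    total = 0.0
    for x, target in data:
        y = forward(weights, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return total


def finite_difference_step(weights, data, eps=1e-4, lr=0.05):
    """One descent step; perturbs each weight in turn and re-evaluates the loss."""
    base = loss(weights, data)
    grad = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            row[j] += eps                                   # nudge one weight
            grad[i][j] = (loss(weights, data) - base) / eps
            row[j] -= eps                                   # restore it
    # All weights are then updated together from the estimated gradient.
    return [[w - lr * g for w, g in zip(row, g_row)]
            for row, g_row in zip(weights, grad)]


# Toy data that an exact linear map w = [2, -1] fits perfectly.
data = [([1.0, 0.0], [2.0]), ([0.0, 1.0], [-1.0]), ([1.0, 1.0], [1.0])]
weights = [[0.0, 0.0]]
initial_loss = loss(weights, data)
for _ in range(200):
    weights = finite_difference_step(weights, data)
final_loss = loss(weights, data)
```

Note that the sketch estimates every partial derivative before applying a joint update, which is the variant that can still follow a curved valley; updating one weight at a time would be coordinate descent.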

ref: -2012 tags: phase change materials neuromorphic computing synapses STDP date: 06-13-2019 21:19 gmt revision:3 [2] [1] [0] [head]

Nanoelectronic Programmable Synapses Based on Phase Change Materials for Brain-Inspired Computing

  • Here, we report a new nanoscale electronic synapse based on technologically mature phase change materials employed in optical data storage and nonvolatile memory applications.
  • We utilize continuous resistance transitions in phase change materials to mimic the analog nature of biological synapses, enabling the implementation of a synaptic learning rule.
  • We demonstrate different forms of spike-timing-dependent plasticity using the same nanoscale synapse with picojoule level energy consumption.
  • Again uses GST, a germanium-antimony-tellurium alloy.
  • 50 pJ to reset (depress) the synapse, 0.675 pJ to potentiate.
    • Reducing the size will linearly decrease this current.
  • Synapse resistance changes from approximately 200 kΩ to 2 MΩ.

See also: Experimental Demonstration and Tolerancing of a Large-Scale Neural Network (165 000 Synapses) Using Phase-Change Memory as the Synaptic Weight Element

ref: -0 tags: optical gain media lasers cross section dye date: 06-13-2019 15:13 gmt revision:2 [1] [0] [head]

Eminently useful. Source: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-974-fundamentals-of-photonics-quantum-electronics-spring-2006/lecture-notes/chapter7.pdf

Laser Dye technology by Peter Hammond

  • This paper is another great resource!
  • Lists the stimulated emission cross-section for Rhodamine-6G as 4e-16 cm^2 @ 550 nm, consistent with the table above.
  • At a (high) concentration of 2 mM (1 g/l), 1/e penetration depth is 20 um.
    • Depending on the solvent, there may be aggregation and stacking / quenching.
  • Tumbling time of Rhodamine 6G in ethanol is 20 to 300ps; fluorescence lifetime in oscillators is 10's of ps, so there is definitely polarization sensitive amplification.
  • Generally in dye lasers, the emission cross-section must be higher than the excited state absorption; $\sigma_e - \sigma^\star$ is most important.
  • Bacteria can actually subsist on rhodamine-similar sulfonated dyes in aqueous solutions! Wow.
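
The 20 um penetration depth quoted above can be sanity-checked with Beer-Lambert arithmetic. This sketch assumes the cross-section is in cm^2 and, as a loud assumption, reuses the quoted 4e-16 figure as the absorption cross-section (the note lists it as the stimulated emission cross-section). The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def penetration_depth_um(cross_section_cm2, concentration_molar):
    """1/e depth in micrometers, from I(z) = I0 * exp(-sigma * n * z)."""
    # Number density: mol/L -> molecules per cm^3 (1 L = 1000 cm^3).
    molecules_per_cm3 = concentration_molar * AVOGADRO / 1000.0
    absorption_per_cm = cross_section_cm2 * molecules_per_cm3
    # The 1/e depth is the inverse absorption coefficient; 1 cm = 1e4 um.
    return 1e4 / absorption_per_cm


# Assumed inputs: 4e-16 cm^2 cross-section, 2 mM dye concentration.
depth = penetration_depth_um(4e-16, 2e-3)
```

Under those assumptions the result lands close to 20 um, in line with the figure in the notes.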

ref: -0 tags: lenslet optical processor date: 06-10-2019 04:26 gmt revision:0 [head]

Small gathering of links on Lenslet Labs / Lenslet inc. Founded in 1999.

  • Lenslet funding $26M 3rd round. November 2000.
  • vector-matrix optical multiplier Sept 2002
  • Patent on optical processor core. Idea includes 288 modulated VCSELs, 256x256 multi-quantum-well modulators, photodiodes, and lenses for splitting the light field or performing Fourier transforms.
  • Press release on the MQW SLM. 8 tera-ops. Jan 2004.
  • EnLight 64 press release, 240 billion ops/sec. Talks about having an actual software development platform, starting in Matlab. Photo of the device. Jan 1 2005.
  • Lenslet closes. Lays off its last 30 employees; CEO is trying to liquidate IP, assets. March 2006.

ref: -2019 tags: optical neural networks spiking phase change material learning date: 06-01-2019 19:00 gmt revision:4 [3] [2] [1] [0] [head]

All-optical spiking neurosynaptic networks with self-learning capabilities

  • J. Feldmann, N. Youngblood, C. D. Wright, H. Bhaskaran & W. H. P. Pernice
  • Idea: use phase-change material to either block or pass the light in waveguides.
    • In this case, they used GST -- germanium-antimony-tellurium. This material is less reflective in the amorphous phase, which can be reached by heating to ~150C and rapidly quenching. It is more reflective in the crystalline phase, which occurs on annealing.
  • This is used for both plastic synapses (phase change driven by the intensity of the light) and the nonlinear output of optical neurons (via a ring resonator).
  • Uses optical resonators with very high Q factors to couple different wavelengths of light into the 'dendrite'.
  • Ring resonator on the output: to match the polarity of the phase-change material. Is this for reset? Storing light until trigger?
  • Were able to get correlative-like or Hebbian learning (which I suppose is not dissimilar from really slow photographic film, just re-branded, and most importantly with nonlinear feedback.)
  • Issue: every weight needs a different source wavelength! Hence they have not demonstrated a multi-layer network.
  • Previous paper: All-optical nonlinear activation function for photonic neural networks
    • Only 3 dB and 7 dB extinction ratios for induced transparency and inverse saturation

ref: -0 tags: synaptic plasticity 2-photon imaging inhibition excitation spines dendrites synapses 2p date: 05-31-2019 23:02 gmt revision:2 [1] [0] [head]

PMID-22542188 Clustered dynamics of inhibitory synapses and dendritic spines in the adult neocortex.

  • Cre-recombinase-dependent labeling of postsynaptic scaffolding via Gephyrin-Teal fluorophore fusion.
  • Also added Cre-eYFP to label the neurons.
  • Electroporated in utero e16 mice.
    • Low concentration of Cre, high concentrations of Gephyrin-Teal and Cre-eYFP constructs to attain sparse labeling.
  • Located the same dendrite imaged in-vivo in fixed tissue - !! - using serial-section electron microscopy.
  • 2230 dendritic spines and 1211 inhibitory synapses from 83 dendritic segments in 14 cells of 6 animals.
  • Some spines had inhibitory synapses on them -- 0.7 / 10um, vs 4.4 / 10um dendrite for excitatory spines. ~ 1.7 inhibitory
  • Suggest that the data support the idea that inhibitory inputs may be gating excitation.
  • Furthermore, co-innervated spines are stable, both during normal experience and during monocular deprivation.
  • Monocular deprivation induces a pronounced loss of inhibitory synapses in binocular cortex.

ref: -0 tags: 3D SHOT Alan Hillel Waller 2p photon holography date: 05-31-2019 22:19 gmt revision:4 [3] [2] [1] [0] [head]

PMID-29089483 Three-dimensional scanless holographic optogenetics with temporal focusing (3D-SHOT).

  • Pégard NC, Mardinly AR, Oldenburg IA, Sridharan S, Waller L, Adesnik H
  • Combines computer-generated holography and temporal focusing for single-shot (no scanning) two-photon photo-activation of opsins.
  • The beam intensity profile determines the dimensions of the custom temporal focusing pattern (CTFP), while phase, a previously unused degree of freedom, is engineered to make 3D holography and temporal focusing compatible.
  • "To ensure good diffraction efficiency of all spectral components by the SLM, we used a lens Lc to apply a small spherical phase pattern. The focal length was adjusted so that each spectral component of the pulse spans across the short axis of the SLM in the Fourier domain".
    • That is, they spatially and temporally defocus the pulse to better fill the SLM. The short axis of the SLM in this case is Y, per supplementary figure 2.
  • The image of the diffraction grating determines the plane of temporal focusing (with lenses L1 and L2); there is a secondary geometric focus due to Lc behind the temporal plane, which serves as an aberration.
  • The diffraction grating causes the temporal pattern to scan to produce a semi-spherical stimulated area ('disc').
  • Rather than creating a custom 3D holographic shape for each neuron, the SLM is after the diffraction grating -- it imposes phase and space modulation to the CTFP, effectively convolving it with a hologram of a cloud of points & hence replicating it at each point.

ref: -0 tags: Na Ji 2p two photon fluorescent imaging pulse splitting damage bleaching date: 05-31-2019 19:55 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-18204458 High-speed, low-photodamage nonlinear imaging using passive pulse splitters

  • Core idea: take a single pulse and spread it out to $N = 2^k$ pulses using reflections and delay lines.
  • Assume two optical processes, signal $S \propto I^{\alpha}$ and photobleaching/damage $D \propto I^{\beta}$, with $\beta > \alpha > 1$.
  • Then an $N$ pulse splitter requires $N^{1-1/\alpha}$ greater average power but reduces the damage by $N^{1-\beta/\alpha}$.
  • At constant signal, the same $N$ pulse splitter requires $\sqrt{N}$ more power, consistent with two photon excitation (proportional to the square of the intensity: $N$ pulses of $\sqrt{N}/N$ intensity, $1/N$ per pulse fluorescence, $\Sigma \rightarrow 1$ overall fluorescence).
  • This allows for shorter dwell times, higher power at the sample, lower damage, slower photobleaching, and better SNR for fluorescently labeled slices.
  • Examine the list of references too, e.g. "Multiphoton multifocal microscopy exploiting a diffractive optical element" (2003)
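
The scaling argument above can be checked numerically. This sketch works in units where the unsplit pulse has intensity 1, signal 1, and damage 1. The exponent `beta=2.5` is a hypothetical value chosen only to satisfy $\beta > \alpha$, and the function name is made up for illustration.

```python
def split_pulse(n_pulses, alpha, beta):
    """Relative average power, signal, and damage for an N-way splitter
    operated at the same total signal as a single unsplit pulse."""
    # Per-pulse intensity that keeps N * I**alpha equal to 1.
    per_pulse_intensity = n_pulses ** (-1.0 / alpha)
    average_power = n_pulses * per_pulse_intensity       # N**(1 - 1/alpha)
    signal = n_pulses * per_pulse_intensity ** alpha     # held at 1
    damage = n_pulses * per_pulse_intensity ** beta      # N**(1 - beta/alpha)
    return average_power, signal, damage


# Two-photon signal (alpha = 2) with an assumed damage exponent of 2.5.
power, signal, damage = split_pulse(n_pulses=64, alpha=2.0, beta=2.5)
```

For `alpha = 2` the required power comes out as the square root of the number of pulses, while the damage falls below that of the single pulse whenever `beta` exceeds `alpha`.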

ref: -0 tags: phosphorescence fluorescence magnetic imaging slicing adam cohen date: 05-29-2019 19:41 gmt revision:8 [7] [6] [5] [4] [3] [2] [head]

A friend postulated using the triplet state phosphorescence as a magnetically-modulatable dye. E.g. magnetically slice a scattering biological sample, rather than slicing optically (light sheet, 2p) or mechanically. After a little digging:

I'd imagine that it should be possible to design a molecule -- a protein cage, perhaps a (fully unsaturated) terpene -- which isolates the excited state from oxygen quenching.

Adam Cohen at Harvard has been working a bit on this very idea, albeit with fluorescence not phosphorescence --

  • Optical imaging through scattering media via magnetically modulated fluorescence (2010)
    • The two species, pyrene and dimethylaniline are in solution.
    • Dimethylaniline absorbs photons and transfers an electron to pyrene to produce a singlet radical pair.
    • The magnetic field represses conversion of this singlet into a triplet; when two singlet electrons combine, they produce exciplex fluorescence.
  • Addition of an aliphatic-ether 12-O-2 linker improves things significantly --
  • Mapping Nanomagnetic Fields Using a Radical Pair Reaction (2011)
  • Which can be used with a 2p microscope:
  • Two-photon imaging of a magneto-fluorescent indicator for 3D optical magnetometry (2015)
    • Notably, use decay kinetics of the excited state to yield measurements that are insensitive to photobleaching, indicator concentration, or local variations in optical excitation or collection efficiency. (As opposed to $\Delta F / F$.)
    • Used phenanthrene (3 aromatic rings, not 4 in pyrene) as the excited electron acceptor, dimethylaniline again as the photo-electron generator.
    • Clear description:
      • A molecule with a singlet ground state absorbs a photon.
      • The photon drives electron transfer from a donor moiety to an acceptor moiety (either inter or intra molecular).
      • The electrons [ground state and excited state, donor] become sufficiently separated so that their spins do not interact, yet initially they preserve the spin coherence arising from their starting singlet state.
      • Each electron experiences a distinct set of hyperfine couplings to its surrounding protons (?) leading to a gradual loss of coherence and intersystem crossing (ISC) into a triplet state.
      • An external magnetic field can lock the precession of both electrons to the field axis, partially preserving coherence and suppressing ISC.
      • In some chemical systems, the triplet state is non-fluorescent, whereas the singlet pair can recombine and emit a photon.
      • Magnetochemical effects are remarkable because they arise at magnetic field strengths comparable to hyperfine energy (typically 1-10 mT).
        • Compare this to the Zeeman effect, where overt splitting is at 0.1 T.
    • Phenanthrene-dimethylaniline was dissolved in dimethylformamide (DMF). The solution was carefully degassed in nitrogen to prevent molecular oxygen quenching.

Yet! Magnetic field effects do exist in solution:

ref: -2019 tags: super-resolution microscopy fluorescent protein molecules date: 05-28-2019 16:02 gmt revision:3 [2] [1] [0] [head]

PMID-30997987 Chemistry of Photosensitive Fluorophores for Single-Molecule Localization Microscopy

  • Excellent review of the photo-convertible, photo-switchable, and more complex (photo-oxidation or reddening) fluorophores, both proteins and small molecules.
    • E.g. PA-GFP is one of the best -- good photoactivation quantum yield, good N ~ 300
    • Other small molecules, like Alexa Fluor 647 have a photon yield > 6700, which can be increased with triplet quenchers and antioxidants.
  • Describes the chemical mechanism of the various photo switching -- review is targeted at (bio)chemists interested in getting into imaging.
  • Emphasize that critical figures of merit are photoactivation quantum yield $\Phi_{pa}$ and N, overall photon yield before photobleaching.
  • See also Colorado lecture

ref: -2018 tags: Michael Levin youtube talk NIPS 2018 regeneration bioelectricity organism patterning flatworm date: 04-09-2019 18:50 gmt revision:1 [0] [head]

What Bodies Think About: Bioelectric Computation Outside the Nervous System - NeurIPS 2018

  • Short notes from watching the video, mostly interesting factoids: (This is a somewhat more coordinated narrative in the video. Am resisting ending each of these statements with an exclamation point).
  • Human children up to 7-11 years old can regenerate their fingertips.
  • Human embryos, when split in half early, develop into two normal humans; mouse embryos, when squished together, make one normal mouse.
  • Butterflies retain memories from their caterpillar stage, despite their brains liquefying during metamorphosis.
  • Flatworms are immortal, and can both grow and contract, as the environment requires.
    • They can also regenerate a whole body from segments, and know to make one head, tail, gut etc.
  • Single cell organisms, e.g. Lacrymaria, can have complex (and fast!) foraging / hunting plans -- without a brain or anything like it.
  • Axolotl can regenerate many parts of their body (appendages etc), including parts of the nervous system.
  • Frog embryos can self-organize an experimenter-jumbled body plan, despite the initial organization having never been experienced in evolution.
  • Salamanders, when their tail is grafted into a foot/leg position, remodel the transplant into a leg and foot.
  • Neurotransmitters are ancient; fungi, who diverged from other forms of life about 1.5 billion years ago, still use the same set of inter-cell transmitters e.g. serotonin, which is why modulatory substances from them have high affinity & a strong effect on humans.
  • Levin, collaborators and other developmental biologists have been using voltage indicators in embryos ... this is not just for neurons.
  • Can make different species head shapes in flatworms by exposing them to ion-channel modulating drugs. This despite the fact that the respective head shapes are from species that have been evolving separately for 150 million years.
  • Indeed, you can reprogram (with light gated ion channels, drugs, etc) to body shapes not seen in nature or not explored by evolution.
    • That said, this was experimental, not by design; Levin himself remarks that the biology that generates these body plans is not known.
  • Flatworms can store memory in bioelectric networks.
  • Frogs don't normally regenerate their limbs. But, with a drug cocktail targeting bioelectric signaling, they can regenerate semi-functional legs, complete with nerves, muscle, bones, and cartilage. The legs are functional (enough).
  • Manipulations of bioelectric signaling can reverse very serious genetic problems, e.g. deletion of Notch, to the point that tadpoles regain some ability for memory creation & recall.

  • I wonder how so much information can go through the apparently scalar channel of membrane voltage. It seems you'd get symbol interference, and that many more signals would be required to pattern organs.
  • That said, calcium is used a great many places in the cell for all sorts of signaling tasks, over many different timescales as well, and it doesn't seem to be plagued by interference.
    • First question from the audience was how cells differentiate organismal patterning signals and behavioral signals, e.g. muscle contraction.

ref: -2017 tags: V1 V4 visual cortex granger causality date: 03-20-2019 06:00 gmt revision:0 [head]

PMID-28739915 Interactions between feedback and lateral connections in the primary visual cortex

  • Liang H, Gong X, Chen M, Yan Y, Li W, Gilbert CD.
  • Extracellular ephys on V1 and V4 neurons in macaque monkeys trained on a fixation and saccade task.
  • Contour task: monkeys had to select the patch of lines, chosen to stimulate the recorded receptive fields, which had a continuous contour in it (again chosen to elicit a response in the recorded V1 / V4 neurons).
    • Variable length of the contour: 1, 3, 5, 7 bars. First part of analysis: only 7-bar trials.
  • Granger causality (GC) in V1 horizontal connectivity decreased significantly in the 0-30Hz band after taking into account V4 activity. Hence, V4 explains some of the causal activity in V1.
    • This result holds both with contour-contour (e.g. cells both tuned to the contours in V1), contour-background, and background-background.
    • Yet there was a greater change in the contour-BG and BG-contour cells when V4 was taken into account (Granger causality is directional, like KL divergence).
      • This result passes the shuffle test, where trial identities were shuffled.
      • True also when LFP is measured.
      • That said .. even though GC is sensitive to temporal features, might be nice to control with a distant area.
      • See supplementary figures (of which there are a lot) for the controls.
  • Summarily: Feedback from V4 strengthens V1 lateral connections.
  • Then they looked at trials with a variable number of contour bars.
  • V4 seems to have a greater GC influence on background cells relative to contour cells.
  • Using conditional GC, lateral interactions in V1 contribute more to contour integration than V4.
  • Greater GC in correct trials than incorrect trials.

  • Note: differences in firing rate can affect estimation of GC. Hence, some advise using thinning of the spike trains to yield parity.
  • Note: refs for horizontal connections in V1 [7-10, 37]

ref: -2014 tags: gold nanowires intracellular recording korea date: 03-18-2019 23:02 gmt revision:1 [0] [head]

PMID-25112683 Subcellular Neural Probes from Single-Crystal Gold Nanowires

  • Korean authors... Mijeong Kang, Seungmoon Jung, Huanan Zhang, Taejoon Kang, Hosuk Kang, Youngdong Yoo, Jin-Pyo Hong, Jae-Pyoung Ahn, Juhyoun Kwak, Daejong Jeon, Nicholas A. Kotov, and Bongsoo Kim
  • 100nm single-crystal Au.
  • Able to get SUA despite size.
  • Springy, despite properties of bulk Au.
  • Nanowires fabricated on a sapphire substrate and picked up by a fine sharp W probe, then varnished with nail polish.

ref: -2011 tags: ttianium micromachining chlorine argon plasma etch oxide nitride penetrating probes Kevin Otto date: 03-18-2019 22:57 gmt revision:1 [0] [head]

PMID-21360044 Robust penetrating microelectrodes for neural interfaces realized by titanium micromachining

  • Patrick T. McCarthy, Kevin J. Otto, Masaru P. Rao
  • Used Cl / Ar plasma to deep etch titanium film, 0.001 in / 25 um thick. Fine Metals Corp Ashland VA.
  • Discuss various insulation (oxide /nitride) failure modes, lithography issues.

ref: -0 tags: credit assignment distributed feedback alignment penn state MNIST fashion backprop date: 03-16-2019 02:21 gmt revision:1 [0] [head]

Conducting credit assignment by aligning local distributed representations

  • Alexander G. Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles
  • Propose two related algorithms: Local Representation Alignment (LRA)-diff and LRA-fdbk.
    • LRA-diff is basically a modified form of backprop.
    • LRA-fdbk is a modified version of feedback alignment. {1432} {1423}
  • Test on MNIST (easy -- many digits can be discriminated with one pixel!) and fashion-MNIST (harder -- humans only get about 85% right!)
  • Use a Cauchy or log-penalty loss at each layer, which is somewhat unique and interesting: $L(z,y) = \sum_{i=1}^n \log(1 + (y_i - z_i)^2)$.
    • This is hence a saturating loss.
  1. Normal multi-layer-perceptron feedforward network. Pre-activation $h^\ell$ and post-activation $z^\ell$ are stored.
  2. Update the weights to minimize loss. This gradient calculation is identical to backprop, only they constrain the update to have a norm no bigger than $c_1$. Z and Y are actual and desired output of the layer, as commented. Gradient includes the derivative of the nonlinear activation function.
  3. Generate update for the pre-nonlinearity $h^{\ell-1}$ to minimize the loss in the layer above. This again is very similar to backprop; it's the chain rule -- but the derivatives are vectors, of course, so those should be element-wise multiplication, not outer products (I think).
    1. Note $h$ is updated -- derivatives of two nonlinearities.
  4. Feedback-alignment version, with random matrix $E_{\ell}$ (elements drawn from a Gaussian distribution, $\sigma = 1$ ish).
    1. Only one nonlinearity derivative here -- bug?
  5. Move the pre and post activations in the specified gradient direction.
    1. Those $\bar{h}^{\ell-1}$ variables are temporary holding -- but note that both lower and higher layers are updated.
  6. Do this K times, K=1-50.
  • In practice K=1, with the LRA-fdbk algorithm, for the majority of the paper -- it works much better than LRA-diff (interesting .. bug?). Hence, this basically reduces to feedback alignment.
  • Demonstrate that LRA works much better with small initial weights, but basically because they tweak the algorithm to do this.
    • Need to see a positive control for this to be conclusive.
    • Again, why is FA so different from LRA-fdbk? Suspicious. Positive controls.
  • Attempted a network with Local Winner Take All (LWTA), which is a hard nonlinearity that LRA was able to account for & train through.
  • Also used Bernoulli neurons, and were able to successfully train. Unlike drop-out, these were stochastic at test time, and things still worked OK.
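
A small sketch of the per-layer log-penalty (Cauchy) loss $L(z,y) = \sum_i \log(1 + (y_i - z_i)^2)$ and its gradient with respect to z, plus a norm cap on an update vector. The clipping-by-rescaling rule is an assumption about how the norm constraint might be enforced, and all function names are hypothetical; this is not the authors' code.

```python
import math


def log_penalty_loss(z, y):
    """L(z, y) = sum_i log(1 + (y_i - z_i)^2)."""
    return sum(math.log(1.0 + (yi - zi) ** 2) for zi, yi in zip(z, y))


def log_penalty_grad(z, y):
    """dL/dz_i = -2 (y_i - z_i) / (1 + (y_i - z_i)^2)."""
    return [-2.0 * (yi - zi) / (1.0 + (yi - zi) ** 2) for zi, yi in zip(z, y)]


def clip_norm(update, c):
    """Rescale an update so its Euclidean norm is at most c (assumed rule)."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= c or norm == 0.0:
        return list(update)
    return [u * c / norm for u in update]
```

The gradient magnitude peaks at an error of 1 and shrinks for larger errors, so outlying units contribute bounded updates; that is one reading of why the notes call the loss saturating.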

Lit review.
  • Logistic sigmoid can slow down learning, due to its non-zero mean (Glorot & Bengio 2010).
  • Recirculation algorithm (or generalized recirculation) is a precursor for target propagation.
  • Target propagation is all about the inverse of the forward propagation: if we had access to the inverse of the network of forward propagations, we could compute which input values at the lower levels of the network would result in better values at the top that would please the global cost.
    • This is a very different way of looking at it -- almost backwards!
    • And indeed, it's not really all that different from contrastive divergence. (even though CD doesn't work well with non-Bernoulli units)
  • Contrastive Hebbian learning also has two phases, one to fantasize, and one to try to make the fantasies look more like the input data.
  • Decoupled neural interfaces (Jaderberg et al 2016): learn a predictive model of error gradients (and inputs) instead of trying to use local information to estimate updated weights.

  • Yeah, call me a critic, but I'm not clear on the contribution of this paper; it smells precocious and over-sold.
    • Even the title. I was hoping for something more 'local' than per-layer computation. BP does that already!
  • They primarily report supportive tests, not discriminative or stressing tests; how does the algorithm fail?
    • Certainly a lot of work went into it..
  • I still don't see how the computation of a target through a random matrix, then using delta/loss/error between that target and the feedforward activation to update weights, is much different than propagating the errors directly through a random feedback matrix. E.g. subtract then multiply, or multiply then subtract?

ref: -2011 tags: Andrew Ng high level unsupervised autoencoders date: 03-15-2019 06:09 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Building High-level Features Using Large Scale Unsupervised Learning

  • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
  • Input data 10M random 200x200 frames from youtube. Each video contributes only one frame.
  • Used local receptive fields, to reduce the communication requirements. 1000 computers, 16 cores each, 3 days.
  • "Strongly influenced by" Olshausen & Field {1448} -- but this is limited to a shallow architecture.
  • Lee et al 2008 show that stacked RBMs can model simple functions of the cortex.
  • Lee et al 2009 show that a convolutional DBN trained on faces can learn a face detector.
  • Their architecture: sparse deep autoencoder with
    • Local receptive fields: each feature of the autoencoder can connect to only a small region of the lower layer (i.e. non-convolutional)
      • Purely linear layer.
      • More biologically plausible & allows the learning of more invariances other than translational invariances (Le et al 2010).
      • No weight sharing means the network is extra large == 1 billion weights.
        • Still, the human visual cortex is about a million times larger in neurons and synapses.
    • L2 pooling (Hyvarinen et al 2009) which allows the learning of invariant features.
      • I.e. this is the square root of the sum of the squares of its inputs. Square root nonlinearity.
    • Local contrast normalization -- subtractive and divisive (Jarrett et al 2009)
  • Encoding weights $W_1$ and decoding weights $W_2$ are adjusted to minimize the reconstruction error, penalized by 0.1 * the sparse pooling layer activation. Latter term encourages the network to find invariances.
  • $\mathrm{minimize}_{W_1, W_2} \sum_{i=1}^m \left( \| W_2 W_1^T x^{(i)} - x^{(i)} \|_2^2 + \lambda \sum_{j=1}^k \sqrt{\epsilon + H_j (W_1^T x^{(i)})^2} \right)$
    • $H_j$ are the weights to the j-th pooling element, $\lambda = 0.1$; m examples; k pooling units.
    • This is also known as reconstruction Topographic Independent Component Analysis.
    • Weights are updated through asynchronous SGD.
    • Minibatch size 100.
    • Note deeper autoencoders don't fare consistently better.
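
A minimal sketch of the reconstruction TICA cost above, using plain Python lists so it stays self-contained. It assumes $W_1$ and $W_2$ are both (input dim) x (feature dim), that $H$ is (pooling units) x (feature dim), and that the square inside the root is applied element-wise to $W_1^T x$ before pooling. The helper names are hypothetical.

```python
import math


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def tica_cost(w1, w2, h, xs, lam=0.1, eps=1e-8):
    """Reconstruction error plus lam * pooled (L2) activations, summed over xs."""
    w1_t = transpose(w1)
    total = 0.0
    for x in xs:
        code = matvec(w1_t, x)       # W1^T x, the linear features
        recon = matvec(w2, code)     # W2 W1^T x, the reconstruction
        recon_err = sum((r - xi) ** 2 for r, xi in zip(recon, x))
        squared = [c * c for c in code]
        # L2 pooling: square root of the H-weighted sum of squared features.
        pooled = [math.sqrt(eps + p) for p in matvec(h, squared)]
        total += recon_err + lam * sum(pooled)
    return total
```

As a usage example, with identity encode/decode weights and one pooling unit over both features, the input `[3.0, 4.0]` reconstructs exactly and the cost is just `lam` times the pooled norm.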