hide / / print
ref: -2010 tags: neural signaling rate code patch clamp barrel cortex date: 03-18-2021 18:41 gmt revision:0 [head]

PMID-20596024 Sensitivity to perturbations in vivo implies high noise and suggests rate coding in cortex

  • How did I not know of this paper before?
  • Solid study showing that, while a single spike can elicit 28 spikes in post-synaptic neurons, the associated level of noise is indistinguishable from intrinsic noise.
  • Hence the cortex should communicate / compute in rate codes or large synchronized burst firing.
    • They found large bursts to be infrequent, timing precision to be low, hence rate codes.
    • Of course other examples, e.g. auditory cortex, exist.

Cortical reliability amid noise and chaos

  • Noise is primarily of synaptic origin. (Dropout)
  • Recurrent cortical connectivity supports sensitivity to precise timing of thalamocortical inputs.

ref: -2020 tags: dreamcoder ellis program induction ai date: 02-01-2021 18:39 gmt revision:0 [head]

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

  • Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, Joshua B. Tenenbaum

This paper describes a system for adaptively finding programs which succinctly and accurately produce desired output.  These desired outputs are provided by the user / test system, and come from a number of domains:

  • list (as in lisp) processing,
  • text editing,
  • regular expressions,
  • line graphics,
  • 2d lego block stacking,
  • symbolic regression (ish),
  • functional programming,
  • and physical laws.  
Some of these domains are naturally toy-like, e.g. the text processing, but others are deeply impressive: the system was able to "re-derive" basic physical laws of vector calculus in the process of looking for S-expression forms of cheat-sheet physics equations.  These advancements result from a long lineage of work, perhaps starting from the Helmholtz machine PMID-7584891 introduced by Peter Dayan, Geoff Hinton and others, where one model is trained to generate patterns given context while a second recognition module is trained to invert this model: derive context from the patterns.  The two work simultaneously to allow model-exploration in high dimensions.  

Also in the lineage is the EC2 algorithm, which most of the same authors above published in 2018.  EC2 centers around the idea of "explore - compress": explore solutions to your program induction problem during the 'wake' phase, then compress the observed programs into a library by extracting/factoring out commonalities during the 'sleep' phase.  This of course is one of the core algorithms of human learning: explore options, keep track of both what worked and what didn't, search for commonalities among the options & their effects, and use these inferred laws or heuristics to further guide search and goal-setting, thereby building a buffer against the curse of dimensionality.  Making the inferred laws themselves functions in a programming library allows hierarchically factoring the search task, making exploration of unbounded spaces possible.  This advantage is unique to the program synthesis approach. 
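The explore-compress cycle can be caricatured in a few lines; this is a toy sketch with made-up names (`subexprs`, `compress`, nested-tuple programs), not the EC2 implementation:

```python
# Toy sketch of the "explore-compress" cycle (hypothetical, not the EC2 code):
# wake: search produced programs (here, nested tuples) solving tasks;
# sleep: compress by factoring the most common subexpression into the library.
from collections import Counter

def subexprs(e):
    """Yield every subexpression of a nested-tuple program."""
    yield e
    if isinstance(e, tuple):
        for child in e[1:]:
            yield from subexprs(child)

def compress(solutions):
    """Return the most frequent non-trivial subexpression across solutions."""
    counts = Counter()
    for s in solutions:
        for sub in subexprs(s):
            if isinstance(sub, tuple):  # ignore atoms
                counts[sub] += 1
    common, n = counts.most_common(1)[0]
    return common if n > 1 else None

# 'wake' phase produced these programs; they share the structure ('add1', 'x')
solutions = [('mul', ('add1', 'x'), ('add1', 'x')),
             ('add', ('add1', 'x'), 'y')]
print(compress(solutions))  # the shared fragment gets factored into the library
```

Real library compression must also weigh description length of the library against that of the rewritten programs; this sketch only finds the candidate.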

This much is said in the introduction, though perhaps with more clarity.  DreamCoder is an improved, more-accessible version of EC2, though the underlying ideas are the same.   It differs in that the method for constructing libraries has improved through the addition of a powerful version space for enumerating and evaluating refactors of the solutions generated during the wake phase.  (I will admit that I don't much understand the version space system.)  This version space allows DreamCoder to collapse the search space for re-factorings by many orders of magnitude, and seems to be a clear advancement.  Furthermore, DreamCoder incorporates a second phase of sleep: "dreaming", hence the moniker.  During dreaming the library is used to create 'dreams' consisting of combinations of the library primitives, which are then executed with training data as input.  These dreams are then used to train up a neural network to predict which library and atomic objects to use in given contexts.  Context in this case is where in the parse tree a given object has been inserted (its parent and which argument number it sits in); how the data-context is incorporated to make this decision is not clear to me (???). 

This dream- and replay-trained neural network is either a GRU recurrent net with 64 hidden states, or a convolutional network feeding into an RNN.  The final stage is a linear ReLU (???), and again it is not clear how this feeds into the prediction of "which unit to use when".  The authors clearly demonstrate that the network, or the probabilistic context-free grammar that it controls (?), is capable of straightforward optimizations, like breaking symmetries due to commutativity, avoiding adding zero, avoiding multiplying by one, etc.  Beyond this, they do demonstrate via an ablation study that the presence of the neural network affords significant algorithmic leverage in all of the problem domains tested.  The network also seems to learn a reasonable representation of the sub-type of task encountered -- but a thorough investigation of how it works, or how it might be made to work better, remains desired. 
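As an illustration of the kind of symmetry-breaking mentioned (my own toy sketch, not DreamCoder's enumerator), pruning commutative duplicates and the +0 / *1 identities thins even a depth-one expression space:

```python
# Sketch of syntactic symmetry-breaking during program enumeration
# (my illustration of the optimizations described, not DreamCoder's code).
from itertools import product

ATOMS = [0, 1, 'x']

def redundant(op, a, b):
    if op == '+' and (a == 0 or b == 0): return True      # adding zero
    if op == '*' and (a == 1 or b == 1): return True      # multiplying by one
    if op in ('+', '*') and str(a) > str(b): return True  # break commutativity
    return False

def enumerate_exprs():
    for op, a, b in product(['+', '*'], ATOMS, ATOMS):
        if not redundant(op, a, b):
            yield (op, a, b)

exprs = list(enumerate_exprs())
print(len(exprs), 'of', 2 * 9, 'depth-1 expressions survive pruning')
```

The savings compound exponentially with expression depth, which is why such pruning matters for enumeration-based synthesis.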

I've spent a little time looking around the code, which is a mix of python high-level experimental control code, and lower-level OCaml code responsible for running (emulating) the lisp-like DSL, inferring types in its polymorphic system / reconciling types in evaluated program instances, maintaining the library, and recompressing it using the aforementioned version spaces.  The code, like many things experimental, is clearly a work-in-progress, with some old or unused code scattered about, glue to run the many experiments & record / analyze the data, and personal notes from the first author for making his job talks (! :).  The description in the supplemental materials, which is satisfyingly thorough (if again impenetrable wrt version spaces), is readily understandable, suggesting that one (presumably the first) author has a clear understanding of the system.  It doesn't appear that much is being hidden or glossed over, which is not the case for all scientific papers. 

With the caveat that I don't claim to understand the system to completion, there are some clear areas where the existing system could be augmented further.  The 'recognition' or perceptual module, which guides actual synthesis of candidate programs, realistically could use as much information as is available in DreamCoder: full lexical and semantic scope, full input-output specifications, type information, possibly runtime binding of variables when filling holes.  This is motivated by the way that humans solve problems, at least as observed by introspection:
  • Examine problem, specification; extract patterns (via perceptual modules)
  • Compare patterns with existing library (memory) of compositionally-factored 'useful solutions' (this is identical to the library in DreamCoder).
  • Do something like beam-search or quasi-stochastic search on selected useful solutions.  This is the same as DreamCoder; however, human engineers make decisions progressively, at runtime so-to-speak: you fill not one hole per cycle, but many holes.  The addition of recursion to DreamCoder, provided a wider breadth of input information, could support this functionality. 
  • Run the program to observe input-output .. but also observe the inner workings of the program, eg. dataflow patterns.  These dataflow patterns are useful to human engineers when both debugging and when learning-by-inspection what library elements do.   DreamCoder does not really have this facility. 
  • Compare the current program results to the desired program output.  Make a stochastic decision whether to try to fix it, or to try another beam in the search.  Since this would be on a computer, this could be in parallel (as DreamCoder is); the ability to 'fix' or change a DUT is directly absent in DreamCoder.   As a 'deeply philosophical' aside, this loop itself might be the effect of running a language-of-thought program, as was suggested by pioneers in AI (ref).  The loop itself is subject to modification and replacement based on goal-seeking success in the domain of interest, in a deeply-satisfying and deeply recursive manner ...
At each stage in the pipeline, the perceptual modules would have access to relevant variables in the current problem-solving context.  This is modeled on Jacques Pitrat's work.  Humans of course are even more flexible than that -- context includes roughly the whole brain, and if anything we're mushy on which level of the hierarchy we are working. 

Critical to making this work is to have, as I've written in my notes many years ago, a 'self compressing and factorizing memory'.  The version space magic + library could be considered a working example of this.  In the realm of ANNs, per recent OpenAI results with CLIP and Dall-E, really big transformers also seem to have strong compositional abilities, with the caveat that they need to be trained on segments of the whole web.  (This wouldn't be an issue here, as Dreamcoder generates a lot of its own training data via dreams).  Despite the data-inefficiency of DNN / transformers, they should be sufficient for making something in the spirit of above work, with a lot of compute, at least until more efficient models are available (which they should be shortly; see AlphaZero vs MuZero). 

ref: -2015 tags: olshausen redwood autoencoder VAE MNIST faces variation date: 11-27-2020 03:04 gmt revision:0 [head]

Discovering hidden factors of variation in deep networks

  • Well, they are not really that deep ...
  • Use a VAE to encode both a supervised signal (class labels) as well as unsupervised latents.
  • Penalize a combination of the MSE of reconstruction, logits of the classification error, and a special cross-covariance term to decorrelate the supervised and unsupervised latent vectors.
  • Cross-covariance penalty: $\frac{1}{2}\sum_{ij}\left[\frac{1}{N}\sum_n (y_i^n - \bar{y}_i)(z_j^n - \bar{z}_j)\right]^2$ -- the sum of squared covariances between the label units $y$ and latents $z$, computed over the batch.
  • Tested on
    • MNIST -- discovered style / rotation of the characters
    • Toronto faces database -- seven expressions, many individuals; extracted eigen-emotions sorta.
    • Multi-PIE --many faces, many viewpoints ; was able to vary camera pose and illumination with the unsupervised latents.
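A minimal numpy sketch of the cross-covariance penalty as I understand it (variable names are mine): it is zero when the supervised and unsupervised codes are decorrelated, and grows when the latents leak label information.

```python
# Minimal numpy sketch of the cross-covariance (XCov) penalty between
# supervised logits y and unsupervised latents z (my reading of the idea).
import numpy as np

def xcov_penalty(y, z):
    """0.5 * sum of squared cross-covariances; small iff y and z decorrelated."""
    yc = y - y.mean(axis=0)          # center over the batch
    zc = z - z.mean(axis=0)
    C = yc.T @ zc / y.shape[0]       # (n_classes, n_latents) cross-covariance
    return 0.5 * np.sum(C ** 2)

rng = np.random.default_rng(0)
y = rng.normal(size=(1000, 10))
z_indep = rng.normal(size=(1000, 2))                  # independent latents
z_corr = y[:, :2] + 0.1 * rng.normal(size=(1000, 2))  # latents that copy labels
print(xcov_penalty(y, z_indep), '<<', xcov_penalty(y, z_corr))
```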

ref: -2020 tags: replay hippocampus variational autoencoder date: 10-11-2020 04:09 gmt revision:1 [0] [head]

Brain-inspired replay for continual learning with artificial neural networks

  • Gido M. van de Ven, Hava Siegelmann, Andreas Tolias
  • In the real world, samples are not replayed in shuffled order -- they occur in a sequence, typically only a few times. Hence, for training an ANN (or NN?), you need to 'replay' samples.
    • Perhaps, to get at hidden structure not obvious on first pass through the sequence.
    • In the brain, reactivation / replay likely to stabilize memories.
      • Strong evidence that this occurs through sharp-wave ripples (or the underlying activity associated with this).
  • Replay is also used to combat a common problem in training ANNs - catastrophic forgetting.
    • Generally you just re-sample from your database (easy), though in real-time applications, this is not possible.
      • It might also take a lot of memory (though that is cheap these days) or violate privacy (though again who cares about that)

  • They study two different classification problems:
    • Task incremental learning (Task-IL)
      • Agent has to serially learn distinct tasks
      • OK for Atari, doesn't make sense for classification
    • Class incremental learning (Class-IL)
      • Agent has to learn one task incrementally, one/few classes at a time.
        • Like learning 2 digits at a time in MNIST
        • But is tested on all digits shown so far.
  • Solved via Generative Replay (GR, ~2017)
  • Use a recursive formulation: 'old' generative model is used to generate samples, which are then classified and fed, interleaved with the new samples, to the new network being trained.
    • 'Old' samples can be infrequent -- it's easier to reinforce an existing memory rather than create a new one.
    • Generative model is a VAE.
  • Compared with some existing solutions to catastrophic forgetting:
    • Methods to protect parameters in the network important for previous tasks
      • Elastic weight consolidation (EWC)
      • Synaptic intelligence (SI)
        • Both methods maintain estimates of how influential parameters were for previous tasks, and penalize changes accordingly.
        • "metaplasticity"
        • Synaptic intelligence: measure the loss change relative to the individual weights.
        • $\delta L = \int \frac{\delta L}{\delta \theta} \frac{\delta \theta}{\delta t} \delta t$; converted into discrete time / SGD: $L = \Sigma_k \omega_k = \Sigma_k \int \frac{\delta L}{\delta \theta} \frac{\delta \theta}{\delta t} \delta t$
        • $\omega_k$ are then the weightings for how much each parameter change contributed to the training improvement.
        • Use this as a per-parameter regularization strength, scaled by one over the square of 'how far it moved'.
        • This is added to the loss, so that the network is penalized for moving important weights.
    • Context-dependent gating (XdG)
      • To reduce interference between tasks, a random subset of neurons is gated off (inhibition), depending on the task.
    • Learning without forgetting (LwF)
      • Method replays current task input after labeling them (incorrectly?) using the model trained on the previous tasks.
  • Generative replay works on Class-IL!
  • And is robust -- not to many samples or hidden units needed (for MNIST)
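The synaptic-intelligence importance measure described above can be demonstrated in a toy gradient-descent loop (my sketch: one parameter matters for the loss, three don't, and only the useful one accumulates importance):

```python
# Toy sketch of the synaptic-intelligence importance estimate: accumulate
# -grad * delta_theta along the training trajectory, then use the result to
# penalize moving important weights (illustrative, not the authors' code).
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=4)
target = np.array([1.0, 0.0, 0.0, 0.0])
mask = np.array([1.0, 0.0, 0.0, 0.0])   # loss depends only on theta[0]
omega = np.zeros_like(theta)
lr = 0.1
for _ in range(100):
    grad = 2 * (theta - target) * mask  # dL/dtheta for L = (theta0 - 1)^2
    dtheta = -lr * grad
    omega += -grad * dtheta             # per-parameter loss-decrease credit
    theta += dtheta

# omega / (distance moved)^2 would then set per-parameter regularization.
print('importance:', omega)             # only the first weight earns credit
```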

  • Yet the generative replay system does not scale to CIFAR or permuted MNIST.
  • E.g. if you take the MNIST pixels, permute them based on a 'task', and ask a network to still learn the character identities, it can't do it ... though synaptic intelligence can.
  • Their solution is to make 'brain-inspired' modifications to the network:
    • RtF, Replay-through-feedback: the classifier and generator network are fused. Latent vector is the hippocampus. Cortex is the VAE / classifier.
    • Con, Conditional replay: normal prior for the VAE is replaced with multivariate class-conditional Gaussian.
      • Not sure how they sample from this, check the methods.
    • Gat, Gating based on internal context.
      • Gating is only applied to the feedback layers, since for classification ... you don't a priori know the class!
    • Int, Internal replay. This is maybe the most interesting: rather than generating pixels, feedback generates hidden layer activations.
      • First layer of a network is convolutional, dependent on visual feature statistics, and should not change much.
        • Indeed, for CIFAR, they use pre-trained layers.
      • Internal replay proved to be very important!
    • Dist, Soft target labeling of the generated targets; cross-entropy loss when training the classifier on generated samples. Aka distillation.
  • Results suggest that regularization / metaplasticity (keeping memories in parameter space) and replay (keeping memories in function space) are complementary strategies,
    • And that the brain uses both to create and protect memories.
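The generative-replay recursion (frozen 'old' generator dreams samples, the 'old' classifier labels them, the new model trains on the interleaved stream) can be sketched schematically; every function here is a toy stand-in:

```python
# Schematic generative-replay loop (toy stand-ins for the VAE and classifier):
# the frozen 'old' generator produces samples, the 'old' classifier labels
# them, and they are interleaved with the new task's data.
import random

def old_generator():                 # stand-in for the frozen VAE decoder
    return [random.random() for _ in range(4)]

def old_classifier(x):               # stand-in for the previous model's labels
    return 0 if sum(x) < 2 else 1

def training_batch(new_samples, replay_ratio=0.5):
    """Interleave fresh task data with generated 'replay' of old tasks."""
    n_replay = int(len(new_samples) * replay_ratio)
    replay = []
    for _ in range(n_replay):
        x = old_generator()
        replay.append((x, old_classifier(x)))
    return new_samples + replay      # train the new model on the union

batch = training_batch([([0.1] * 4, 2), ([0.9] * 4, 3)])
print(len(batch), 'training samples, including replayed ones')
```

The soft-labeling ('Dist') variant would replace `old_classifier`'s hard label with the old model's full output distribution.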

  • When I first read this paper, it came across as a great story -- well thought out, well explained, a good level of detail, and sufficiently supported by data / lesioning experiments.
  • However, looking at the first author's pub record, it seems that he's been at this for >2-3 years ... things take time to do & publish.
  • Folding in of the VAE is satisfying -- taking one function approximator and using it to provide memory for another function approximator.
  • Also satisfying are the neurological inspirations -- and that full feedback to the pixel level was not required!
    • Maybe the hippocampus does work like this, providing high-level feature vectors to the cortex.
    • And it's likely that the cortex has some features of a VAE, e.g. able to perceive and imagine through the same nodes, just run in different directions.
      • The fact that both concepts led to an engineering solution is icing on the cake!

ref: -2017 tags: google deepmind compositional variational autoencoder date: 04-08-2020 01:16 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

SCAN: learning hierarchical compositional concepts

  • From DeepMind, first version Jul 2017 / v3 June 2018.
  • Starts broad and strong:
    • "The seemingly infinite diversity of the natural world from a relatively small set of coherent rules"
      • Relative to what? What's the order of magnitude here? In personal experience, each domain involves a large pile of relevant details..
    • "We conjecture that these rules dive rise to regularities that can be discovered through primarily unsupervised experiences and represented as abstract concepts"
    • "If such representations are compositional and hierarchical, they can be recombined into an exponentially large set of new concepts."
    • "Compositionality is at the core of such human abilities as creativity, imagination, and language-based communication.
    • This addresses the limitations of deep learning, which are overly data hungry (low sample efficiency), tend to overfit the data, and require human supervision.
  • Approach:
    • Factorize the visual world with a $\beta$-VAE to learn a set of representational primitives through unsupervised exposure to visual data.
    • Expose SCAN (or rather, a module of it) to a small number of symbol-image pairs, from which the algorithm identifies the set of visual primitives (features from the $\beta$-VAE) that the examples have in common.
      • E.g. this is purely associative learning, with a finite one-layer association matrix.
    • Test in both image-to-symbol and symbol-to-image directions. For the latter, allow irrelevant attributes to be filled in from the priors (this is important later in the paper..)
    • Add in a third module, which allows learning of compositions of the features, a la set notation: AND ($\cup$), IN-COMMON ($\cap$) & IGNORE ($\setminus$ or '-'). This is via a low-parameter convolutional model.
  • Notation:
    • $q_{\phi}(z_x|x)$ is the encoder model. $\phi$ are the encoder parameters, $x$ is the visual input, $z_x$ are the latent parameters inferred from the scene.
    • $p_{\theta}(x|z_x)$ is the decoder model. $x \propto p_{\theta}(x|z_x)$, $\theta$ are the decoder parameters. $x$ is now the reconstructed scene.
  • From this, the loss function of the $\beta$-VAE is:
    • $\mathbb{L}(\theta, \phi; x, z_x, \beta) = \mathbb{E}_{q_{\phi}(z_x|x)} [\log p_{\theta}(x|z_x)] - \beta D_{KL} (q_{\phi}(z_x|x) || p(z_x))$ where $\beta > 1$
      • That is, maximize the auto-encoder fit (the expectation of the decoder, over the encoder output -- aka the pixel log-likelihood) minus the KL divergence between the encoder distribution and $p(z_x)$
        • $p(z) \propto \mathcal{N}(0, I)$ -- diagonal normal matrix.
        • $\beta$ comes from the Lagrangian solution to the constrained optimization problem:
        • $\max_{\phi,\theta} \mathbb{E}_{x \sim D} [\mathbb{E}_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)]]$ subject to $D_{KL}(q_{\phi}(z|x)||p(z)) < \epsilon$ where $D$ is the domain of images etc.
      • Claim that this loss function tips the scale too far away from accurate reconstruction with sufficient visual de-tangling (that is: if significant features correspond to small details in pixel space, they are likely to be ignored); instead they adopt the approach of the denoising auto-encoder ref, which uses the feature L2 norm instead of the pixel log-likelihood:
    • $\mathbb{L}(\theta, \phi; X, z_x, \beta) = -\mathbb{E}_{q_{\phi}(z_x|x)}||J(\hat{x}) - J(x)||_2^2 - \beta D_{KL} (q_{\phi}(z_x|x) || p(z_x))$ where $J : \mathbb{R}^{W \times H \times C} \rightarrow \mathbb{R}^N$ maps from images to high-level features.
      • This J(x)J(x) is from another neural network (transfer learning) which learns features beforehand.
      • It's a multilayer perceptron denoising autoencoder [Vincent 2010].
  • The SCAN architecture includes an additional element, another VAE which is trained simultaneously on the labeled inputs $y$ and the latent outputs from the encoder $z_x$ given $x$.
  • In this way, they can present a description $y$ to the network, which is then recomposed into $z_y$, that then produces an image $\hat{x}$.
    • The whole network is trained by minimizing:
    • $\mathbb{L}_y(\theta_y, \phi_y; y, x, z_y, \beta, \lambda) = 1^{st} - 2^{nd} - 3^{rd}$, with the three terms:
      • 1st term: $\mathbb{E}_{q_{\phi_y}(z_y|y)}[\log p_{\theta_y} (y|z_y)]$, the log-likelihood of the decoded symbols given encoded latents $z_y$
      • 2nd term: $\beta D_{KL}(q_{\phi_y}(z_y|y) || p(z_y))$, the weighted KL divergence between encoded latents and the diagonal normal prior.
      • 3rd term: $\lambda D_{KL}(q_{\phi_x}(z_x|x) || q_{\phi_y}(z_y|y))$, the weighted KL divergence between latents from the images and latents from the description $y$.
        • They note that the direction of the divergence matters; I suspect it took some experimentation to see what's right.
  • Final element! A convolutional recombination element, implemented as a tensor product between $z_{y1}$ and $z_{y2}$ that outputs a one-hot encoding of the set operation, which is fed to a (hardcoded?) transformation matrix.
    • I don't think this is great shakes. Could have done this with a small function; no need for a neural network.
    • Trained with very similar loss function as SCAN or the beta-VAE.
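For concreteness, the KL term in the $\beta$-VAE loss has a simple closed form for a diagonal-Gaussian encoder against the $\mathcal{N}(0, I)$ prior, which can be checked numerically (a generic sketch, not SCAN's code; the squared-error reconstruction is a stand-in):

```python
# Numerical sketch of the beta-VAE objective pieces: reconstruction term plus
# beta-weighted KL between the diagonal-Gaussian encoder q(z|x) = N(mu, var)
# and the prior N(0, I). Standard closed form; not the paper's implementation.
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    recon = np.sum((x - x_hat) ** 2)          # stand-in for -log p(x|z)
    return recon + beta * kl_to_standard_normal(mu, log_var)

x = np.ones(8); x_hat = 0.9 * x
mu = np.zeros(3); log_var = np.zeros(3)       # encoder exactly matches prior
print(beta_vae_loss(x, x_hat, mu, log_var))   # KL term is zero here
```

Raising `beta` above 1 trades reconstruction accuracy for a latent code pushed closer to the factorized prior, which is the disentangling pressure the paper relies on.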

  • Testing:
  • They seem to have used a very limited subset of "DeepMind Lab" -- all of the concept or class labels could have been implemented easily, e.g. a single-pixel detector for the wall color. Quite disappointing.
  • This is marginally more interesting -- the network learns to eliminate latent factors as it's exposed to examples (just like perhaps a Bayesian network.)
  • Similarly, the CelebA tests are meh ... not a clear improvement over the existing VAEs.

ref: -0 tags: VARNUM GEVI genetically encoded voltage indicators FRET Ace date: 03-18-2020 17:12 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-30420685 Fast in-vivo voltage imaging using a red fluorescent indicator

  • Kannan M, Vasan G, Huang C, Haziza S, Li JZ, Inan H, Schnitzer MJ, Pieribone VA.
  • Other genetically encoded voltage indicators (GEVI):
    • PMID-22958819 ArcLight (Pieribone also last author); sign of $\Delta F / F$ negative, but large, 35%! Slow tho? improvement in speed
    • ASAP3 $\Delta F / F$ large, $\tau = 3$ ms.
    • PMID-26586188 Ace-mNeon FRET based, Acetabularia opsin, fast kinetics + brightness of mNeonGreen.
    • Archon1 -- fast and sensitive, found (like VARNUM) using a robotic directed evolution or direct search strategy.
  • VARNAM is based on Acetabularia (Ace) + mRuby3, also FRET based, found via high-throughput voltage screen.
  • Archaerhodopsins require 1-12 W/mm^2 of illumination, vs. 50 mW/mm^2 for GFP-based probes. Lots of light!
  • Systematic optimization of voltage sensor function: both the linker region (288 mutants), which affects FRET efficiency, as well as the opsin fluorophore region (768 mutants), which affects the wavelength of absorption / emission.
  • Some intracellular clumping (which will negatively affect sensitivity), but mostly localized to the membrane.
  • Sensitivity is still imperfect -- 4% in-vivo cortical neurons, though it’s fast enough to resolve 100 Hz spiking.
  • Can resolve post-synaptic EPSCs, but $< 1\%$ $\Delta F/F$.
  • Tested all-optical ephys using VARNAM + a blueshifted channelrhodopsin, CheRiff, both sparsely and in a PV-targeted transgenic model. Both work, but this is a technique paper; no real results.
  • Tested TEMPO fiber-optic recording in freely behaving mice (ish) -- induced ketamine waves, 0.5-4Hz.
  • And odor-induced activity in flies, using split-Gal4 expression tools. So many experiments.

ref: -2011 tags: Andrew Ng high level unsupervised autoencoders date: 03-15-2019 06:09 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Building High-level Features Using Large Scale Unsupervised Learning

  • Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng
  • Input data: 10M random 200x200 frames from YouTube. Each video contributes only one frame.
  • Used local receptive fields, to reduce the communication requirements. 1000 computers, 16 cores each, 3 days.
  • "Strongly influenced by" Olshausen & Field {1448} -- but this is limited to a shallow architecture.
  • Lee et al 2008 show that stacked RBMs can model simple functions of the cortex.
  • Lee et al 2009 show that a convolutional DBN trained on faces can learn a face detector.
  • Their architecture: sparse deep autoencoder with
    • Local receptive fields: each feature of the autoencoder can connect to only a small region of the lower layer (e.g. non-convolutional)
      • Purely linear layer.
      • More biologically plausible & allows the learning of more invariances other than translational invariances (Le et al 2010).
      • No weight sharing means the network is extra large == 1 billion weights.
        • Still, the human visual cortex is about a million times larger in neurons and synapses.
    • L2 pooling (Hyvarinen et al 2009) which allows the learning of invariant features.
      • E.g. this is the square root of the sum of the squares of its inputs. Square root nonlinearity.
    • Local contrast normalization -- subtractive and divisive (Jarrett et al 2009)
  • Encoding weights $W_1$ and decoding weights $W_2$ are adjusted to minimize the reconstruction error, penalized by 0.1 * the sparse pooling-layer activation. The latter term encourages the network to find invariances.
  • $\underset{W_1, W_2}{\text{minimize}} \sum_{i=1}^m \left( ||W_2 W_1^T x^{(i)} - x^{(i)} ||^2_2 + \lambda \sum_{j=1}^k \sqrt{\epsilon + H_j(W_1^T x^{(i)})^2} \right)$
    • $H_j$ are the weights to the j-th pooling element, $\lambda = 0.1$; m examples; k pooling units.
    • This is also known as reconstruction Topographic Independent Component Analysis.
    • Weights are updated through asynchronous SGD.
    • Minibatch size 100.
    • Note deeper autoencoders don't fare consistently better.
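The reconstruction-TICA objective above can be rendered as a toy numpy function (random weights, single example; purely illustrative of the two terms, not the distributed training system):

```python
# Toy rendering of the reconstruction-TICA objective: reconstruction error
# plus lambda * L2-pooled sparsity, evaluated on one random example.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_feat, n_pool = 6, 4, 2
W1 = rng.normal(size=(n_in, n_feat))           # encoding weights
W2 = rng.normal(size=(n_in, n_feat))           # decoding weights
H = np.abs(rng.normal(size=(n_pool, n_feat)))  # pooling weights (fixed)
lam, eps = 0.1, 1e-5

def objective(x):
    h = W1.T @ x                                 # feature activations
    recon = np.sum((W2 @ h - x) ** 2)            # ||W2 W1^T x - x||^2
    pool = np.sum(np.sqrt(eps + H @ (h ** 2)))   # sqrt-of-sum-of-squares pooling
    return recon + lam * pool

x = rng.normal(size=n_in)
print(objective(x))   # both terms are nonnegative, so the objective is too
```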

ref: -2016 tags: MAPseq Zador connectome mRNA plasmid library barcodes Peikon date: 03-06-2019 00:51 gmt revision:1 [0] [head]

PMID-27545715 High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

  • Justus M. Kebschull, Pedro Garcia da Silva, Ashlan P. Reid, Ian D. Peikon, Dinu F. Albeanu, Anthony M. Zador
  • Another tool for the toolboxes, but I still can't help but to like microscopy: while the number of labels in MAPseq is far higher, the information per read-out is much lower; an imaged slice holds a lot of information, including dendritic / axonal morphology, which sequencing doesn't get. Natch, you'd want to use both, or FISseq + ExM.

ref: -0 tags: debugging reinvented java CMU code profiling instrumentation date: 08-02-2014 06:32 gmt revision:3 [2] [1] [0] [head]

images/1289_1.pdf -- Debugging reinvented: Asking and Answering Why and Why not Questions about Program Behavior.

  • Smart approach to allow users to quickly find the causes of bugs (or more generically, any program actions).

ref: Wattanapanitch-2007 tags: recording tech amplifier cascode MOS-bipolar pseudoresistor MIT date: 01-15-2012 18:13 gmt revision:5 [4] [3] [2] [1] [0] [head]

IEEE-4358095 (pdf) An Ultra-Low-Power Neural Recording Amplifier and its use in Adaptively-Biased Multi-Amplifier Arrays.

  • images/729_1.pdf -- copy, just in case.
  • Masters thesis - shows the development of, as the title explains, an ultra low power neural amplifier.
  • Probably the best amplifier out there. NEF 2.67; theoretical limit 2.02.
  • Final design uses folded cascode operational transconductance amplifier (OTA)
    • Design employs a capacitor-feedback gain stage of 40db followed by a lowpass stage.
    • Majority of the current is passed through large subthreshold PMOS input transistors.
      • PMOS has lower noise than NMOS in most processes.
      • Subthreshold has the highest transconductance-to-current ratio. (ratio of a ratio)
    • Cascode transistors self-shunt their own current noise sources.
    • Design takes 0.16 mm^2 in 0.5 um AMI CMOS process, uses 2.7 uA from a ~2.8V supply, input referred noise of 3 uVrms
    • Thesis gives all design parameters for the transistors.
    • Input is AC coupled, DC path through gigaohm MOS-bipolar pseudoresistor.
      • this path gracefully decays to diode-connected MOS or bipolar transistors if the voltage is high.
    • images/729_1.pdf
  • Last chapter details the use of envelope detection to adaptively change the bias current of the input stage
    • That is, if an electrode is noisy, the bias current is decreased!
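The noise efficiency factor quoted above can be sanity-checked with the standard NEF formula; note the bandwidth value below is an assumption for illustration, not a number taken from the thesis:

```python
# Back-of-envelope check of the noise efficiency factor (NEF):
# NEF = Vrms_in * sqrt( 2 * I_total / (pi * Ut * 4kT * BW) ).
# The ~5 kHz bandwidth here is an assumed value for illustration.
import math

def nef(vrms_in, i_total, bw_hz, T=300.0):
    k = 1.380649e-23            # Boltzmann constant, J/K
    Ut = k * T / 1.602e-19      # thermal voltage, ~26 mV at 300 K
    return vrms_in * math.sqrt(2 * i_total / (math.pi * Ut * 4 * k * T * bw_hz))

# 3 uVrms input-referred noise, 2.7 uA supply current, ~5 kHz bandwidth (assumed)
print(round(nef(3e-6, 2.7e-6, 5e3), 2))   # lands near the reported NEF of ~2.7
```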

ref: Dethier-2011.28 tags: BMI decoder spiking neural network Kalman date: 01-06-2012 00:20 gmt revision:1 [0] [head]

IEEE-5910570 (pdf) Spiking neural network decoder for brain-machine interfaces

  • Gold standard: Kalman filter.
  • Spiking neural network got within 1% of this standard.
  • The 'neuromorphic' approach.
  • Used Nengo, freely available neural simulator.
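The Kalman-filter gold standard amounts to a predict/update cycle per time step; here it is in generic textbook form (the paper's state and observation models are richer -- this is just the recursion, with a made-up 1-D example):

```python
# Minimal Kalman-filter decode step, the 'gold standard' the spiking network
# was compared against (generic textbook form, not the paper's exact model).
import numpy as np

def kalman_step(x, P, y, A, C, W, Q):
    """One predict/update cycle: state x, covariance P, observation y."""
    x_pred = A @ x                       # predict kinematics forward
    P_pred = A @ P @ A.T + W
    S = C @ P_pred @ C.T + Q
    K = P_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

# 1-D toy: state = velocity, observation = noisy firing rate ~ 2 * velocity
A = np.array([[1.0]]); C = np.array([[2.0]])
W = np.array([[0.01]]); Q = np.array([[0.1]])
x, P = np.zeros(1), np.eye(1)
for y_obs in [2.0, 2.1, 1.9, 2.0]:       # observations consistent with v = 1
    x, P = kalman_step(x, P, np.array([y_obs]), A, C, W, Q)
print(x)                                  # estimate converges toward ~1.0
```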


Dethier, J. and Gilja, V. and Nuyujukian, P. and Elassaad, S.A. and Shenoy, K.V. and Boahen, K. Neural Engineering (NER), 2011 5th International IEEE/EMBS Conference on 396 -399 (2011)

ref: Fei-2011.05 tags: flash FPGA neural decoder BMI IGLOO f date: 01-06-2012 00:20 gmt revision:2 [1] [0] [head]

IEEE-5946801 (pdf) A low-power implantable neuroprocessor on nano-FPGA for Brain Machine interface applications

  • 5mW for 32 channels, 1.2V core voltage.
  • RLE using thresholding / transmission of DWT coefficients.
  • 5mm x 5mm.
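The threshold-plus-RLE idea can be sketched with a one-level Haar DWT (illustrative only; the chip's wavelet, quantization, and packet format surely differ):

```python
# Sketch of the compression scheme described: one-level Haar DWT, zero out
# sub-threshold coefficients, then run-length encode (illustrative only).
def haar_dwt(x):
    """One level of the Haar transform: pairwise averages then differences."""
    avg = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    diff = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    return avg + diff

def rle(coeffs, thresh=0.5):
    """Threshold small coefficients to zero and run-length encode."""
    q = [c if abs(c) >= thresh else 0 for c in coeffs]
    out, i = [], 0
    while i < len(q):
        j = i
        while j < len(q) and q[j] == q[i]:
            j += 1
        out.append((q[i], j - i))        # (value, run length)
        i = j
    return out

signal = [4.0, 4.0, 4.0, 4.0, 4.0, 6.0, 4.0, 4.0]  # mostly flat, one bump
print(rle(haar_dwt(signal)))   # flat regions collapse into zero-runs
```

The payoff is that neural signals are mostly flat relative to spikes, so the difference coefficients are dominated by zero-runs that RLE compresses cheaply.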


Fei Zhang and Aghagolzadeh, M. and Oweiss, K. Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on 1593 -1596 (2011)

ref: Nicolelis-1998.11 tags: spatiotemporal spiking nicolelis somatosensory tactile S1 3b microwire array rate temporal coding code date: 12-28-2011 20:42 gmt revision:3 [2] [1] [0] [head]

PMID-10196571[0] Simultaneous encoding of tactile information by three primate cortical areas

  • owl monkeys.
  • used microwire arrays to decode the location of tactile stimuli; location was encoded through the population, not within single units.
  • areas 3b, S1 & S2.
  • used LVQ (learning vector quantization), backprop, and LDA to predict / classify touch trials; all yielded about the same ~60% accuracy. Chance level 33%.
  • Interesting: "the spatiotemporal character of neuronal responses in the SII cortex was shown to contain the requisite information for the encoding of stimulus location using temporally patterned spike sequences, whereas the simultaneously recorded neuronal responses in areas 3b and 2 contained the requisite information for rate coding."
    • They support this result by varying bin widths and looking at the % of correctly classified trials. In SII, increasing bin width decreases (slightly but significantly) the prediction accuracy.
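The bin-width manipulation can be illustrated with a toy (hypothetical numbers): two spike trains with equal rates but different timing are separable at fine bins and indistinguishable at coarse ones.

```python
# Toy illustration of the bin-width manipulation: two spike trains with the
# same rate but different timing become identical at coarse bins.
def bin_counts(spike_times_ms, bin_ms, window_ms=100):
    """Histogram spike times into bins of width bin_ms over the window."""
    n = window_ms // bin_ms
    counts = [0] * n
    for t in spike_times_ms:
        counts[min(int(t // bin_ms), n - 1)] += 1
    return counts

early = [5, 10, 15, 20]        # 4 spikes clustered early (temporal pattern A)
late = [80, 85, 90, 95]        # 4 spikes clustered late (temporal pattern B)

print(bin_counts(early, 10))   # fine bins: the two patterns differ
print(bin_counts(late, 10))
print(bin_counts(early, 100), bin_counts(late, 100))  # coarse bins: identical
```

A classifier fed the coarse bins sees only rate; accuracy dropping with wider bins is therefore evidence for temporally patterned coding, as claimed for SII.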


[0] Nicolelis MA, Ghazanfar AA, Stambaugh CR, Oliveira LM, Laubach M, Chapin JK, Nelson RJ, Kaas JH, Simultaneous encoding of tactile information by three primate cortical areas.Nat Neurosci 1:7, 621-30 (1998 Nov)
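The bin-width manipulation behind that result is simple to illustrate: the same spike train summarized at two temporal resolutions. Fine bins preserve spike timing; a single wide bin keeps only the total count (a pure rate code). The spike times below are made up for illustration.

```python
import numpy as np

def bin_spikes(spike_times, t_max, bin_width):
    # Count spikes falling into consecutive bins of the given width.
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

spikes = np.array([0.012, 0.015, 0.083, 0.090, 0.094])  # seconds (made up)
fine = bin_spikes(spikes, t_max=0.1, bin_width=0.01)    # temporal pattern
coarse = bin_spikes(spikes, t_max=0.1, bin_width=0.1)   # rate code only
```

Classifying on `fine` vs `coarse` vectors is the essence of their test: if accuracy drops as bins widen, the timing itself carried information.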

hide / / print
ref: notes-0 tags: usbmon decode chart linux debug date: 07-12-2010 03:29 gmt revision:3 [2] [1] [0] [head]

From this and the USB 2.0 spec, I made this quick (totally incomprehensible?) key for understanding the output of commands like

# mount -t debugfs none_debugs /sys/kernel/debug
# modprobe usbmon
# cat /sys/kernel/debug/usbmon/2u

To be used with the tables from the (free) USB 2.0 spec:
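For pulling apart individual lines of usbmon's text ("u") output, a small parser helps; the field meanings below follow the kernel's usbmon documentation (URB tag, timestamp in microseconds, event type, then type:bus:device:endpoint), and the sample line is invented for illustration.

```python
# Hedged sketch: split one usbmon text-format line into named fields.
def parse_usbmon(line):
    f = line.split()
    typ_dir, bus, dev, ep = f[3].split(":")
    return {
        "urb_tag": f[0],            # kernel address of the URB
        "timestamp_us": int(f[1]),  # event timestamp, microseconds
        "event": f[2],              # S=submit, C=callback, E=error
        "xfer": typ_dir,            # Ci/Co/Bi/Bo/Ii/Io/Zi/Zo (type+direction)
        "bus": int(bus),
        "device": int(dev),
        "endpoint": int(ep),
        "rest": f[4:],              # status/setup, length, data tag, data words
    }

sample = "ffff88003b9a1234 1297803788 S Bo:2:005:1 -115 4 = 12345678"
info = parse_usbmon(sample)
```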

hide / / print
ref: bookmark-0 tags: code laws lawyers programming date: 11-28-2008 04:54 gmt revision:0 [head]

http://www.linux-mag.com/id/7187 -- has a very interesting and very well applied analogy between programs and laws. I am inclined to believe that they really are not all that different; legalese is structured and convoluted the way it is because it is, in effect, a programming language for laws, hence must be precise and unambiguous. Furthermore, the article is well written and evidences structured and balanced thought (via appropriate references to the real world). And he uses Debian ;-)

hide / / print
ref: notes-0 tags: old blackfin code assembly date: 09-11-2007 15:52 gmt revision:2 [1] [0] [head]

abandoned because I realized that I could work on 2 channels at once (as there are 2 MACs onboard), could use the s2rnd multiply-accumulate flag, and could load registers 32 bits at a time! Ah well, might as well archive my efforts :)

	r6.h = 2048; 
	r0.l = r0.l - r6.h (s) || r1.l = w[i0++] || r2.l = w[i1++]; //subtract offset, load a1[0] into r1.l, w1[0] into r2.l
	a0 = r0.l * r1.l (is) || r1.h = w[i0++];  //mac in*a1[0], load a[1] to r1.h
	a0 += r2.l * r1.h (is) || r1.l = w[i0++]|| r2.h = w[i1--]; //mac w[0]*a1[1], load a1[2] into r1.l, w1[1] to r2.h
	r4 = (a0 += r2.h * r1.l) (is) || r3.l = w[i0++]; //mac w1[1]*a1[2] store to r4, b1[0] to r3.l
	r4 = r4 >>> 14 || r3.h = w[i0++]; //arithmetic right shift, 32 bit inst, b1[1] to r3.h, r4 is new w1. 
	a0 = r4.l * r3.l (is) || w[i1++] = r4.l; //mac w1*b1[0], save w1 into w1[0]
	a0 += r2.l * r3.h (is) || w[i1++] = r2.l; //mac w1[0]*b[1], save w1[0] into w1[1]
	r4 = (a0 += r2.h * r3.l) (is) || r1.l = w[i0++] || r2.l = w[i1++];//mac w1[1]*b1[0] store r4, a2[0] to r1.l, w2[0] to r2.l
	r4 = r4 >>> 14 || r1.h = w[i0++] || r2.h = w[i1--]; //arith. right shift, a2[1] to r1.h, w2[1] to r2.h 
	a0 = r4.l * r1.l (is);  //mac in*a2[0],  a2[2] into r1.l
	a0 += r2.l * r1.h (is) ||  r3.l = w[i0++]; //mac w2[0]*a2[1], b2[0] into r3.l
	r4 = (a0 += r2.h * r1.l) (is) || r3.l = w[i0++]; //mac w2[1]*a2[2] store r4, b2[1] into r3.h
	r4 = r4 >>> 14 || r3.h = w[i0++]; //arithmetic shift to get w2, b2[2] to r3.h
	a0 = r4.l * r3.l (is) || w[i1++] = r4.l; //mac w2 * b2[0], store w2 to w2[0]
	a0 += r2.l * r3.h (is) || w[i1++] = r2.l; //mac w2[0]*b2[1], store w2[0] to w2[1]. i1 now pointing to secondary channel. 
	r4 = (a0 += r2.h * r3.l) (is) || i0 -= 10; //mac w2[1]*b2[0].  reset coeff ptr. done with pri chan, save in r5.
	r5 = r4 >>> 14; 
	//time for the secondary channel!
	r0.h = r0.h - r6.h (s) || r1.l = w[i0++] || r2.l = w[i1++]; //subtract offset, load a1[0] to r1.1, w1[0] to r2.l
	a0 = r0.h * r1.l (is) || r1.h = w[i0++] ; //mac in*a1[0], a1[1] to r1.h, save out samp pri.
	a0 += r2.l * r1.h (is) || r1.l = w[i0++] || r2.h = w[i1--]; //mac w1[0]*a1[1], a1[2] to r1.l, w1[1] to r2.h
	r4 = (a0 += r2.h * r1.l) (is) || r3.l = w[i0++]; //mac, b1[0] to r3.l
	r4 = r4 >>> 14 || r3.h = w[i0++]; //arithmetic shift, b1[1] to r3.h
	a0 = r4.l * r3.l (is) || w[i1++] = r4.l; //mac w1*b1[0], save w1 to w1[0]
	a0 += r2.l * r3.h (is) || w[i1++] = r2.l; //mac w1[0], save w1[0] to w1[1]
	r4 = (a0 += r2.h * r3.l) (is) || r1.l = w[i0++] || r2.l = w[i1++]; //mac w1[1]*b1[0] store r4, a2[0] to r1.l, w2[0] to r2.l
	r4 = r4 >>> 14 || r2.h = w[i1--]; // r4 output of 1st biquad, w2[1] to r2.h
	a0 = r4.l * r1.l (is) || r1.h = w[i0++] ; //mac in* a2[0], a2[1] to r1.h
	a0 += r2.l * r1.h (is) || r1.l = w[i0++] ;  //mac w2[0]*a2[1], a2[2] to r1.l
	r4 = (a0 += r2.h * r1.l) (is) || r3.l = w[i0++]; //mac w2[1]*a2[2], b2[0] to r3.l
	r4 = r4 >>> 14 || r3.h = w[i0++]; //r4 is w2, b2[2] to r3.h
	a0 = r4.l * r3.l (is) || w[i1++] = r4.l ; //mac w2 * b2[0], store w2 to w2[0]
	a0 += r2.l * r3.h (is) || w[i1++] = r2.l; //mac w2[0] * b2[1], store w2[0] to w2[1].  i1 now pointing to next channel. 
	r4 = (a0 += r2.h * r3.l) (is) || i0 -= 10; //mac w2[1] * b2[0], reset coeff. ptr, save in r4. 
	r4 = r4 >>> 14; 

Here is a second (but still not final) attempt, once I realized that it is possible to issue 2 MACs per cycle:

// I'm really happy with this - every cycle is doing two MMACs. :)
	//															i0	i1 (in 16 bit words)
	r1 = [i0++] || r4 = [i1++]; 						//	2	2 	r1= a0 a1 r4= w0's
	a0 = r0.l * r1.l, a1 = r0.h * r1.l || r2 = [i0++] || r5 = [i1]; 			//	4	2	r2= a2 a2 r5= w1's
	a0 += r4.l * r1.h, a1 = r4.h * r1.h  || r3 = [i0++] || [i1--] = r4; 		//	6	0	r3= b0 b1 w1's=r4
	r0.l = (a0 += r5.l * r2.l), r0.h = (a1 += r5.h * r2.l)(s2rnd); 
	a0 = r0.l * r3.l, a1 = r0.h * r3.l || [i1++] = r0; 					//	6	2	w0's = r0
	a0 += r4.l * r3.h, a1 += r4.h * r3.h || r1 = [i0++] || i1 += 4; 		//	8	4 	r1 = a0 a1 
	//load next a[0] a[1] to r1; move to next 2nd biquad w's; don't reset the coef pointer - move on to the next biquad. 
	r0.l = (a0 += r5.l * r3.l), r0.h = (a1 += r5.h * r3.l)(s2rnd) || r4 = [i1++]; //	8	6	r4 = w0's, next biquad
	//note: the s2rnd flag post-multiplies accumulator contents by 2.  see pg 581 or 15-69
	//second biquad. 
	a0 = r0.l * r1.l, a1 = r0.h * r1.l || r2 = [i0++] || r5 = [i1];			//	10	6	r2= a2 a2 r5 = w1's
	a0 += r4.l * r1.h, a1 += r4.h * r1.h || r3 = [i0++] || [i1--] = r4; 		//	12	4	r3= b0 b1 w1's = r4
	r0.l = (a0 += r5.l * r2.l), r0.h = (a1 += r5.h * r2.l)(s2rnd); 			//
	a0 = r0.l * r3.l, a1 = r0.h * r3.l || [i1++] = r0; 					//	12	6	w0's = r0
	a0 += r4.l * r3.h, a1 += r4.h * r3.h || r1 = [i0++] || i1 += 4; 		//	14	8	r1 = a0 a1
	r0.l = (a0 += r5.l * r3.l), r0.h = (a1 += r5.h * r3.l)(s2rnd) || r4 = [i1++]; //	14	10	r4 = w0's
	//third biquad. 
	a0 = r0.l * r1.l, a1 = r0.h * r1.l || r2 = [i0++] || r5 = [i1];			//	16	10	r2= a2 a2 r5 = w1's
	a0 += r4.l * r1.h, a1 += r4.h * r1.h || r3 = [i0++] || [i1--] = r4; 		//	18	8	r3= b0 b1 w1's = r4
	r0.l = (a0 += r5.l * r2.l), r0.h = (a1 += r5.h * r2.l)(s2rnd); 			//
	a0 = r0.l * r3.l, a1 = r0.h * r3.l || [i1++] = r0; 					//	18	10	w0's = r0
	a0 += r4.l * r3.h, a1 += r4.h * r3.h || r1 = [i0++] || i1 += 4; 		//	20	12	r1 = a0 a1
	r0.l = (a0 += r5.l * r3.l), r0.h = (a1 += r5.h * r3.l)(s2rnd) || r4 = [i1++]; //	20	14	r4 = w0's
	//fourth biquad. 
	a0 = r0.l * r1.l, a1 = r0.h * r1.l || r2 = [i0++] || r5 = [i1];			//	22	14
	a0 += r4.l * r1.h, a1 += r4.h * r1.h || r3 = [i0++] || [i1--] = r4; 		//	24	12
	r0.l = (a0 += r5.l * r2.l), r0.h = (a1 += r5.h * r2.l)(s2rnd); 
	a0 = r0.l * r3.l, a1 = r0.h * r3.l || [i1++] = r0; 					//	24	14
	a0 += r4.l * r3.h, a1 += r4.h * r3.h || i1 += 4; 					//	24	16
	r0.l = (a0 += r5.l * r3.l), r0.h = (a1 += r5.h * r3.l)(s2rnd); 			// 48: loop back; 32 bytes: move to next channel.
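For reference, what this fixed-point assembly is computing is a cascade of direct-form-II biquads. A floating-point Python sketch of that structure follows; coefficient naming uses the usual DSP convention, which may not match the register comments above one-to-one.

```python
def biquad_cascade(x, sections):
    """Run samples x through cascaded DF-II biquads.
    sections: list of (b0, b1, b2, a1, a2); delay states start at zero."""
    states = [[0.0, 0.0] for _ in sections]
    y = []
    for s in x:
        for (b0, b1, b2, a1, a2), st in zip(sections, states):
            w = s - a1 * st[0] - a2 * st[1]       # recursive part
            s = b0 * w + b1 * st[0] + b2 * st[1]  # feed-forward part
            st[1], st[0] = st[0], w               # shift delay line
        y.append(s)
    return y

# Identity section (b0=1, all else 0) passes the signal through unchanged.
out = biquad_cascade([1.0, 0.5, -0.25], [(1.0, 0.0, 0.0, 0.0, 0.0)])
```

In the assembly, the `r4 = r4 >>> 14` (and later `s2rnd`) steps are the fixed-point equivalent of the floating-point scaling that this sketch gets for free.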

hide / / print
ref: bookmark-0 tags: magstripe magnetic stripe reader writer encoder date: 05-31-2007 02:49 gmt revision:1 [0] [head]

notes on reading magstripe cards:

hide / / print
ref: engineering notes-0 tags: cascode amplifier jfet circuit audio miller effect input capacitance date: 03-03-2007 04:15 gmt revision:0 [head]


  • a good tutorial on using JFETs for audio amplifier applications
  • shows use of a cascode topology to reduce the miller input-capacitance.
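The benefit of the cascode can be seen in a back-of-envelope Miller-effect calculation: the gate-drain capacitance of a common-source stage is multiplied by (1 + |Av|) at the input, while a cascode pins the drain so the multiplier collapses toward ~2. The capacitance and gain values below are illustrative assumptions, not taken from the linked tutorial.

```python
def miller_input_capacitance(c_gs, c_gd, gain):
    # Effective input capacitance: C_gs plus Miller-multiplied C_gd.
    return c_gs + c_gd * (1.0 + abs(gain))

c_gs, c_gd = 5e-12, 2e-12  # 5 pF and 2 pF (assumed device values)
plain = miller_input_capacitance(c_gs, c_gd, gain=-40.0)   # common source
cascode = miller_input_capacitance(c_gs, c_gd, gain=-1.0)  # drain held fixed
```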

hide / / print
ref: bookmark-0 tags: information entropy bit rate matlab code date: 0-0-2006 0:0 revision:0 [head]


  • concise, well documented, useful.
  • number of bins = length of vector ^ (1/3).
  • information = sum(log(bincounts / prior) * bincounts) -- this is just the KL divergence, same as I compute it.
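The same histogram-based estimate is easy to sketch in Python (the MATLAB original is linked above): normalized bin counts compared against a prior, i.e. a discrete KL divergence, with the cube-root bin rule from the note. The data and the uniform prior are made-up illustrations.

```python
import numpy as np

x = np.random.default_rng(1).normal(size=1000)   # made-up data
n_bins = int(round(len(x) ** (1.0 / 3.0)))       # bins = n^(1/3)
counts, _ = np.histogram(x, bins=n_bins)
p = counts / counts.sum()                        # empirical distribution
prior = np.full(n_bins, 1.0 / n_bins)            # uniform prior (assumed)
nz = p > 0                                       # skip empty bins in the log
info_bits = np.sum(p[nz] * np.log2(p[nz] / prior[nz]))  # KL divergence, bits
```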