{1492} revision 2 modified: 12-10-2019 03:41 gmt

PMID: Spiking neurons can discover predictive features by aggregate-label learning

  • This is a meandering, somewhat long-winded, and complicated paper, even for the journal Science. It's not been cited a great many times, but none-the-less is of interest.
  • The goal of the derived network is to detect fixed-pattern presynaptic sequences, and fire a prespecified number of spikes to each occurrence.
  • One key innovation is the use of a spike-threshold-surface for a 'tempotron' [12], the derivative of which is used to update the weights of synapses after trials. As the author says, spikes are hard to differentiate; the STS makes this more possible. This is hence standard gradient descent: if the neuron missed a spike then the weight is increased based on aggregate STS (for the whole trial -- hence the neuron / SGD has to perform temporal and spatial credit assignment).
    • As common, the SGD is appended with a momentum term.
  • Since STS differentiation is biologically implausible -- where would the memory lie? -- he also implements a correlational synaptic eligibility trace. The correlation is between the postsynaptic voltage and the EPSC, which seems kinda circular.
    • Unsurprisingly, it does not work as well as the SGD approximation. But does work...
  • Second innovation is the incorporation of self-supervised learning: a 'supervisory' neuron integrates the activity of a number (50) of feature detector neurons, and reinforces them to basically all fire at the same event, WTA style. This effects a unsupervised feature detection.
  • This system can be used with sort-of lateral inhibition to reinforce multiple features. Not so dramatic -- continuous feature maps.

Editorializing a bit: I said this was interesting, but why? The first part of the paper is another form of SGD, albeit in a spiking neural network, where the gradient is harder compute hence is done numerically.

It's the aggregate part that is new -- pulling in repeated patterns through synaptic learning rules. Of course, to do this, the full trace of pre and post synaptic activity must be recorded (??) for estimating the STS (i think). An eligibility trace moves in the right direction as a biologically plausible approximation, but as always nothing matches the precision of SGD. Can the eligibility trace be amended with e.g. neuromodulators to push the performance near that of SGD?

The next step of adding self supervised singular and multiple features is perhaps toward the way the brain organizes itself -- small local feedback loops. These features annotate repeated occurrences of stimuli, or tile a continuous feature space.

Still, the fact that I haven't seen any follow-up work is suggestive...

Editorializing further, there is a limited quantity of work that a single human can do. In this paper, it's a great deal of work, no doubt, and the author offers some good intuitions for the design decisions. Yet still, the total complexity that even a very determined individual can amass is limited, and likely far below the structural complexity of a mammalian brain.

This implies that inference either must be distributed and compositional (the normal path of science), or the process of evaluating & constraining models must be significantly accelerated. This later option is appealing, as current progress in neuroscience seems highly technology limited -- old results become less meaningful when the next wave of measurement tools comes around, irrespective of how much work went into it. (Though: the impedtus for measuring a particular thing in biology is only discovered through these 'less meaningful' studies...).

A third option, perhaps one which many theoretical neuroscientists believe in, is that there are some broader, physics-level organizing principles to the brain. Karl Friston's free energy principle is a good example of this. Perhaps at a meta level some organizing theory can be found, or likely a set of theories; but IMHO, you'll need at least one theory per brain area, at least, just the same as each area is morphologically, cytoarchitecturaly, and topologically distinct. (There may be only a few theories of the cortex, despite all the areas, which is why so many are eager to investigate it!)

So what constitutes a theory? Well, you have to meaningfully describe what a brain region does. (Why is almost as important; how more important to the path there.) From a sensory standpoint: what information is stored? What processing gain is enacted? How does the stored information impress itself on behavior? From a motor standpoint: how are goals selected? How are the behavioral segments to attain them sequenced? Is the goal / behavior even a reasonable way of factoring the problem?

Our dual problem, building the bridge from the other direction, is perhaps easier. Or it could be a lot more money has gone into it. Either way, much progress has been made in AI. One arm is deep function approximation / database compression for fast and organized indexing, aka deep learning. Many people are thinking about that; no need to add to the pile; anyway, as OpenAI has proven, the common solution to many problems is to simply throw more compute at it. A second is deep reinforcement learning, which is hideously sample and path inefficient, hence ripe for improvement. One side is motor: rather than indexing raw motor variables (LRUD in a video game, or joint torques with a robot..) you can index motor primitives, perhaps hierarchically built; likewise, for the sensory input, the model needs to infer structure about the world. This inference should decompose overwhelming sensory experience into navigable causes ...

But how can we do this decomposition? The cortex is more than adept at it, but now we're at the original problem, one that the paper above purports to make a stab at.