ref: Legenstein-2008.1 tags: Maass STDP reinforcement learning biofeedback Fetz synapse date: 04-09-2009 17:13 gmt

PMID-18846203[0] A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

  • (from abstract) The resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP.
    • This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker.
  • STDP is prevalent in the cortex; however, it requires a second signal:
    • Dopamine seems to gate STDP in corticostriatal synapses
    • ACh appears to play the same or a similar role in the cortex -- see references 8-12 of the paper
  • The simple learning rule they use: d/dt W_{ij}(t) = C_{ij}(t) D(t), where C_{ij}(t) is an eligibility trace driven by STDP pairings at synapse ij and D(t) is the global reward signal; see the sketch after this item.
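A minimal sketch of that rule for a single synapse, assuming Poisson pre/post spiking and made-up time constants (none of these numbers come from the paper): STDP pairings are written into a decaying eligibility trace, and the weight only moves when the global reward signal arrives.

```python
import numpy as np

# Sketch of dW_ij/dt = C_ij(t) * D(t) for one synapse.
# All constants and the Poisson spike statistics are illustrative
# assumptions, not the paper's parameters.
rng = np.random.default_rng(0)
dt = 1.0            # ms per step
tau_stdp = 20.0     # ms, STDP pairing window
tau_e = 500.0       # ms, eligibility-trace decay
A_plus, A_minus = 0.01, -0.012
eta = 0.1           # learning rate

w = 0.5             # synaptic weight W_ij
c = 0.0             # eligibility trace C_ij(t)
t_pre = t_post = -1e9

for step in range(20000):
    t = step * dt
    c *= np.exp(-dt / tau_e)                  # trace decays between events
    if rng.random() < 0.02:                   # presynaptic spike
        t_pre = t
        c += A_minus * np.exp(-(t - t_post) / tau_stdp)  # post-before-pre
    if rng.random() < 0.02:                   # postsynaptic spike
        t_post = t
        c += A_plus * np.exp(-(t - t_pre) / tau_stdp)    # pre-before-post
    D = 1.0 if rng.random() < 0.005 else 0.0  # global reward signal D(t)
    w += eta * c * D                          # weight change = trace * reward

print(f"final weight: {w:.3f}")
```

The point of the trace/reward factorization is the delayed-reward case: the eligibility trace holds the candidate STDP update for hundreds of milliseconds, so a reward arriving well after the causative pairing can still credit the right synapse.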
  • Their notes on the Fetz/Baker experiments: "Adjacent neurons tended to change their firing rate in the same direction, but also differential changes of directions of firing rates of pairs of neurons are reported in [17] (when these differential changes were rewarded). For example, it was shown in Figure 9 of [17] (see also Figure 1 in [19]) that pairs of neurons that were separated by no more than a few hundred microns could be independently trained to increase or decrease their firing rates."
  • Their result is actually really simple - there is no 'control' or biofeedback - there is no visual or sensory input and no real computation by the network (at least in this simulation). One neuron is simply reinforced, hence its firing rate increases; a toy version is sketched below.
    • Fetz's and later Schmidt's work involved feedback and precise control of firing rate; this simulation does not.
    • It also does not address the problem that their rule may let other synapses forget during reinforcement.
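A toy version of that rate-reinforcement setup, with all parameters assumed and a stochastic rate neuron standing in for their spiking model: reward is paid whenever a running estimate of the neuron's firing rate exceeds a threshold, the trace-times-reward updates capture upward fluctuations, and the rate ratchets up with no sensory input or computation anywhere.

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 1.0                         # ms per step
tau_e, tau_r = 500.0, 200.0      # eligibility and rate-estimate time constants
eta = 0.02

n_in = 50
w = rng.uniform(0.0, 0.5, n_in)  # plastic input weights
c = np.zeros(n_in)               # per-synapse eligibility traces
rate_est = 0.0                   # running estimate of the neuron's rate

for step in range(100000):
    x = (rng.random(n_in) < 0.02).astype(float)  # Poisson inputs
    p_spike = min(0.05 * (w @ x), 1.0)           # stochastic rate neuron
    post = float(rng.random() < p_spike)
    c = c * np.exp(-dt / tau_e) + x * post       # tag inputs coincident with a spike
    rate_est += (post - rate_est) * dt / tau_r
    D = 1.0 if rate_est > 0.02 else 0.0          # reward whenever rate runs high
    w = np.clip(w + eta * c * D, 0.0, 1.0)       # trace * reward update

print(f"final rate estimate: {1000 * rate_est:.1f} Hz")
```

The clip at 1.0 is doing real work here: with potentiation-only eligibility and no depression term, weights can only ratchet upward once rewards start arriving.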
  • They do show that exact spike times can be rewarded (one illustrative reading below), which is kinda interesting ... kinda.
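One way to turn "reward exact spike times" into a scalar signal - an assumed reading, not the paper's reward definition: score the output spike train against a target train with a Gaussian coincidence kernel and feed that score in as D.

```python
import numpy as np

def coincidence_reward(out_spikes, target_spikes, sigma=5.0):
    """Scalar reward: summed Gaussian overlap between output and target
    spike times (ms). Illustrative only -- not the paper's definition."""
    out = np.asarray(out_spikes, dtype=float)[:, None]
    tgt = np.asarray(target_spikes, dtype=float)[None, :]
    return float(np.exp(-(out - tgt) ** 2 / (2 * sigma ** 2)).sum())

print(coincidence_reward([10, 52, 98], [11, 50, 100]))   # ~2.8: close match
print(coincidence_reward([20, 70, 130], [11, 50, 100]))  # ~0.2: poor match
```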
  • Tried a pattern classification task where all of the information was in the relative spike timings.
    • Had to run the pattern through the network 1000 times. That's a bit unrealistic (?).
      • The problem with all these algorithms is that they require so many presentations for gradient descent (or similar) to work, whereas biological systems can and do learn after one or a few presentations.
  • Next they tried to train neurons to classify spoken input
    • Audio stimuli were processed through a cochlear model
    • Maass previously has been able to train a network to perform speaker-independent classification.
    • The neuron model does, roughly, seem to discriminate between "one" and "two"... after 2000 trials (each with 10 presentations of the same digit utterance). I'm still not all that impressed. Feels like gradient descent / linear regression as per the original LSM; a sketch of that sort of linear readout follows.
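For comparison, the "linear regression as per the original LSM" remark corresponds to training a linear readout on a snapshot of the network state. Below is a sketch with synthetic features standing in for the liquid states; in a real LSM these would be filtered spike counts of the recurrent network driven by the cochlear-model output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "liquid states": one feature vector per presentation.
# Synthetic data, for illustration only.
X = rng.normal(size=(200, 50))   # 200 presentations x 50 liquid units
w_true = rng.normal(size=50)
y = np.sign(X @ w_true)          # +1 = "one", -1 = "two" (made-up labels)

# Ridge-regression readout: w = (X^T X + lambda*I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.2f}")
```

A single ridge solve gets the readout in one shot, which is what makes the 2000-trial reward-modulated version look slow by comparison.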
  • A great many derivations in the Methods section... too much to follow.
  • Should read refs:
    • PMID-16907616[1] Gradient learning in spiking neural networks by dynamic perturbation of conductances.
    • PMID-17220510[2] Solving the distal reward problem through linkage of STDP and dopamine signaling.

____References____

[0] Legenstein R, Pecevski D, Maass W, A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Comput Biol 4:10, e1000180 (2008 Oct)
[1] Fiete IR, Seung HS, Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97:4, 048104 (2006 Jul 28)
[2] Izhikevich EM, Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:10, 2443-52 (2007 Oct)