ref: -2018 tags: machine learning manifold deep neural net geometry regularization date: 08-29-2018 14:30 gmt revision:0 [head]

LDMNet: Low dimensional manifold regularized neural nets.

  • Synopsis of the math:
    • Fit a manifold formed from the concatenated input ''and'' output variables, and use this to set the loss of (and hence train) a deep convolutional neural network.
      • Manifold is fit via point integral method.
      • This requires both SGD and variational steps -- alternate between fitting the parameters, and fitting the manifold.
      • Uses a standard deep neural network.
    • Measure the dimensionality of this manifold to regularize the network, using an 'elegant trick', whatever that means.
  • Still, the results, in terms of error, seem not significantly better than previous work (compared to weight decay, which is weak sauce, and dropout)
    • That said, the results in terms of feature projection, figures 1 and 2, ‘’do’’ look clearly better.
    • Of course, they apply the regularizer to same image recognition / classification problems (MNIST), and this might well be better adapted to something else.
  • Not completely thorough analysis, perhaps due to space and deadlines.
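Not the paper's method (LDMNet measures dimension via the point integral method); as a rough, hypothetical illustration of what "measuring the dimensionality" of a data manifold can mean, a local-PCA estimate looks like:

```python
import numpy as np

def local_pca_dimension(points, k=20, var_threshold=0.95):
    """Crude intrinsic-dimension estimate: for each point, take its k nearest
    neighbors, PCA the neighborhood, and count the components needed to
    explain var_threshold of the variance; return the average count.
    NOT LDMNet's point integral method -- just an illustration."""
    dims = []
    for i in range(len(points)):
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        nbrs = points[np.argsort(d2)[1:k + 1]]         # k nearest neighbors
        centered = nbrs - nbrs.mean(axis=0)
        # eigenvalues of the local covariance, sorted largest-first
        ev = np.linalg.eigvalsh(centered.T @ centered)[::-1]
        frac = np.cumsum(ev) / ev.sum()
        dims.append(int(np.searchsorted(frac, var_threshold)) + 1)
    return float(np.mean(dims))

# Points on a noisy 2-D plane embedded in 5-D should come out near 2.
rng = np.random.default_rng(0)
coords = rng.normal(size=(500, 2))
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]       # orthonormal 5x2 frame
data = coords @ basis.T + 0.01 * rng.normal(size=(500, 5))
est = local_pca_dimension(data)                        # ~2
```

In LDMNet the analogous quantity is computed on the concatenated (input, feature) points and penalized during training.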

ref: -0 tags: nucleus accumbens caudate stimulation learning enhancement MIT date: 09-20-2016 23:51 gmt revision:1 [0] [head]

Temporally Coordinated Deep Brain Stimulation in the Dorsal and Ventral Striatum Synergistically Enhances Associative Learning

  • Monkeys had to learn to associate an image with one of 4 reward targets.
    • Fixation period, movement period, reward period -- more or less standard task.
    • Blocked trial structure with randomized associations + control novel images + control familiar images.
  • Timed stimulation:
    • Nucleus Accumbens during fixation period
      • Shell not core; non-hedonic in separate test.
    • Caudate (which part -- targeting?) during feedback on correct trials.
  • Performance on stimulated images improved in reaction time, learning rate, and ultimate % correct.
  • Small non-significant improvement in non-stimulated novel image.
  • Wonder how many stim protocols they had to try to get this correct?

ref: -0 tags: deep reinforcement learning date: 04-12-2016 17:19 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

Prioritized experience replay

  • In general, experience replay can reduce the amount of experience required to learn, and replace it with more computation and more memory – which are often cheaper resources than the RL agent’s interactions with its environment.
  • Transitions (between states) may be more or less
    • surprising (does the system in question have a model of the environment? It does have a model of expected reward for states & actions, as it's Q-learning),
    • redundant, or
    • task-relevant
  • Some sundry neuroscience links:
    • Sequences associated with rewards appear to be replayed more frequently (Atherton et al., 2015; Ólafsdóttir et al., 2015; Foster & Wilson, 2006). Experiences with high magnitude TD error also appear to be replayed more often (Singer & Frank, 2009 PMID-20064396 ; McNamara et al., 2014).
  • Pose a useful example where the task is to learn (effectively) a random series of bits -- 'Blind Cliffwalk'. By choosing the replayed experiences properly (via an oracle), you can get an exponential speedup in learning.
  • Prioritized replay introduces bias because it changes [the sampled state-action] distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to (even if the policy and state distribution are fixed). We can correct this bias by using importance-sampling (IS) weights.
    • These weights are the inverse of the priority weights, but don't matter so much at the beginning, when things are more stochastic; they anneal the controlling exponent.
  • There are two ways of selecting (weighting) the priority weights:
    • Direct, proportional to the TD-error encountered when visiting a sequence.
    • Ranked, where errors and sequences are stored in a data structure ordered based on error and sampled 1/rank .
  • Somewhat illuminating is how the deep TD or Q learning is unable to even scratch the surface of Tetris or Montezuma's Revenge.
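A toy sketch of the proportional variant described above (flat arrays for clarity; the paper uses a sum-tree for efficient sampling, and the constants here are illustrative):

```python
import numpy as np

class PrioritizedReplay:
    """Toy proportional prioritized replay: sample transitions with
    probability ~ priority^alpha, and correct the resulting bias with
    importance-sampling weights controlled by beta (annealed toward 1)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []

    def add(self, transition):
        # new transitions get the current max priority so they are replayed
        # at least once before their TD error is known
        p = max(self.prio, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prio.pop(0)
        self.data.append(transition); self.prio.append(p)

    def sample(self, batch, beta=0.4):
        p = np.asarray(self.prio) ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch, p=probs)
        # IS weights undo the skewed sampling distribution; beta anneals
        # toward 1 over training, at which point the correction is exact
        w = (len(self.data) * probs[idx]) ** -beta
        return idx, [self.data[i] for i in idx], w / w.max()

    def update(self, idx, td_errors, eps=1e-3):
        for i, e in zip(idx, td_errors):
            self.prio[i] = abs(e) + eps        # priority follows |TD error|
```

After each learning step the sampled transitions' priorities are refreshed with their new TD errors via `update()`; rank-based prioritization replaces `probs` with 1/rank over an error-sorted structure.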

ref: Ganguly-2011.05 tags: Carmena 2011 reversible cortical networks learning indirect BMI date: 01-23-2013 18:54 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

PMID-21499255[0] Reversible large-scale modification of cortical networks during neuroprosthetic control.

  • Split the group of recorded motor neurons into direct (decoded and controls the BMI) and indirect (passive) neurons.
  • Both groups showed changes in neuronal tuning / PD.
    • More PD. Is there no better metric?
  • Monkeys performed manual control before (MC1) and after (MC2) BMI training.
    • The majority of neurons reverted back to original tuning after BC; c.f. [1]
  • Monkeys were trained to rapidly switch between manual and brain control; still showed substantial changes in PD.
  • 'Near' (on same electrode as direct neurons) and 'far' neurons (different electrode) showed similar changes in PD.
    • Modulation Depth in indirect neurons was less in BC than manual control.
  • Prove (pretty well) that motor cortex neuronal spiking can be dissociated from movement.
  • Indirect neurons showed decreased modulation depth (MD) -> perhaps this is to decrease interference with direct neurons.
  • Quote "Studies of operant conditioning of single neurons found that unconditioned adjacent neurons were largely correlated with the conditioned neurons".
    • Well, also: Fetz and Baker showed that you can condition neurons recorded on the same electrode to covary or inversely vary.
  • Contrast with studies of motor learning in different force fields, where there is a dramatic memory trace.
    • Possibly this is from proprioception activating the cerebellum?

Other notes:

  • Scale bars on the waveforms are incorrect for figure 1.
  • Same monkeys as [2]


[0] Ganguly K, Dimitrov DF, Wallis JD, Carmena JM, Reversible large-scale modification of cortical networks during neuroprosthetic control.Nat Neurosci 14:5, 662-7 (2011 May)
[1] Gandolfo F, Li C, Benda BJ, Schioppa CP, Bizzi E, Cortical correlates of learning in monkeys adapting to a new dynamical environment.Proc Natl Acad Sci U S A 97:5, 2259-63 (2000 Feb 29)
[2] Ganguly K, Carmena JM, Emergence of a stable cortical map for neuroprosthetic control.PLoS Biol 7:7, e1000153 (2009 Jul)

ref: -0 tags: artificial intelligence projection episodic memory reinforcement learning date: 08-15-2012 19:16 gmt revision:0 [head]

Projective simulation for artificial intelligence

  • Agent learns based on memory 'clips' which are combined using some pseudo-Bayesian method to trigger actions.
    • These clips are learned from experience / observation.
    • Quote: "..more complex behavior seems to arise when an agent is able to “think for a while” before it “decides what to do next.” This means the agent somehow evaluates a given situation in the light of previous experience, whereby the type of evaluation is different from the execution of a simple reflex circuit"
    • Quote: "Learning is achieved by evaluating past experience, for example by simple reinforcement learning".
  • The forward exploration of learned action-stimulus patterns is seemingly a general problem-solving strategy (my generalization).
  • Pretty simple task:
    • Robot can only move left / right; shows a symbol to indicate which way it (might?) be going.
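A toy two-layer version of such a clip network (percept clips wired directly to action clips; names and constants are illustrative, not the paper's full model):

```python
import numpy as np

class PSAgent:
    """Minimal projective-simulation-style agent: a percept excites a
    random 'walk' over clips (here a single hop to an action clip) with
    probability proportional to hopping weights h; reward strengthens
    the traversed edge, while damping slowly forgets toward baseline."""
    def __init__(self, n_percepts, n_actions, damping=0.01):
        self.h = np.ones((n_percepts, n_actions))    # hopping weights
        self.damping = damping

    def act(self, percept, rng):
        probs = self.h[percept] / self.h[percept].sum()
        return int(rng.choice(len(probs), p=probs))

    def learn(self, percept, action, reward):
        self.h -= self.damping * (self.h - 1.0)      # decay toward baseline 1
        self.h[percept, action] += reward            # reinforce the used edge

# Toy task echoing the paper's robot: symbol 0 means "go left" (action 0),
# symbol 1 means "go right" (action 1).
rng = np.random.default_rng(2)
agent = PSAgent(n_percepts=2, n_actions=2)
for _ in range(2000):
    s = int(rng.integers(2))
    a = agent.act(s, rng)
    agent.learn(s, a, reward=1.0 if a == s else 0.0)
```

The "think for a while" aspect in the full model comes from multi-hop walks through intermediate clips, which this single-hop sketch omits.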

ref: Hashimoto-2003.03 tags: cortex striatum learning carmena costa basal ganglia date: 03-07-2012 23:18 gmt revision:3 [2] [1] [0] [head]

PMID-22388818 Corticostriatal plasticity is necessary for learning intentional neuroprosthetic skills.

  • Trained a mouse to control an auditory cursor, as in Kipke's task {99}. Did not cite that paper, claimed it was 'novel'. oops.
  • Summed neuronal firing rate of groups of 2 or 4 M1 neurons.
    • One group increased the tone frequency with increased firing rate; the other decreased the frequency with increasing FR.
  • Removal of striatal NMDA receptors impairs the ability to learn neuroprosthetic skills.
    • Hence, they argue, cortico-striatal plasticity is required to learn abstract skills, such as this tone to firing rate target acquisition task.
  • Auditory feedback was essential for operant learning.
  • Controlled by recording EMG of the vibrissae + injection of lidocaine into the whisker pad.
  • One reward was sucrose solution; the other was a food pellet. When the rat was satiated on one modality, they showed increased preference for the opposite reward. Clever control.
  • Noticed pronounced oscillatory spike coupling, the coherence of which was increased in low-frequency bands in late learning relative to early learning (figure 3).
  • Genetic manipulations: knockin line that expresses Cre recombinase in both striatonigral and striatopallidal medium spiny neurons, crossed with mice carrying a floxed allele of the NMDAR1 gene.
    • These animals are relatively normal, and can learn to perform rapid sequential movements, but are unable to learn precise motor sequences.
    • Acute pharmacological blockade of NMDAR did not affect performance of the neuroprosthetic skill.
    • Hence the deficits in the transgenic mice reflect an inability to learn, rather than to perform, the skill.

ref: Jarosiewicz-2008.12 tags: Schwartz BMI learning perturbation date: 03-07-2012 17:11 gmt revision:2 [1] [0] [head]

PMID-19047633[0] Functional network reorganization during learning in a brain-computer interface paradigm.

  • quote: For example, the tuning functions of neurons in the motor cortex can change when monkeys adapt to perturbations that interfere with the execution (5–7) or visual feedback (8–10) of their movements. Check these refs - have to be good!
  • point out that only the BMI lets you see how the changes reflect changes in behavior.
  • BMI also allows perturbations to target a subset of neurons. apparently, they had the same idea as me.
  • used the PV algorithm. yeck.
  • perturbed a select subset of neurons by rotating their tuning by 90deg. about the Z-axis. pre - perturb - washout series of experiments.
  • 3D BMI, center-out task, 8 targets at the corners of a cube.
  • looked for the following strategies for compensating to the perturbation:
    • re-aiming: to compensate for the deflected trajectory, aim at a rotated target.
    • re-weighting: decrease the strength of the rotated neurons.
    • re-mapping: use the new units based on their rotated tuning.
  • modulation depths for the rotated neurons did in fact decrease.
  • PD for the neurons that were perturbed rotated more than the control neurons.
  • rotated neurons contributed to error parallel to perturbation, unrotated compensated for this, and contributed to 'errors' in the opposite direction.
  • typical recording sessions of 3 hours - thus, the adaptation had to proceed quickly and only online. pre-perturb-washout each had about 8 * 20 trials.
  • interesting conjecture: "Another possibility is that these neurons solve the “credit-assignment problem” described in the artificial intelligence literature (25–26). By using a form of Hebbian learning (27), each neuron could reduce its contribution to error independently of other neurons via noise-driven synaptic updating rules (28–30). "
    • ref 25: Minsky - 1961;
    • ref 26: Cohen PR, Feigenbaum EA (1982) The Handbook of Artificial Intelligence; ref 27 references Hebb directly - 1949;
    • ref 28: ALOPEX {695} ;
    • ref 29: PMID-1903542[1] A more biologically plausible learning rule for neural networks.
    • ref 30: PMID-17652414[2] Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances. Fiete IR, Fee MS, Seung HS.
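A toy illustration of the rotation perturbation (hypothetical numbers; the real PV decoder also handles baseline rates and calibration, which are omitted here):

```python
import numpy as np

def pv_decode(rates, pds):
    """Population-vector decode: velocity = sum of each unit's (zero-mean)
    rate along its preferred direction."""
    return rates @ pds                        # (n,) @ (n,3) -> (3,)

def rotate_z(v, deg):
    t = np.radians(deg)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return v @ R.T

rng = np.random.default_rng(1)
n = 40
pds = rng.normal(size=(n, 3))
pds /= np.linalg.norm(pds, axis=1, keepdims=True)    # unit preferred directions

# Perturbation: the decoder's PDs for a quarter of the units are rotated
# 90 degrees about the z-axis, while the neurons' true tuning is unchanged.
subset = rng.choice(n, size=n // 4, replace=False)
pds_perturbed = pds.copy()
pds_perturbed[subset] = rotate_z(pds[subset], 90)

intended = np.array([1.0, 0.0, 0.0])                 # monkey intends +x
rates = pds @ intended                               # cosine tuning, zero mean
v_pre = pv_decode(rates, pds)                        # decodes roughly along +x
v_pert = pv_decode(rates, pds_perturbed)             # deflected trajectory
```

Compensation could then be modeled as re-aiming (rotating `intended`), re-weighting (shrinking the rotated units' rates), or re-mapping (inverting through the new PDs).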


[0] Jarosiewicz B, Chase SM, Fraser GW, Velliste M, Kass RE, Schwartz AB, Functional network reorganization during learning in a brain-computer interface paradigm.Proc Natl Acad Sci U S A 105:49, 19486-91 (2008 Dec 9)
[1] Mazzoni P, Andersen RA, Jordan MI, A more biologically plausible learning rule for neural networks.Proc Natl Acad Sci U S A 88:10, 4433-7 (1991 May 15)
[2] Fiete IR, Fee MS, Seung HS, Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances.J Neurophysiol 98:4, 2038-57 (2007 Oct)

ref: -0 tags: implicit motor sequence learning basal ganglia parkinson's disease date: 03-06-2012 22:47 gmt revision:2 [1] [0] [head]

PMID-19744484 What can man do without basal ganglia motor output? The effect of combined unilateral subthalamotomy and pallidotomy in a patient with Parkinson's disease.

  • Unilateral lesion of both STN and GPi in one patient. Hence, the patient was his own control.
    • Drastically reduced the need for medication, indicating that it had a profound effect on BG output.
  • The arm contralateral to the lesion showed faster reaction times and normal movement speeds; the ipsilateral arm remained parkinsonian.
  • Implicit sequence learning in a task was absent.
  • In a go / no-go task, when the percent of no-go trials increased, the RT superiority of the contralateral hand was lost.
  • "The risk of persistent dyskinesias need not be viewed as a contraindication to subthalamotomy in PD patients since they can be eliminated if necessary by a subsequent pallidotomy without producing deficits that impair daily life."
  • Subthalamotomy incurs persistent hemiballismus / chorea in 8% of patients; in many others chorea spontaneously disappears.
    • This can be treated by a subsequent pallidotomy.
  • Patient had Parkinsonian symptoms largely restricted to right side.
  • Measured TMS ability to stimulate motor cortex -- which appears to be a common treatment. STN / GPi lesion appears to have limited effect on motor cortex excitability (other things regulate it?).
  • conclusion: interrupting BG output removes such abnormal signaling and allows the motor system to operate more normally.
    • Bath DA presumably calms hyperactive SNr neurons.
    • You cannot disrupt the output of the BG with complete impunity; the associated abnormalities may be too subtle to be detected in normal behaviors, explaining the overall clinical improvement seen in PD patients after surgery and the scarcity of clinical manifestations in people with focal BG lesions (Bhatia and Marsden, 1994; Marsden and Obeso 1994).
      • Our results support the prediction that surgical lesions of the BG in PD would be associated with inflexibility or reduced capability for motor learning. (Marsden and Obeso, 1994).
  • It is better to dispense with BG output altogether than to have a faulty one.

ref: bookmark-0 tags: basal ganglia dopamine reinforcement learning Graybeil date: 03-06-2012 18:14 gmt revision:4 [3] [2] [1] [0] [head]

PMID-16271465 The basal ganglia: learning new tricks and loving it

  • BG analogous to the anterior forebrain pathway (AFP), which is necessary for song learning in young birds. Requires lots of practice and feedback. Studies suggest e.g. that neural activity in the AFP is correlated with song variability, and that the AFP can adjust ongoing activity in effector motor pathways.
    • LMAN = presumed homolog of cortex that receives basal ganglia outflow. Blockade of outflow from LMAN to RA creates stereotyped singing.
  • To see accurately what is happening, it's necessary to record simultaneously, or in close temporal contiguity, striatal and cortical neurons during learning.
    • Pasupathy and Miller showed that changes occur earlier in the striatum than in cortex during learning.
  • She cites lots of papers -- there has been a good bit of work on this, and the theories are coming together. I should be careful not to dismiss or negatively weight things.
  • Person and Perkel [48] reports that in songbirds, the analogous GPi to thalamus pathway induces IPSPs as well as rebound spikes with highly selective timing.
  • Reference Lévesque and Parent PMID-16087877 who find elaborate column-like arrays of striatonigral terminations in the SNr, not in the dopamine-containing SNpc.

ref: -0 tags: dopamine reinforcement learning funneling reduction basal ganglia striatum DBS date: 02-28-2012 01:29 gmt revision:2 [1] [0] [head]

PMID-15242667 Anatomical funneling, sparse connectivity and redundancy reduction in the neural networks of the basal ganglia

  • Major attributes of the BG:
    • Numerical reduction in the number of neurons across layers of the 'feed forward' (wrong!) network,
    • lateral inhibitory connections within the layers
    • modulatory effects of dopamine and acetylcholine.
  • Stochastic decision making task in monkeys.
  • Dopamine and ACh deliver different messages. DA much more specific.
  • Output nuclei of BG show uncorrelated activity.
    • They see this as a means of compression -- more likely it is a training signal.
  • Striatum:
    • each striatal projection neuron receives 5300 cortico-striatal synapses; the dendritic field of such a neuron contains 4e5 axons.
    • Say that a typical striatal neuron is spherical (?).
    • Striatal dendritic tree is very dense, whereas pallidal dendritic tree is sparse, with 4 main and 13 tips.
    • A striatal axon provides 240 synapses in the pallidum and makes 10 contacts with one pallidal neuron on average.
  • I don't necessarily disagree with the information-compression hypothesis, but I don't agree with it either.
    • Learning seems a more likely hypothesis; could be that we fail to see many effects due to the transient nature of the signals, but I cannot do a thorough literature search on this.

PMID-15233923 Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons.

  • Same task as above.
  • both ACh (putatively, TANs in this study) and DA neurons respond to reward related events.
  • dopamine neurons' response reflects mismatch between expectation and outcome in the positive domain
  • TANs are invariant to reward predictability.
  • TANs are synchronized; most DA neurons are not.
  • Striatum displays the densest staining in the CNS for dopamine (Lavoie et al 1989) and ACh (Holt et al 1997)
    • Depression of striatal acetylcholine can be used to treat PD (Pisani et al 2003).
    • Might be a DA/ ACh balance problem (Barbeau 1962).
  • Deficit of either DA or ACh has been shown to disrupt reward-related learning processes. (Kitabatake et al 2003, Matsumoto 1999, Knowlton et al 1996).
  • Upon reward, dopaminergic neurons increase firing rate, whereas ACh neurons pause.
  • Primates show overshoot -- for a probabilistic relative reward, they saturate anything above 0.8 probability to 1. Rats and pigeons do not show this effect (figure 2 F).

ref: Heimer-2006.01 tags: STN DBS synchrony basal ganglia reinforcement learning beta date: 02-22-2012 17:07 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

PMID-17017503[0] Synchronizing activity of basal ganglia and pathophysiology of Parkinson's disease.

  • They worry that increased synchrony may be an epi-phenomena of tremor or independent oscillations with similar frequency.
  • Modeling using actor/critic models of the BG.
  • Dopamine depletion, as in PD, results in correlated pallidal activity and reduced information capacity.
  • Other studies have found that DBS desynchronizes activity -- [1] or [2].
  • Biochemical and metabolic studies show that GPe activity does not change in Parkinsonism.
  • Pallidal neurons in normal monkeys do not show correlated discharge (Raz et al 2000, Bar-Gad et al 2003a).
  • Reinforcement driven dimensionality reduction (RDDR) (Bar-Gad et al 2003b).
  • DA activity, through action on D1 and D2 receptors on the 2 different types of MSN, affects the temporal difference learning scheme in which DA represents the difference between expectation and reality.
    • These neurons have a static 5-10 Hz firing rate, which can be modulated up or down. (Morris et al 2004).
  • "The model suggests that the chronic dopamine depletion in the striatum of PD patients is perceived as encoding a continuous state where reality is worse than predictions." Interesting theory.
    • Alternately, abnormal DA replacement leads to random organization of the cortico-striatal network, eventually leading to dyskinesia.
  • Recent human studies have found oscillatory neuronal correlation only in tremulous patients and raised the hypothesis that increased neuronal synchronization in parkinsonism is an epi-phenomenon of the tremor of independent oscillators with the same frequency (Levy et al 2000).
    • Hum. might be.
  • In rhesus and green monkey PD models, a major fraction of the primate pallidal cells develop both oscillatory and non-oscillatory pair-wise correlation
  • Our theoretical analysis of coherence functions revealed that small changes between oscillation frequencies results in non-significant coherence in recording sessions longer than 10 minutes.
  • Their theory: current DBS methods overcome this, probably by "imposing a null spatio-temporal firing in the basal ganglia enabling the thalamo-cortical circuits to ignore and compensate for the problematic BG".
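Not the authors' model -- a minimal TD(0) chain illustrating the claim that chronic dopamine depletion acts like a constant negative shift on the prediction error, so that learned values settle below the true ones ("reality worse than predictions"):

```python
import numpy as np

def run_td(n_steps=2000, gamma=0.9, lr=0.1, da_offset=0.0):
    """TD(0) on a trivial chain: state 0 -> state 1 -> terminal, with
    reward 1 on the final transition.  delta = r + gamma*V(s') - V(s);
    da_offset crudely stands in for a tonic shift in the dopamine signal."""
    V = np.zeros(2)
    for _ in range(n_steps):
        delta0 = 0.0 + gamma * V[1] - V[0] + da_offset   # leaving state 0
        V[0] += lr * delta0
        delta1 = 1.0 - V[1] + da_offset                  # leaving state 1
        V[1] += lr * delta1
    return V

V_normal = run_td()                   # converges to [0.9, 1.0]
V_depleted = run_td(da_offset=-0.5)   # settles lower: [-0.05, 0.5]
```

The depleted agent's values sit uniformly below the true returns, which is the sense in which the depleted striatum "perceives a continuous state where reality is worse than predictions."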


[0] Heimer G, Rivlin M, Israel Z, Bergman H, Synchronizing activity of basal ganglia and pathophysiology of Parkinson's disease.J Neural Transm Suppl no Volume :70, 17-20 (2006)
[1] Kühn AA, Williams D, Kupsch A, Limousin P, Hariz M, Schneider GH, Yarrow K, Brown P, Event-related beta desynchronization in human subthalamic nucleus correlates with motor performance.Brain 127:Pt 4, 735-46 (2004 Apr)
[2] Goldberg JA, Boraud T, Maraton S, Haber SN, Vaadia E, Bergman H, Enhanced synchrony among primary motor cortex neurons in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine primate model of Parkinson's disease.J Neurosci 22:11, 4639-53 (2002 Jun 1)

ref: Turner-2010.12 tags: STN DBS basal ganglia motor learning vigor scaling review date: 02-16-2012 21:27 gmt revision:3 [2] [1] [0] [head]

PMID-20850966[0] Basal ganglia contributions to motor control: a vigorous tutor.

  • Using single-cell recording and inactivation protocols these studies provide consistent support for two hypotheses: the BG modulates movement performance ('vigor') according to motivational factors (i.e. context-specific cost/reward functions) and the BG contributes to motor learning.
  • Most BG-associated clinical conditions involve some form of striatal dysfunction -- clinical signs occur when the principal input nucleus of the BG network is affected.
    • Lesions of the output nuclei are typically subtle, consistent with the finding that pallidotomy is an effective treatment for PD and dystonia.
    • It is better to block BG output completely than pervert the normal operations of motor areas that receive BG output.
    • Pathological firing patterns degrade the ability of thalamic neurons to transmit information reliably.
      • Bad BG activity may block cortico-thalamic-cortico communication.
      • Hence BG treatment does not reflect negative images of normal function.
  • Years of debate have been resolved by confirmation that the direct and indirect pathways originate from biochemically and morphologically distinct types of projection neurons [97, 105].
    • Direct: D1; indirect = D2, GPe.
  • CMPf projects back to the striatum.
  • Movement representation in the BG: ref [36]
  • Results of GPi inactivation:
    • RTs are not lengthened. These results are not consistent with the idea that the BG contributes to the selection or initiation of movement.
    • GPi inactivation does not perturb on-line error correction process or the generation of discrete corrective submovements.
      • Rapid in-flight path corrections are preserved in PD.
      • Challenges the idea that the BG mediates on-line correction of motor error.
    • GPi inactivation does not affect the execution of overlearned or externally cued sequences of movements.
      • contradicts claims, based on neuroimaging and clinical evidence, that the BG is involved in the long term storage of overlearned motor sequences or the ability to string together successive motor acts.
    • GPi inactivation reduces movement velocity and acceleration.
      • Very consistent finding.
      • Mirrors the bradykinesia observed in PD.
      • Common side-effect of DBS of the GPi for dystonia.
    • GPi inactivation produces marked hypometria -- undershooting of the desired movement extent.
      • Unaccompanied by changes in movement linearity or directional accuracy.
  • Conclusion: impaired gain.
    • Movement: bradykinesia and hypometria
    • hand-writing: micrographia
    • speech: hypophonia [65].
    • There is a line of evidence suggesting that movement gain is controlled independently of movement direction.
    • Motor cost terms, which scale with velocity, may link an animal's previous experience with the cost/benefit contingencies of a task [75] to its current allocation of energy to meet the demands of a specific task.
      • This is consistent with monkey rapid fatiguing following BG lesion.
      • Schmidt et al [5] showed that patients with bilateral lesions of the putamen or pallidum are able to control grip forces normally in response to explicit sensory instructions, but do not increase grip force spontaneously despite full understanding that higher forces will earn more money.
    • Sensory cuse and curgent conditions increase movement speed equally in healthy subjects and PD patients.
  • BG and learning:
    • The BG's role in dopamine-mediated learning is uncontroversial and supported by a vast literature [10,14,87].
    • Seems to be involved in reward-driven acquisition, but not long-term retention or recall of well-learned motor skills.
    • Single unit recording studies have demonstrated major changes in the BG of animals as they learn procedural tasks. [88-90]
      • Learning occurs earlier in the striatum than cortex [89,90].
    • One of the sequelae associated with pallidotomy is an impaired ability to learn new motor sequences [22, 92] and arbitrary stimulus-response associations [93].
    • BG is the tutor, cortex is the storage.


[0] Turner RS, Desmurget M, Basal ganglia contributions to motor control: a vigorous tutor.Curr Opin Neurobiol 20:6, 704-16 (2010 Dec)

ref: Zaghloul-2009.03 tags: DBS STN reinforcement learning humans unexpected reward Baltuch date: 01-26-2012 18:19 gmt revision:1 [0] [head]

PMID-19286561[0] Human Substantia Nigra Neurons Encode Unexpected Financial Rewards

  • direct, concise.
  • 15 neurons in 11 patients -- we have far more!


[0] Zaghloul KA, Blanco JA, Weidemann CT, McGill K, Jaggi JL, Baltuch GH, Kahana MJ, Human substantia nigra neurons encode unexpected financial rewards.Science 323:5920, 1496-9 (2009 Mar 13)

ref: Frank-2007.11 tags: horses PD STN DBS levodopa decision learning science date: 01-25-2012 00:50 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-17962524[0] Hold your horses: impulsivity, deep brain stimulation, and medication in parkinsonism.

  • While on DBS, patients actually sped up their decisions under high-conflict conditions. Wow!
    • This impulsivity was not affected by dopaminergic medication status.
    • Impulsivity may be the cognitive equivalent of excess grip force {88}.
  • Mathematical models of decision making suggest that individuals only execute a choice once the 'evidence' in its favor crosses a critical decision threshold.
    • people can adjust decision thresholds to meet current task demands
    • One theory is that the STN modulates decision thresholds (6) and delays decision-making when faced with a conflict. Wanted to test this in a conflict situation.
    • Record from the STN in conflict task to see ??
  • Second, they wanted to test learning from negative outcomes.
    • Dopamine replacement therapy impairs patients' ability to learn from the negative outcomes of their decisions (11-13), which may account for pathological gambling behavior (14).
    • PD patients did indeed score worse on avoidance, slightly less accurate on AB choice, and about the same for the rest.
  • Made a network model.
    • Found that preSMA and STN coactivation is associated with slowed reaction times under decision conflict (25).
    • And that STN-DBS reduces coupling between cingulate and basal ganglia output (27).
    • In their model they either lesioned the STN or overloaded it with high-frequency regular firing.
      • Either manipulation produced the same faster responses in high-conflict decisions.
  • STN dysfunction does not lead to impulsivity in all behavioral situations.
    • STN lesioned rats show enhanced preference for choices that lead to large delayed rewards compared to those that yield small immediate rewards (32,33). (This is not conflict, though -- rather reward -- but nonetheless illuminating)
  • Dopaminergic medication, by tonically elevating dopamine levels and stimulating D2 receptors, prevents learning from negative decision outcomes (11, 13, 18). Hence pathological gambling behavior (14).
  • Other studies show DBS-induced impairments in cognitive control (27 PMID-17119543, 36 PMID-15079009).
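The threshold-crossing account above can be made concrete with a toy drift-diffusion simulation: evidence accumulates noisily until it hits a bound, and raising the bound (as the STN is proposed to do under conflict) slows decisions while improving accuracy. This is only an illustrative sketch; the drift, noise, and threshold values are invented, not taken from the paper.

```python
import random

def ddm_trial(drift, threshold, noise=1.0, dt=0.01, rng=random):
    """Accumulate noisy evidence until it crosses +threshold (correct)
    or -threshold (error); return (correct?, decision time)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return x > 0, t

def summarize(threshold, n=2000, drift=1.0, seed=1):
    rng = random.Random(seed)
    trials = [ddm_trial(drift, threshold, rng=rng) for _ in range(n)]
    accuracy = sum(correct for correct, _ in trials) / n
    mean_rt = sum(t for _, t in trials) / n
    return accuracy, mean_rt

lo_acc, lo_rt = summarize(threshold=0.5)   # low bound: fast, error-prone
hi_acc, hi_rt = summarize(threshold=1.5)   # high bound: slow, accurate
# hi_acc > lo_acc and hi_rt > lo_rt: the speed-accuracy tradeoff that a
# conflict-driven threshold increase would exploit.
```

On this reading, DBS or an STN lesion corresponds to clamping the threshold low, reproducing fast, impulsive choices under conflict.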


[0] Frank MJ, Samanta J, Moustafa AA, Sherman SJ, Hold your horses: impulsivity, deep brain stimulation, and medication in parkinsonism.Science 318:5854, 1309-12 (2007 Nov 23)

ref: Lehericy-2005.08 tags: fMRI motor_learning basal_ganglia STN subthalamic date: 01-25-2012 00:20 gmt revision:2 [1] [0] [head]

PMID-16107540[0] Distinct basal ganglia territories are engaged in early and advanced motor sequence learning

  • generally a broad, well-referenced study.
  • they used a really high-field magnet (3T) during tapping-learning task over the course of a month.
  • STN was activated early in motor learning, specifically during sequence learning, but not afterward.
  • during the course of learning (and as the task became progressively more automatic), associative striatal activation shifted to the motor territory.
    • STN could act by inhibiting competing motor outputs, thus building a temporally ordered sequence of movements.
  • SN was active throughout the course of the experiment.
  • during the 'fast learning' stage, there was transient activation of the ACC
  • also during the beginning portion of motor learning lobules V and VI of the cerebellum were activated.
  • rostral premotor and prefrontal cortical areas are connected to the associative territory of the striatum, which projects back to the frontal cortex via the VA/VL nuclei of the thalamus.


[0] Lehéricy S, Benali H, Van de Moortele PF, Pélégrini-Issac M, Waechter T, Ugurbil K, Doyon J, Distinct basal ganglia territories are engaged in early and advanced motor sequence learning.Proc Natl Acad Sci U S A 102:35, 12566-71 (2005 Aug 30)

ref: BAdi-2009.09 tags: dopamine L-Dopa levodopa agonist young reward novelty punisment learning date: 01-24-2012 04:05 gmt revision:1 [0] [head]

PMID-19416950[0] Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients

  • dopamine agonist administration in young patients with Parkinson's disease resulted in increased novelty seeking, enhanced reward processing, and decreased punishment processing; these findings may shed light on the cognitive and personality bases of the impulse control disorders that arise as side effects of dopamine agonist therapy in some Parkinson's disease patients.


[0] Bódi N, Kéri S, Nagy H, Moustafa A, Myers CE, Daw N, Dibó G, Takáts A, Bereczki D, Gluck MA, Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients.Brain 132:Pt 9, 2385-95 (2009 Sep)

ref: Parush-2011.01 tags: basal ganglia reinforcement learning hypothesis frontiers israel date: 01-24-2012 04:05 gmt revision:2 [1] [0] [head]

PMID-21603228[0] Dopaminergic Balance between Reward Maximization and Policy Complexity.

  • model complexity discounting is an implicit thing.
    • the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy.
  • In order for this to work:
    • dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost.
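The gain/complexity tradeoff can be sketched in a few lines: a softmax over action gains, with a temperature parameter playing the role the authors assign to striatal dopamine. The gain values and temperatures below are arbitrary illustrations, not parameters from the paper.

```python
import math

def softmax_policy(gains, temperature):
    """Action probabilities from a softmax over expected gains."""
    exps = [math.exp(g / temperature) for g in gains]
    z = sum(exps)
    return [e / z for e in exps]

gains = [1.0, 0.5, 0.1]                    # hypothetical action values
greedy = softmax_policy(gains, temperature=0.05)
flat = softmax_policy(gains, temperature=5.0)
# Low temperature: near-deterministic reward maximization (a complex,
# finely tuned policy). High temperature: the policy flattens toward
# uniform, trading gain for lower policy complexity.
```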


[0] Parush N, Tishby N, Bergman H, Dopaminergic Balance between Reward Maximization and Policy Complexity.Front Syst Neurosci 5no Issue 22 (2011)

ref: BarGad-2003.12 tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus date: 01-16-2012 19:18 gmt revision:3 [2] [1] [0] [head]

PMID-15013228[0] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)

  • long paper! looks like they used latex.
  • they focus on a 'new model' for the basal ganglia: reinforcement driven dimensionality reduction (RDDR)
  • in order to make sense of the system -- according to them -- any model must ignore huge amounts of information about the studied areas.
  • ventral striatum = nucleus accumbens!
  • striatum is broken into two rough parts: ventral and dorsal
    • dorsal striatum: includes the caudate and putamen.
    • ventral striatum: the nucleus accumbens, medial and ventral portions of the caudate and putamen, and striatal cells of the olfactory tubercle (!) and anterior perforated substance.
  • ~90% of neurons in the striatum are medium spiny neurons
    • dendrites fill 0.5mm^3
    • cells have up and down states.
      • the states are controlled by intrinsic connections
      • project to GPe, GPi & SNr (primarily), using GABA.
  • 1-2% of neurons in the striatum are tonically active neurons (TANs)
    • use acetylcholine (among others)
    • fewer spines
    • more sensitive to input
    • TANs encode information relevant to reinforcement or incentive behavior


ref: Ganguly-2009.07 tags: Ganguly Carmena 2009 stable neuroprosthetic BMI control learning kinarm date: 01-14-2012 21:07 gmt revision:4 [3] [2] [1] [0] [head]

PMID-19621062 Emergence of a stable cortical map for neuroprosthetic control.

  • Question: Are the neuronal adaptations evident in BMI control stable and stored like with skilled motor learning?
    • There is mixed evidence for stationary neuron -> behavior maps in motor cortex.
      • It remains unclear if the tuning relationship for M1 neurons are stable across time; if they are not stable, rather advanced adaptive algorithms will be required.
  • A stable representation did occur.
    • Small perturbations to the size of the neuronal ensemble or to the decoder could disrupt function.
    • Compare with {291} -- opposite result?
    • A second map could be learned after primary map was consolidated.
  • Used a Kinarm + Plexon, as usual.
    • Regressed linear decoder (Wiener filter) to shoulder and elbow angle.
  • Assessed waveform stability with PCA (+ amplitude) and ISI distribution (KS test).
  • Learning occurred over the course of 19 days; after about 8 days performance reached an asymptote.
    • Brain control trajectory to target became stereotyped over the course of training.
      • Stereotyped and curved -- they propose a balance of time to reach target and effort to enforce certain firing rate profiles.
    • Performance was good even at the beginning of a day -- hence motor maps could be recalled.
  • By analyzing neuron firing wrt idealized movement to target, the relationship between neuron & movement proved to be stable.
  • Tested to see if all neurons were required for accurate control by generating an online neuron dropping curve, in which a random # of units were omitted from the decoder.
    • Removal of 3 neurons (of 10 - 15) resulted in > 50% drop in accuracy.
  • Tried a shuffled decoder as well: this too could be learned in 3-8 days.
    • Shuffling was applied by permuting the neurons-to-lags mapping. Eg. the timecourse of the lags was not changed.
  • Also tried retraining the decoder (using manual control on a new day) -- performance dropped, then rapidly recovered when the original fixed decoder was reinstated.
    • This suggests that small but significant changes in the model weights (they do not analyze what) are sufficient to prevent an established cortical map from being transformed into a reliable control signal.
  • A fair bit of effort was put into making & correcting tuning curves, which is problematic as these are mostly determined by the decoder
    • Better idea would be to analyze the variance / noise properties wrt cursor trajectory?
  • Performance was about the same for smaller (10-15) and larger (41) unit ensembles.
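The online neuron-dropping analysis can be sketched with synthetic data: fit a linear decoder on all units, then zero out random subsets and measure the drop in accuracy. The encoding model, unit count, and noise level here are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic session: 12 units linearly encode a 1-D cursor variable.
n_units, n_samples = 12, 500
true_w = rng.normal(size=n_units)
rates = rng.normal(size=(n_samples, n_units))
cursor = rates @ true_w + 0.1 * rng.normal(size=n_samples)

# Least-squares linear decoder fit on the full ensemble.
w_hat, *_ = np.linalg.lstsq(rates, cursor, rcond=None)

def r2_dropping(n_drop, n_rep=50):
    """Mean R^2 when n_drop randomly chosen units are omitted."""
    scores = []
    for _ in range(n_rep):
        keep = rng.permutation(n_units)[n_drop:]
        pred = rates[:, keep] @ w_hat[keep]
        ss_res = np.sum((cursor - pred) ** 2)
        ss_tot = np.sum((cursor - cursor.mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))

curve = [r2_dropping(k) for k in range(0, 9, 2)]
# Accuracy falls steeply as units are removed, mirroring the >50% drop
# the authors saw after omitting 3 of 10-15 units.
```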

ref: Carmena-2003.11 tags: Carmena nicolelis BMI learning 2003 date: 01-08-2012 18:53 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-14624244[0] Learning to control a brain-machine interface for reaching and grasping by primates.

  • strong focus on learning & reorganization.
  • Jose's first main paper.
  • focuses on two engineering / scientific questions: what signal to use, and how much of it, and from where.
    • As for where, of course we suggest that the representation is distributed.
  • Quality of predictions: gripping force > hand velocity > hand position.
  • Showed silent EMGs during BMI control.
  • Put a robot in the feedback path; this added some nonlinearities + a 60-90 ms delay.
  • Predictions follow anatomical expectation:
    • M1 (33-56 cells) predicts 73% of variance for hand position, 66% for velocity, 83% for gripping force.
    • SMA (16-19 cells) 51% position, 51% velocity, 19% gripping force.
    • They need a table for this shiz.
  • Relatively high-quality predictions. (When I initially looked at the data, I was frustrated with the noise!)
  • Learning was associated with increased contribution of single units.
    • appeared to be more 'learning' in SMA.
    • Training on a position model seemed to increase the ctx representation of hand position.
  • changes between pole control and brain control:
    • 68% of sampled neurons showed reduced tuning in BCWOH
    • 14% no change
    • 18% enhanced tuning.
  • Directional tuning curves clustered in a band during brain control -- neurons clustering around the first PC?
    • All cortical areas tested showed increases in correlated firing -- arousal?
    • this puts some movements into the nullspace of the Wiener matrix. Or does it? should have had the monkey make stereotyped movements to dissociate movement directions.
  • Knocks {334} in that:
    • preferred directions were derived not from actual movements, but from firing rates during target appearance time windows.
    • tuning strength could have increased simply because the movements became straighter with practice.
  • From Fetz, {329}: Interestingly, the conversion parameters obtained for one set of trials provided increasingly poor predictions of future responses, indicating a source of drift over tens of minutes in the open-loop condition. This problem was alleviated when the monkeys observed the consequences of their neural activity in ‘real time’ and could optimize cell activity to achieve the desired goal under ‘closed-loop’ conditions.


ref: Wyler-1980.05 tags: operant control motor learning interspike intervals ISI Wyler Lange Neafsey Robbins date: 01-07-2012 21:46 gmt revision:1 [0] [head]

PMID-6769536[0] Operant control of precentral neurons: Control of modal interspike intervals

  • Question: can monkeys control the ISI of operantly controlled neurons?
    • Answer: Seems they cannot. Operant and overt movement cells have about the same ISI, and this cannot be changed by conditioning.
  • Task requires a change from tonic to phasic firing, hence they call it "Differential reinforcement of Tonic Patterns".
    • That is, the monkey is trained to produce spikes within a certain ISI window.
    • PDP8 control, applesauce feedback.
    • modal ISI, in this case, means mode (vs. mean and median) of the ISI.
  • Interesting: "It was not uncommon for a neuron to display bi- or trimodal ISI distributions when the monkey was engaged in a movement unrelated to a unit's firing"
  • For 80% of the units, the more tightly a neuron's firing was related to a specific movement, the more gaussian its ISI became.
  • As the monkey gained control over reinforced units, the ISI became more gaussian.
  • Figure 2: monkey was not able to significantly change the modal ISI.
    • Monkeys instead seem to succeed at the task by decreasing the dispersion of the ISI distribution and increasing the occurrence of the modal ISI.
  • Monkeys mediate response through proprioceptive feedback:
    • Cervical spinal cord sectioning decreases the fidelity of control.
    • When contralateral C5-7 ventral roots were sectioned, PTN responsive to passive arm movements could not be statistically controlled.
    • Thus, monkeys operantly control precentral neurons through peripheral movements, perhaps even small and isometric contractions.
  • Excellent paper. Insightful conclusions.


[0] Wyler AR, Lange SC, Neafsey EJ, Robbins CA, Operant control of precentral neurons: control of modal interspike intervals.Brain Res 190:1, 29-38 (1980 May 19)

ref: Fetz-1973.03 tags: operant conditioning Fetz Baker learning BMI date: 01-07-2012 19:34 gmt revision:2 [1] [0] [head]

PMID-4196269[0] Operantly conditioned patterns on precentral unit activity and correlated responses in adjacent cells and contralateral muscles

  • Looked at an operant task through the opposite direction: as a means for looking at reaction time, and muscle responses to trained bursts of activity.
  • recorded from precentral gyrus cells in leg and arm representation.
    • Isonel-coated tungsten microwires, with great apparent waveform records.
  • also recorded EMG: nylon-insulated stainless-steel wires, led subcutaneously to the head connector.
  • references an even older study concerning the operant conditioning of neural activity in rats by Olds.
  • really simple technology - RC filter to estimate the rate; reward high rate; resets on reward.
    • the evoked operant bursts are undoubtedly due to training.
  • looks like it was easy for the monkeys to increase the firing rate of their cortical cells (of course, I'm just skimming the article..)
  • 233 precentral units.
    • which they did some preliminary somatotopic mapping of.
  • neighboring cells mirrored the firing rate changes (logical as they share the local circuitry)
  • in a few sessions the operant bursts were not associated with movements.
  • Could individually condition cells when they happened to record 2 units on the same electrode.


ref: tlh24-2011 tags: motor learning models BMI date: 01-06-2012 00:19 gmt revision:1 [0] [head]

Experiment: you have a key. You want that key to learn to control a BMI, but you do not want the BMI to learn how the key does things, as

  1. That is not applicable when you don't have training data - amputees, paraplegics.
  2. That does not tell much about motor learning, which is what we are interested in.

Given this, I propose a very simple groupweight: one axis is controlled by the summed action of a certain population of neurons, the other by a second, disjoint, population; a third population serves as control. The task of the key is to figure out what does what: how does the firing of a given unit translate to movement (forward model). Then the task during actual behavior is to invert this: given movement end, what sequence of firings should be generated? I assume, for now, that the brain has inbuilt mechanisms for inverting models (not that it isn't incredibly interesting -- and I'll venture a guess that it's related to replay, perhaps backwards replay of events). This leaves us with the task of inferring the tool-model from behavior, a task that can be done now with our modern (though here-mentioned quite simple) machine learning algorithms. Specifically, it can be done through supervised learning: we know the input (neural firing rates) and the output (cursor motion), and need to learn the transform between them. I can think of many ways of doing this on a computer:

  1. Linear regression -- This is obvious given the problem statement and knowledge that the model is inherently linear and separable (no multiplication factors between the input vectors). In MATLAB, you'd just do mldivide (backslash operator) -- but! this requires storing all behavior to date. Does the brain do this? I doubt it, but this model, for a linear BMI, is optimal. (You could extend it to be Bayesian if you want confidence intervals -- but this won't make it faster).
  2. Gradient descent -- During online performance, you (or the brain) adjust the estimates of the weights per neuron to minimize error between observed behavior and estimated behavior (the estimated behavior would constitute a forward model..) This is just LMS; it works, but has exponential convergence and may get stuck in local minima. This model will make predictions on which neurons change relevance in the behavior (more needed for acquiring reward) based on continuous-time updates.
  3. Batched Gradient descent -- Hypothetically, one could bolster the learning rate by running batches of data multiple times through a gradient descent algorithm. The brain very well could do this offline (during sleep), and we can observe this. Such a mechanism would improve performance after sleep, which has been observed behaviorally in people (and primates?).
  4. Gated Gradient Descent -- This is halfway between reinforcement learning and gradient descent. Basically, the brain only updates weights when something of motivational / sensory salience occurs, e.g. juice reward. It differs from raw reinforcement learning in that there is still multiplication between sensory and motor data + subsequent derivative.
  5. Reinforcement learning -- Neurons are 'rewarded' at the instant juice is delivered; they adjust their behavior based on behavioral context (a target), which presumably (given how long we train our keys), is present in the brain at the same time the cursor enters the target. Sensory data and model-building are largely absent.
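Options 1 and 2 can be sketched concretely (NumPy's lstsq standing in for MATLAB's mldivide); the toy neuron-to-cursor mapping, noise level, and learning rate are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy group-weight setup: 6 units map linearly to a 2-D cursor.
n_units, n_trials = 6, 2000
true_W = rng.normal(size=(n_units, 2))
rates = rng.normal(size=(n_trials, n_units))
cursor = rates @ true_W + 0.05 * rng.normal(size=(n_trials, 2))

# (1) Batch linear regression -- optimal for a linear BMI, but requires
# storing all behavior to date.
W_batch, *_ = np.linalg.lstsq(rates, cursor, rcond=None)

# (2) Online gradient descent (LMS) -- one update per observation,
# storing only the current weight estimate (a running forward model).
W_lms = np.zeros((n_units, 2))
eta = 0.01
for x, y in zip(rates, cursor):
    err = y - x @ W_lms              # prediction error
    W_lms += eta * np.outer(x, err)  # move weights down the error gradient

# After enough trials, the online estimate approaches the batch solution.
```

The LMS variant makes the testable prediction noted in point 2: weight (relevance) changes should track the ongoing error signal, trial by trial.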

{i need to think more about model-building, model inversion, and songbird learning?}

ref: Olson-2005 tags: Arizona rats BMI motor control training SVM single-unit left right closed-loop learning Olson Arizona date: 01-03-2012 23:06 gmt revision:1 [0] [head]

bibtex:Olson-2005 Evidence of a mechanism of neural adaptation in the closed loop control of directions

  • from abstract:
    • Trained rats to press left/right paddles to center a LED. e.g. paddles were arrow keys, LED was the cursor, which had to be centered. Smart rats.
      • Experiment & data from Olson 2005
    • Then trained a SVM to discriminate left/right from 2-10 motor units.
    • Once closed-loop BMI was established, monitored changes in the firing properties of the recorded neurons, specifically wrt the continually(?) re-adapted decoding SVM.
    • "but expect that the patients who use the devices will adapt to the devices using single neuron modulation changes. " --v. interesting!
  • First page of article has an excellent review back to Fetz and Schmidt. e.g. {303}
  • Excellent review of history altogether.
    • Notable is their interpretation of Sanchez 2004 {259}, who showed that most of the significant modulations are from a small group of neurons, not the large (up to 320 electrodes) populations that were actually recorded. Carmena 2003 showed that the population as a whole tended to group tuning, although this was imperfectly controlled.
  • Also reviewed: Zacksenhouse 2007 {901}
  • SVM is particularly interesting as a decoding algorithm as it weights the input vectors in projecting onto a decision boundary; these weights are experimentally informative.
  • Figure 7: The brain seems to modulate individual firing rate changes to move away from the decision boundary, or at least to minimize overlap.
  • For non-overt movements, the distance from decision function was greater than for overt movements.
  • Rho (ρ) is the Mann-Whitney test statistic, which non-parametrically estimates the difference between two distributions.
  • δf(X_t) is the gradient wrt the p input dimensions of the NAV, as defined with their gaussian kernel SVM.
  • They show (i guess) that changes in ρ are correlated with the gradient -- e.g. the brain focuses on neurons that increase fidelity of control?
    • But how does the brain figure this out??
  • Not sure if i fully understand their argument / support.
  • Conclusion comes early in the paper
    • figure 5 weakly supports the single-neuron modulation result.
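A quick sketch of the ρ idea: the Mann-Whitney U counts, over all pairs, how often a sample from one distribution exceeds a sample from the other, giving a nonparametric measure of separation. The toy numbers below are invented, not the paper's data.

```python
def mann_whitney_u(a, b):
    """U = number of pairs (x in a, y in b) with x > y, ties counted 1/2.
    U / (len(a) * len(b)) estimates P(sample from a > sample from b)."""
    u = 0.0
    for x in a:
        for y in b:
            u += 1.0 if x > y else (0.5 if x == y else 0.0)
    return u

# Hypothetical distances from the SVM decision boundary:
overt = [1.0, 1.2, 0.9, 1.1]
non_overt = [1.5, 1.7, 1.4, 1.6]
u = mann_whitney_u(non_overt, overt)
p_greater = u / (len(non_overt) * len(overt))  # 1.0: complete separation
```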

ref: -0 tags: reinforcement learning basis function policy specialization date: 01-03-2012 02:37 gmt revision:1 [0] [head]

To read:

ref: Shulgina-1986.09 tags: reinforcement learning review date: 01-03-2012 02:31 gmt revision:5 [4] [3] [2] [1] [0] [head]

Reinforcement learning in the cortex (a web scour/crawl):

  • http://www.springerlink.com/content/v211201413228x34/
    • short/long interspike intervals via pain reinforcement in immobilized rabbits.
  • PMID-3748636 Increased regularity of activity of cortical neurons in learning due to disinhibitory effect of reinforcement.
    • more rabbit shocking.
  • http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6T0F-3S1PT00-P
    • applied glutamate & noradrenaline; both responses are complex.
  • Reinforcement learning in populations of spiking neurons
    • the result: reinforcement learning can function effectively in large populations of neurons if there is a trace of the population activity in addition to the reinforcement signal. this trace must be per-synapse or perhaps per-neuron (as has been anticipated for some time). very important result, helps with the 'specificity' problem.
    • in human terms, the standard reinforcement learning approach is analogous to having a class of students write an exam and being informed by the teacher on the next day whether the majority of students passed or not.
    • this learning method is slow and achieves limited fidelity; in contrast, behavioral reinforcement learning can be reliable and fast. (perhaps this is a result of already-existing maps and or activity in the cortex?)
    • reinforcement learning is almost the opposite of backpropagation, in that in backprop, an error signal is computed per neuron, while in reinforcement learning the error is only computed for the entire system. They posit that there must be a middle ground (need something less than one neuron to compute the training/error signal per neuron, otherwise the system would not be very efficient...)
    • points out a good if obvious point: to learn from trial and error different responses to a given stimulus must be explored, and, for this, randomness in the neural activities provides a convenient mechanism.
    • they use the running mean as an eligibility trace per synapse. then change in weight = eta * eligibility trace(t), evaluated at the ends of trials.
    • implemented an asymmetric rule that updates the synapses only slightly if the output is reliable and correct.
    • also needed a population signal or fed-back version of the previous neural behavior. Then individual reinforcement is a product of the reinforcement signal * the population signal * the eligibility trace (the last per synapse). Roughly, if the population signal is different from the eligibility trace, and the behavior is wrong, then that synapse should be reinforced, and vice-versa.
  • PMID-17444757 Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.
    • seems to give about the same result as above, except with STDP: reinforcement-modulated STDP with an eligibility trace stored at each synapse permits learning even if a reward signal is delayed.
    • network can learn XOR problem with firing-rate or temporally coded input.
    • they want someone to look for reward-moduled STDP. paper came out June 2007.
  • PMID: Metaplasticity: the plasticity of synaptic plasticity (1996, Mark Bear)
    • there is such thing as metaplasticity! (plasticity of plasticity, or control over how effective NMDAR are..)
    • he has several other papers on this topic after this..
  • PMID-2682404 Reward or reinforcement: what's the difference? (1989)
    • reward = certain environmental stimuli have the effect of eliciting approach responses. ventral striatum / nucleus accumbens is instrumental for this.
    • reinforcement = the tendency of certain stimuli to strengthen stimulus-response tendencies. dorsolateral striatum is used here.
  • PMID-9463469 Rapid plasticity of human cortical movement representation induced by practice.
    • used TMS to evoke isolated and directionally consistent thumb movements.
    • then asked the volunteers to practice moving their thumbs in an opposite direction
    • after 5-30 minutes of practice, TMS evoked a response in the practiced direction. wow! this may be short-term memory or the first step in skill learning.
  • PMID-12736341 Learning input correlations through nonlinear temporally asymmetric Hebbian plasticity.
    • temporally asymmetric plasticity is apparently required for a stable network (aka no epilepsy?), and can be optimized to represent the temporal structure of input correlations.
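A minimal sketch of the per-synapse eligibility-trace scheme described above: a single linear unit explores via output noise, receives only a scalar reinforcement, and updates each synapse in proportion to reward × trace, where the trace is a running mean of input × output perturbation. The task, learning rate, and decay constant are all invented for illustration.

```python
import random

rng = random.Random(0)
w = [0.0, 0.0]                  # synaptic weights being learned
target_w = [1.0, -0.5]          # hidden mapping that defines the reward
eta, decay = 0.1, 0.8
trace = [0.0, 0.0]              # per-synapse eligibility traces

for trial in range(3000):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]   # presynaptic activity
    noise = rng.gauss(0, 0.5)                # exploratory output variability
    y = w[0] * x[0] + w[1] * x[1] + noise
    y_target = target_w[0] * x[0] + target_w[1] * x[1]
    reward = -(y - y_target) ** 2            # one global scalar; no per-neuron error
    for i in range(2):
        trace[i] = decay * trace[i] + (1 - decay) * x[i] * noise
        w[i] += eta * reward * trace[i]

err = ((w[0] - target_w[0]) ** 2 + (w[1] - target_w[1]) ** 2) ** 0.5
# err ends up small: a scalar reward plus per-synapse traces suffice to
# recover the hidden mapping, despite no per-neuron error signal.
```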

ref: bookmark-0 tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning date: 12-31-2011 19:30 gmt revision:3 [2] [1] [0] [head]

hunch.net interesting posts:

  • debugging your brain - how to discover what you don't understand. a very intelligent viewpoint, worth rereading + the comments. look at the data, stupid
    • quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.
  • automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.
  • The design of an optimal research environment
    • Quote: Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
  • More recently: http://hunch.net/?p=2016
    • Problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.
  • The unrealized potential of the research lab Quote: Muthu Muthukrishnan says “it’s the incentives”. In particular, people who invent something within a research lab have little personal incentive in seeing it’s potential realized so they fail to pursue it as vigorously as they might in a startup setting.
    • The motivation (money!) is just not there.

ref: Atallah-2007.01 tags: striatum skill motor learning VTA substantia nigra basal ganglia reinforcement learning date: 12-31-2011 18:59 gmt revision:3 [2] [1] [0] [head]

PMID-17187065[0] Separate neural substrates for skill learning and performance in the ventral and dorsal striatum.

  • good paper. via SCLin's blog. slightly confusing anatomical terminology.
  • tested in rats, which have an anatomically different basal ganglia system than primates.
  • Rats had to choose which direction in a Y maze based on olfactory cues. Normal rats figure it out in 60 trials.
  • ventral striatum (nucleus accumbens here in rats) connects to the ventral prefrontal cortices (for example, the orbitofrontal cortex)
    • in primates, includes the medial caudate, which has been shown in fMRI to respond to reward prediction error. Neural activity in the caudate is attenuated when a monkey reaches optimal performance.
  • dorsal parts of the striatum (according to web: caudate, putamen, globus pallidus in primates) connect to the dorsal prefrontal and motor cortices
    • (according to them:) this corresponds to the putamen in primates. Activity in the putamen reflects performance but not learning.
    • activity in the putamen is highest after successful learning & accurate performance.
  • used muscimol (GABAa agonist, silences neural activity) and AP-5 (blocks NMDA based plasticity), in each of the target areas.
  • dorsal striatum is involved in performance but not learning
    • Injection of muscimol during acquisition did not impair test performance
    • Injection of muscimol during test phase did impair performance
    • Injection of AP-5 during acquisition had no effect.
    • in acquisition sessions, muscimol blocked the instrumental response (performance); but muscimol had only a small effect when injected after rats had perfected the task.
      • Idea: consistent behavior creates a stimulus-response association in extrastriatal brain areas, e.g. cerebral cortex. That is, the basal ganglia provide the reinforcement signal, and the cortex learns the association through feedback-driven behavior? Not part of the habit system, but it makes an important contribution to goal-directed behavior.
      • This is consistent with the observation that behavior is initially goal driven but is later habitual.
    • Actually, other studies show that plasticity in the dorsal striatum may be detrimental to instrumental learning.
    • The number of neurons that fire just before the execution of a response is larger in the putamen than the caudate.
  • ventral striatum is involved in learning and performance.
    • Injection of AP-5 or muscimol during acquisition (learning behavior) impairs test performance.
    • Injection of AP-5 during test performance has no effect, but muscimol impairs performance.
  • Their data support an actor-director-critic architecture of the striatum:
    • Actor = dorsal striatum; involved in performance, but not in learning.
    • Director = ventral striatum; quote "it somehow learns the relevant task demands and directs the dorsal striatum to perform the appropriate action plans, but, crucially, it does not train the dorsal striatum"
      • the ventral striatum acts through the orbitofrontal cortex, which maintains representations of task-reward contingencies.
      • the ventral striatum might also influence action selection through its projections to the substantia nigra.
    • Critic = dopaminergic inputs from the ventral tegmental area and substantia nigra.
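Their actor-director-critic reading is close kin to the standard actor-critic architecture in reinforcement learning. A minimal sketch for a two-choice task (my own illustration, not the paper's model - the "director" level has no standard counterpart here):

```python
import math
import random

# Minimal actor-critic for a two-choice task. The critic estimates expected
# reward; its prediction error both updates that estimate and gates the
# actor's action preferences. Parameters are arbitrary illustrative choices.
def actor_critic(p_reward=(0.8, 0.2), trials=2000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    pref = [0.0, 0.0]   # actor: action preferences
    v = 0.0             # critic: running estimate of expected reward
    probs = [0.5, 0.5]
    for _ in range(trials):
        e = [math.exp(p) for p in pref]
        probs = [x / sum(e) for x in e]           # softmax action selection
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        delta = r - v             # critic's reward-prediction error
        v += alpha * delta        # critic learns expected reward
        pref[a] += alpha * delta  # actor update, gated by the critic
    return probs

probs = actor_critic()  # preference shifts toward the richer option
```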


hide / edit[6] / print
ref: Jackson-2006.11 tags: Fetz Andrew Jackson BMI motor learning microstimulation date: 12-16-2011 04:20 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

PMID-17057705 Long-term motor cortex plasticity induced by an electronic neural implant.

  • used an implanted neurochip.
  • record from site A in motor cortex (encodes movement A)
  • stimulate site B of motor cortex (encodes movement B)
  • after a few days of learning, stimulating A generated a mixture of A- and B-type, then purely B-type movements.
  • changes only occurred when stimuli were delivered within 50ms of recorded spikes.
  • quantified with measurements of radial/ulnar deviation and flexion/extension of the wrist.
  • stimulation at the target (site B) was completely sub-threshold (40 µA)
  • distance between recording and stimulation site did not matter.
  • they claim this is from Hebb's rule: if one neuron fires just before another (i.e. it contributes to the second's firing), then the connection between the two is strengthened. However, I originally thought this was because site A was controlling the Betz cells in B, therefore for consistency A's map was modified to agree with its /function/.
  • repetitive high-frequency stimulation has been shown to expand movement representations in the motor cortex of rats (hmm.. interesting)
  • motor cortex is highly active in REM
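The within-50-ms contingency reads like a spike-timing-dependent plasticity window. A toy version of such a window function (parameters are illustrative, not from the paper):

```python
import math

# Toy STDP window: weight change as a function of (t_post - t_pre).
# The Jackson/Fetz result is consistent with potentiation when stimulation
# ("post") follows recorded spikes ("pre") within tens of ms. The amplitudes
# and time constant below are conventional illustrative values, not measured.
def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    if dt_ms > 0:    # post after pre: potentiate
        return a_plus * math.exp(-dt_ms / tau_ms)
    elif dt_ms < 0:  # post before pre: depress
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0
```

At +10 ms the change is positive, at -10 ms negative, and at +100 ms it has decayed to nearly nothing - matching the observation that only near-coincident stimulation drove plasticity.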


hide / edit[1] / print
ref: Schultz-1998.07 tags: dopamine reward reinforcement_learning review date: 12-07-2011 04:16 gmt revision:1 [0] [head]

PMID-9658025[0] Predictive reward signal of dopamine neurons.

  • hot article.
  • reasons why midbrain DA is involved in reward: lesions, receptor blocking, electrical self-stimulation, and drugs of abuse.
  • DA neurons show phasic responses to both primary reward and reward-predicting stimuli.
  • All responses to rewards and reward-predicting stimuli depend on event predictability.
  • Just think of the MFB work with the rats... and how powerful it is.
  • most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal (e.g. parkinsons, huntingtons) -> implying that DA has two uses: the labeling of reward, and the tonic enabling of postsynaptic neurons.
    • I just anticipated this, which is good :)
    • It is still a mystery how the neurons in the midbrain determine when to fire - the pathways between reward and behavior must be very carefully segregated, otherwise we would be able to self-stimulate
      • the pure expectation part of it is bound to play a part in this - if we know that a certain event will be rewarding, then the expectation will diminish DA release.
  • predictive eye movements ameliorate behavioral performance through advance focusing. (interesting)
  • predictions are used in industry:
    • Internal Model Control is used in industry to predict future system states before they actually occur. For example, the fly-by-wire technique in aviation makes decisions to do particular maneuvers based on predictable forthcoming states of the plane. (Like a human)
  • if you learn a reaction/reflex based on a conditioned stimulus, the presentation of that stimulus sets the internal state to that motivated to achieve the primary reward. there is a transfer back in time, which, generally, is what neural systems are for.
  • animals avoid foods that fail to influence important plasma/brain parameters, for example foods lacking essential amino acids like histidine, threonine, or methionine. In the case of food, the appearance/structure would be used to predict the slower plasma effects, and hence influence motivation to eat it. (of course!)
  • midbrain groups:
    • A8 = dorsal to lateral substantia nigra
    • A9 = pars compacta of substantia nigra, SNc
    • A10 = VTA, medial to the substantia nigra.
  • The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons.
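The phasic DA responses described above are commonly formalized as a temporal-difference prediction error. A toy TD(0) sketch (my illustration, not Schultz's model) showing that after training, a reward at the end of a short cue-reward interval is fully predicted, so the error at reward time vanishes and the learned value has propagated back to the cue:

```python
# Toy TD(0) learning on a 5-step episode with reward at the last step.
# delta = r + gamma * v[t+1] - v[t] plays the role of the phasic DA signal.
def train_td(n_steps=5, reward_t=4, episodes=500, alpha=0.1, gamma=1.0):
    v = [0.0] * (n_steps + 1)  # value per time step; terminal value is 0
    for _ in range(episodes):
        for t in range(n_steps):
            r = 1.0 if t == reward_t else 0.0
            delta = r + gamma * v[t + 1] - v[t]  # prediction error
            v[t] += alpha * delta
    return v

v = train_td()  # value has propagated from reward time back to the cue
```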


[0] Schultz W, Predictive reward signal of dopamine neurons.J Neurophysiol 80:1, 1-27 (1998 Jul)

hide / edit[1] / print
ref: Vijayakumar-2005.12 tags: schaal motor learning LWPL PLS partial least sqares date: 12-07-2011 04:09 gmt revision:1 [0] [head]

PMID-16212764[0] Incremental online learning in high dimensions


  • use locally linear models.
  • uses a small number of regressions along selected dimensions of input space, in the spirit of partial least squares regression; hence it can operate in very high dimensions.
  • function to be approximated has locally low-dimensional structure, which holds for most real-world data.
  • uses: learning value functions, policies, and models for control in high-dimensional systems (like complex robots or humans).
  • an important distinction between function-approximation learning methods:
    • methods that fit nonlinear functions globally, possibly using input space expansions.
      • gaussian process regression
      • support vector machine regression
        • problem: requires the right kernel choice & basis vector choice.
      • variational bayes for mixture models
        • represents the conditional joint expectation, which is expensive to update. (though this is factored).
      • each of the above was designed for data analysis, not incremental data (biology is incremental).
    • methods that fit simple models locally and segment the input space automatically.
      • problem: the curse of dimensionality: they require an exponential number of models for accurate approximation.
        • this is not such a problem if the function is locally low-dim, as mentioned above.
  • projection regression (PR) works via decomposing multivariate regressions into a superposition of single-variate regressions along a few axes of input space.
    • projection pursuit regression is a well-known and useful example.
    • sigmoidal neural networks can be viewed as a method of projection regression.
  • they want to use factor analysis, which assumes that the observed data is generated from a low-dimensional distribution with a limited number of latent variables related to the output via a transformation matrix + noise. (PCA/ wiener filter)
    • problem: the factor analysis must represent all high-variance dimensions in the data, even if it is irrelevant for the output.
    • solution: use joint input and output space projection to avoid elimination of regression-important dimensions.
  • practical details: they use the LWPR algorithm to model the inverse dynamics of their 7-DOF hydraulically-actuated gripper arm. That is, they applied random torques while recording the resulting accelerations, velocities, and angles, then fit a function to predict torques from these variables. The robot was compliant and not well modeled as a rigid body, though they tried this. The resulting LWPR model mapped 27 inputs to 7 predicted torques. The control system uses this functional approximation to compute torques from desired trajectories, I think. The desired trajectories are generated using spline smoothing (?), and the control system is adaptive in addition to the LWPR approximation being adaptive.
  • The core of LWPR is partial least squares regression / projection pursuit, coupled with Gaussian kernels and a distance metric (just a matrix) learned via constrained gradient descent with cross-validation. Partial least squares (PLS) appears to be very popular in many fields, and there are a number of ways of computing it. Distance metrics can expand without limit and overlap freely. Local models are added based on MSE, I think, and model addition stops when the space is well covered.
  • I think this technique is very powerful - you separate the function evaluation from the error minimization, to avoid the problem of ambiguous causes. Instead, when applying LWPR to the robot, the torques cause the angles and accelerations -> but you invert this relationship: you want to control the torques given a trajectory. Of course, the whole function approximation is stationary in time - the position/velocity/acceleration is sufficient to describe the state and the required torques. Does the brain work the same way? Do random things, observe consequences, work in consequence space and invert? e.g. I contracted my bicep and it caused my hand to move to my face; now I want my hand to move to my face again - what caused that? Need reverse memory... or something. Hmm. Let's go back to conditional learning: if an animal does an action and is subsequently rewarded, it will do that action again; if this is conditional on a need, then the action will be performed only when needed. When habitual, the action will be performed no matter what. This is the nature of all animals, I think, and corresponds to reinforcement learning? But how? I suppose it's all about memory, and assigning credit where credit is due - the same problem is dealt with in reinforcement learning. And yet things like motor learning seem so far out of this paradigm - they are goal-directed and minimize some sort of error. Eh, not really. Clementine is operating on the conditioned response now - has little in the way of error. But gradually this will be built; with humans, it is built very quickly by reuse of existing modes. Or consciousness.
  • back to the beginning: you don't have to regress into output space - you can regress into sensory space, and do as much as possible in that sensory space for control. This is very powerful, and the ISO learning people (Porr et al.) have effectively discovered this: you minimize in sensory space.
    • does this abrogate the need for backprop? we are continually causality-inverting machines; we are predictive.
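The locally-linear flavor of LWPR can be sketched in one dimension. This omits the actual algorithm's incremental PLS projections and learned distance metrics - it is plain locally weighted regression, my own illustration:

```python
import math

# Minimal locally weighted regression in 1-D: each query is answered by a
# linear fit whose samples are weighted by a Gaussian kernel around the
# query point. LWPR's key additions (incremental PLS, metric learning,
# model allocation) are not shown here.
def lwr_predict(xs, ys, x_query, bandwidth=0.2):
    w = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw       # weighted means
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs)) or 1e-12
    slope = num / den                                      # local slope
    return my + slope * (x_query - mx)

# a globally nonlinear function, fit locally
xs = [i * 0.1 for i in range(64)]
ys = [math.sin(x) for x in xs]
```

Because each prediction uses only nearby data, a pile of trivially simple linear models approximates a nonlinear function - the same bet LWPR makes in 27 dimensions.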


[0] Vijayakumar S, D'Souza A, Schaal S, Incremental online learning in high dimensions.Neural Comput 17:12, 2602-34 (2005 Dec)

hide / edit[1] / print
ref: Fletcher-2005.07 tags: explicit implicit learning fMRI frontal_cortex MT date: 12-07-2011 03:58 gmt revision:1 [0] [head]

PMID-15537672[0] On the Benefits of not Trying: Brain Activity and Connectivity Reflecting the Interactions of Explicit and Implicit Sequence Learning

quote: "under certain circumstances, automatic learning may be attenuated by explicit memory processes": explicit attempts to learn a difficult sequence (compared to a control) produce a failure in implicit learning, and this failure is caused by suppression of learning rather than of its expression. There is a deleterious effect of explicit search on implicit learning.

  • implicit learning is hampered by explicit search.
  • Compare this to the known benefits of cognitive effort on motor learning ... (?)


[0] Fletcher PC, Zafiris O, Frith CD, Honey RA, Corlett PR, Zilles K, Fink GR, On the benefits of not trying: brain activity and connectivity reflecting the interactions of explicit and implicit sequence learning.Cereb Cortex 15:7, 1002-15 (2005 Jul)

hide / edit[1] / print
ref: Gandolfo-1996.04 tags: learning approximation kernel field Bizzi Gandolfo date: 12-07-2011 03:40 gmt revision:1 [0] [head]

Motor learning by field approximation.

  • PMID-8632977[0]
    • studied the generalization properties of force compensation in humans.
    • learning to compensate only occurs in regions of space where the subject actually experienced the force.
    • they posit that the CNS builds an internal model of the external world in order to predict and compensate for it. What a friggin' surprise! Eh well.
  • PMID-1472573[1] Vector field approximation: a computational paradigm for motor control and learning
    • Recent experiments in the spinalized frog (Bizzi et al. 1991) have shown that focal microstimulation of a site in the premotor layers in the lumbar grey matter of the spinal cord results in a field of forces acting on the frog's ankle and converging to a single equilibrium position
    • they propose that the process of generating movements is the process of combining basis functions/fields. These fields may be optimized to make it easy to achieve goals / move in reasonable ways.
  • alternately, these basis functions could make movements invariant under a number of output transformations. yes...
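The basis-field idea can be sketched directly: each module contributes a force field converging on its own equilibrium, and co-activation sums the fields, yielding a new equilibrium between them. Toy linear fields, my own choice of form (the frog data are not linear):

```python
# Toy vector-field combination in the Mussa-Ivaldi/Giszter spirit: each
# "spinal module" i produces a field pulling the limb toward equilibrium
# e_i with gain w_i; co-activating modules sums the fields.
def combined_force(x, y, modules):
    # modules: list of (weight, (ex, ey)); each field is -w * (pos - e)
    fx = sum(w * (ex - x) for w, (ex, ey) in modules)
    fy = sum(w * (ey - y) for w, (ex, ey) in modules)
    return fx, fy

def equilibrium(modules):
    # zero of the summed linear field: the weighted mean of the equilibria
    total = sum(w for w, _ in modules)
    ex = sum(w * e[0] for w, e in modules) / total
    ey = sum(w * e[1] for w, e in modules) / total
    return ex, ey

# two equally-weighted modules -> equilibrium halfway between them
modules = [(1.0, (0.0, 0.0)), (1.0, (2.0, 0.0))]
```

A small repertoire of such fields, linearly combined, spans a whole workspace of reachable equilibria - which is the computational appeal of the paradigm.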


hide / edit[4] / print
ref: Loewenstein-2006.1 tags: reinforcement learning operant conditioning neural networks theory date: 12-07-2011 03:36 gmt revision:4 [3] [2] [1] [0] [head]

PMID-17008410[0] Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity

  • The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law.
  • We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity, and prove mathematically that matching (of choice to reward) is a generic outcome of such plasticity
    • models for learning that are based on the covariance between reward and choice are common in economics and are used phenomenologically to explain human behavior.
  • this model can be tested experimentally by making reward contingent not on the choices, but rather on neural activity itself.
  • Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
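The covariance rule itself is one line; here is a toy simulation (mine, not the paper's) on a concurrent ratio schedule - fixed reward probability per choice - where the matching law predicts exclusive preference for the richer option, and the covariance rule indeed drifts there:

```python
import math
import random

# Covariance-driven plasticity: dw ∝ (r - <r>)(a - <a>), where a codes the
# choice and <.> are running averages. Schedule, parameters, and coding are
# my own illustrative choices.
def covariance_rule(q=(0.6, 0.3), trials=20000, eta=0.2, seed=2):
    rng = random.Random(seed)
    w, rbar, abar = 0.0, 0.0, 0.5
    for _ in range(trials):
        p0 = 1.0 / (1.0 + math.exp(-w))        # P(choose option 0)
        a = 1.0 if rng.random() < p0 else 0.0  # a = 1 codes option 0
        r = 1.0 if rng.random() < q[0 if a else 1] else 0.0
        w += eta * (r - rbar) * (a - abar)     # covariance update
        rbar += 0.05 * (r - rbar)              # running reward average
        abar += 0.05 * (a - abar)              # running choice average
    return 1.0 / (1.0 + math.exp(-w))

p_rich = covariance_rule()  # ends up strongly preferring the richer option
```

The expected update is proportional to Cov(r, a) = p(1-p)(q0 - q1), which only vanishes at exclusive preference - the matching solution for ratio schedules.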


hide / edit[2] / print
ref: Harris-2008.03 tags: retroaxonal retrosynaptic Harris learning cortex backprop date: 12-07-2011 02:34 gmt revision:2 [1] [0] [head]

PMID-18255165[0] Stability of the fittest: organizing learning through retroaxonal signals

  • the central hypothesis: strengthening of a neuron's output synapses stabilizes recent changes in the same neuron's inputs.
    • this causes representations (as are arrived at with backprop) that are tuned to task features.
  • Retroaxonal signaling in the brain is too slow for an instructive (says at least the sign of the error wrt a current neuron's output) backprop algorithm
  • hence, retroaxonal signals are not instructive but selective.
  • At SFN Harris was looking for people to test this in a model; as (yet) unmodeled and untested, I'm suspicious of it.
  • Seems plausible, yet it also just seems to be a way of moving the responsibility for learning computation to the postsynaptic neuron (which is then propagated back to the present neuron). The theory does not immediately suggest what neurons are doing to learn their stuff, only how that learning may be organized.
    • If this stabilization is based on some sort of feedback (attention? reward?), which may guide learning (except for the cortex, which does not have many (any?) DA receptors...), then I may be more willing to accept it.
    • It seems likely that the cortex is doing a lot of unsupervised learning: predicting what sensory info will come next based on present sensory info (ICA, PCA).


[0] Harris KD, Stability of the fittest: organizing learning through retroaxonal signals.Trends Neurosci 31:3, 130-6 (2008 Mar)

hide / edit[6] / print
ref: BuzsAki-1996.04 tags: hippocampus neocortex theta gamma consolidation sleep Buzsaki review learning memory date: 12-07-2011 02:31 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

PMID-8670641[0] The hippocampo-neocortical dialogue.

  • the entorhinal ctx is bidirectionally connected to nearly all areas of the neocortical mantle.
  • Buzsaki correctly predicts that information gathered during exploration is played back on a faster time scale during synchronous population bursts during (consummatory) behaviors.
  • looks like a good review of the hippocampus, but don't have time to read it now.
  • excellent explanation of the anatomy (with some omissions; click through to read the caption).
  • SPW = sharp waves, 40-120 ms in duration; caused by synchronous firing in much of the cortex; occur 0.02-3 times/sec in daily activity & during slow-wave sleep.
    • Buzsaki thinks that this may be related to memory consolidation.
  • check the cited-by articles : http://cercor.oxfordjournals.org/cgi/content/abstract/6/2/8
[0] Buzsáki G, The hippocampo-neocortical dialogue.Cereb Cortex 6:2, 81-92 (1996 Mar-Apr)

hide / edit[1] / print
ref: notes-0 tags: data effectiveness Norvig google statistics machine learning date: 12-06-2011 07:15 gmt revision:1 [0] [head]

The unreasonable effectiveness of data.

  • counterpoint to Eugene Wigner's "The Unreasonable effectiveness of mathematics in the natural sciences"
    • that is, math is not effective with people.
    • we should not look for elegant theories, rather embrace complexity and make use of extensive data. (google's mantra!!)
  • in 2006 google released a trillion-word corpus with counts of all word sequences up to 5 words long.
  • document translation and voice transcription are successful mostly because people need the services - there is demand.
    • Traditional natural language processing does not have such demand as of yet. Furthermore, it has required human-annotated data, which is expensive to produce.
  • simple models and a lot of data trump more elaborate models based on less data.
    • for translation and any other application of ML to web data, n-gram models or linear classifiers work better than elaborate models that try to discover general rules.
  • much web data consists of individually rare but collectively frequent events.
  • because of a huge shared cognitive and cultural context, linguistic expression can be highly ambiguous and still often be understood correctly.
  • mentions Project Halo - $10,000 per page of a chemistry textbook (funded by DARPA).
  • ultimately they suggest that there is so much to explore now - just use unlabeled data with an unsupervised learning algorithm.
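The n-gram models behind "simple models + lots of data" are just counts; a toy sketch of counting all sequences up to length n:

```python
from collections import Counter

# Count all n-grams (word sequences of length 1..n) in a token stream.
# A real web-scale version differs only in plumbing, not in the idea.
def ngram_counts(tokens, n=3):
    counts = Counter()
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts

corpus = "the cat sat on the mat".split()
counts = ngram_counts(corpus, n=2)
```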

hide / edit[1] / print
ref: Wyler-1980.08 tags: Wyler Lange Robbins operant conditioning motor neurons contralateral bilateral specificity monkeys motor learning date: 12-06-2011 06:36 gmt revision:1 [0] [head]

PMID-6772272 Operant control of precentral neurons: bilateral single unit conditioning.

  • Used bilateral electrodes.
  • One neuron operantly conditioned, one not.
  • Switched the conditioned / controlled after performance was attained.
  • Evidence: neurons can be individually tuned, and operant control is not the result of spinal-level conditioning or change.
    • It is not the result of increased attention or increased muscle tone.
  • Simple question, simple paper.

hide / edit[1] / print
ref: Gandolfo-2000.02 tags: Gandolfo Bizzi dynamic environment force fields learning motor control MIT M1 date: 12-02-2011 00:10 gmt revision:1 [0] [head]

PMID-10681435 Cortical correlates of learning in monkey adapting to a new dynamical environment.

hide / edit[0] / print
ref: Burnod-1982.11 tags: operant conditioning motor control learning Burnod Maton Calvet date: 11-26-2011 02:22 gmt revision:0 [head]

PMID-7140894 Short-term changes in cell activity of areas 4 and 5 during operant conditioning.

  • Seems that areas 4 and 5 act differently during operant conditioning of a simple task.
  • Area 5 neurons become tuned to reward (?)
  • Can't get this article, have to go from the abstract.

hide / edit[2] / print
ref: -0 tags: machine learning CMU slides tutorial date: 01-17-2011 05:05 gmt revision:2 [1] [0] [head]

http://www.autonlab.org/tutorials/ -- excellent

http://energyfirefox.blogspot.com/2010/12/data-mining-with-ubuntu.html -- apt-get!


hide / edit[2] / print
ref: -0 tags: video games education learning flow work date: 12-13-2010 04:31 gmt revision:2 [1] [0] [head]

Learning by Playing: Video Games in the Classroom

  • My initial reaction was very skeptical and critical:
    • Video games are pleasurable and addictive because they are not like real life; the problems (more accurately, puzzles) posed always have some solution, again unlike the real world.
    • A purpose of education is to convey both information about the world and strategies for understanding it / succeeding in it (or, perhaps more relevantly, strategies what not to do -- see iatrogenic science). The less the learning environment reflects the real world, the less the students learn.
      • Up to a point, of course - part of the role of education is to render hierarchical something that was arrived at very randomly and haphazardly so as to be easier to remember and use. The learning environment has to reflect the William-James-ish pragmatic balance between too simple and too complex.
  • Video games, I initially thought and still feel, reflect less of the real world and its attendant frustrations, hence are inferior for learning about said thing.
    • Upon further thought: perhaps the increase in engagement & flow more than compensates for decreased realism? By the end of the article, I was thinking this. Maybe we should just re-engineer our working environment so that all tasks can be re-framed as addictive, pleasurable games. We've been changing the environment forever, and have already gone a bit down this path, so why not? If such is to occur (as I anticipate it will), these kids will be well prepared.
    • The whole purpose of being here is to .. well, enjoy it .. if the kids like doing these things, and they are later equally able to lead productive lives, there is no problem.
  • Playing video games is not the same as learning how to force yourself to study something that you don't understand, something that heretofore you saw no interest in. Games must be designed carefully to afford such choices, so that the players do not blindly follow the task-trail laid out by the designers. Elementary school students of course should explore microcosms with the due understanding that eventually the same processes will/can be applied to the real non-designed exploration that is life...
  • The question at hand (video games in education) is hence isomorphic (or at least related) to a much deeper question: is viewing life a game, a thing to be optimized and solved, a good philosophy? (even the question shows how deeply ingrained the ideas of valuation are!). I say no.

hide / edit[3] / print
ref: -0 tags: artificial intelligence machine learning education john toobey leda cosmides date: 12-13-2010 03:43 gmt revision:3 [2] [1] [0] [head]

Notes & responses to evolutionary psychologists John Tooby and Leda Cosmides - authors of The Adapted Mind - and their essay in This Will Change Everything

  • quote: Currently the most keenly awaited technological development is an all-purpose artificial intelligence-perhaps even an intelligence that would revise itself and grow at an ever-accelerating rate until it enacts millennial transformations. [...] Yet somehow this goal, like the horizon, keeps retreating as fast as it is approached.
  • AI's wrong turn was assuming that the best methods for reasoning and thinking are those that can be applied successfully to any problem domain.
    • But of course it must be possible - we are here, and we did evolve!
    • My opinion: the limit is codifying abstract, assumed, and ambiguous information into program function - e.g. embodying the world.
  • Their idea: intelligences use a number of domain-specific, specialized "hacks", that work for limited tasks; general intelligence appears as a result of the combination of all of these.
    • "Our mental programs can be fiendishly well engineered to solve some problems because they are not limited to using only those strategies that can be applied to all problems."
    • Given the content of the wikipedia page (above), it seems that they have latched onto this particular idea for at least 18 years. Strange how these sorts of things work.
  • Having accurate models of human intelligence would achieve two things:
    • It would enable humans to communicate more effectively with machines via shared knowledge and reasoning.
    • (me:) The AI would be enhanced by the tricks and hacks that evolution took millions of years, billions of individuals, and 10e?? (non-discrete) interactions between individuals and the environment to find. This constitutes an enormous store of information; to overlook it necessitates (probably - there may be serious shortcuts to biological evolution) re-simulating all of the steps that it took to get here. We exist as a cached output of the evolutionary algorithm; recomputing this particular function is energetically impossible.
  • "The long term ambition [of evolutionary psychology] is to develop a model of human nature as precise as if we had the engineering specifications for the control systems of a robot.
  • "Humanity will continue to be blind slaves to the programs evolution has built into our brains until we drag them into the light. Ordinarily, we inhabit only the versions of reality that they spontaneously construct for us -- the surfaces of things. Because we are unaware that we are in a theater, with our roles and our lines largely written for us by our mental programs, we are credulously swept up in these plays (such as the genocidal drama of us versus them). Endless chain reactions among these programs leave us the victims of history -- embedded in war and oppression, enveloped in mass delusions and cultural epidemics, mired in endless negative-sum conflict \\ If we understood these programs and the coordinated hallucinations they orchestrate in our minds, our species could awaken from the roles these programs assign to us. Yet this cannot happen if knowledge -- like quantum mechanics -- remains forever locked up in the minds of a few specialists, walled off by the years of study required to master it. " Exactly. Well said.
    • The solution, then: much much better education; education that utilizes the best knowledge about transferring knowledge.
    • The authors propose video games; this is already being tested, see {859}

hide / edit[6] / print
ref: -0 tags: meta learning Artificial intelligence competent evolutionary programming Moshe Looks MOSES date: 08-07-2010 16:30 gmt revision:6 [5] [4] [3] [2] [1] [0] [head]

Competent Program Evolution

  • An excellent start, excellent good description + meta-description / review of existing literature.
  • He thinks about things in a slightly different way - he separates what I call solutions and objective functions into "post-representational" and "pre-representational" levels (respectively).
  • The thesis focuses on post-representational search/optimization, not pre-representational (though, I believe that both should meet in the middle - eg. pre-representational levels/ objective functions tuned iteratively during post-representational solution creation. This is what a human would do!)
  • The primary difficulty in competent program evolution is the intense non-decomposability of programs: every variable, constant, and branch affects the execution of every other little bit.
  • Competent program creation is possible - humans create programs significantly shorter than lookup tables - hence it should be possible to make a program to do the same job.
  • One solution to the problem is representation - formulate the program creation as a set of 'knobs' that can be twiddled (here he means both gradient-descent partial-derivative optimization and simplex or heuristic one-dimensional probabilistic search, of which there are many good algorithms.)
  • pp 27: outline of his MOSES program; read it for yourself.
  • The representation step above "explicitly addresses the underlying (semantic) structure of program space independently of the search for any kind of modularity or problem decomposition."
    • In MOSES, optimization does not operate directly on program space, but rather on subspaces defined by the representation-building process. These subspaces may be considered as being defined by templates assigning values to some of the underlying dimensions (e.g., they restrict the size and shape of any resulting trees).
  • In chapter 3 he examines the properties of the boolean programming space, which is claimed to be a good model of larger/more complicated programming spaces in that:
    • Simpler functions are much more heavily sampled - e.g. he generated 1e6 samples of 100-term boolean functions, then reduced them to minimal form using standard operators. The vast majority of the resultant minimum length (compressed) functions were simple - tautologies or of a few terms.
    • A corollary is that simply increasing syntactic sample length is insufficient for increasing program behavioral complexity / variety.
      • Actually, as random program length increases, the percentage with interesting behaviors decreases due to the structure of the minimum length function distribution.
  • Also tests random perturbations to large boolean formulae (variable replacement/removal, operator swapping) - ~90% of these do nothing.
    • These randomly perturbed programs show a similar structure to above: most of them have very similar behavior to their neighbors; only a few have unique behaviors. makes sense.
    • Run the other way: "syntactic space of large programs is nearly uniform with respect to semantic distance." Semantically similar (boolean) programs are not grouped together.
  • Results somehow seem a let-down: the program does not scale to even moderately large problem spaces. No loops, only functions with conditional evaluation - Jacques Pitrat's results are far more impressive. {815}
    • Seems that, still, there were a lot of meta-knobs to tweak in each implementation. Perhaps this is always the case?
  • My thought: perhaps you can run the optimization not on program representations, but rather program codepaths. He claims that one problem is that behavior is loosely or at worst chaotically related to program structure - which is true - hence optimization on the program itself is very difficult. This is why Moshe runs optimization on the 'knobs' of a representational structure.
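The chapter-3 sampling argument is easy to reproduce in miniature (formula generator and sample sizes are my own choices, not the thesis's): bucket random boolean formulas by their behavior (truth table) and look at the resulting distribution:

```python
import random
from collections import Counter

# Sample random full-depth boolean formulas over {a, b, c} with {and, or,
# not}, and bucket them by truth table. The distribution over the 256
# possible behaviors is far from uniform: a few simple behaviors soak up
# most of the probability mass, echoing the thesis's observation.
def random_formula(depth, rng):
    if depth == 0:
        return rng.choice(["a", "b", "c"])
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        return ("not", random_formula(depth - 1, rng))
    return (op, random_formula(depth - 1, rng), random_formula(depth - 1, rng))

def evaluate(f, env):
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not evaluate(f[1], env)
    x, y = evaluate(f[1], env), evaluate(f[2], env)
    return (x and y) if f[0] == "and" else (x or y)

def behavior(f):
    return tuple(bool(evaluate(f, {"a": a, "b": b, "c": c}))
                 for a in (0, 1) for b in (0, 1) for c in (0, 1))

rng = random.Random(0)
tables = Counter(behavior(random_formula(4, rng)) for _ in range(2000))
```

With 2000 samples, the most common truth table occurs far more often than the uniform 2000/256 would predict, and many behaviors (e.g. parity-like ones) never appear at all.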

hide / edit[7] / print
ref: work-0 tags: metacognition AI bootstrap machine learning Pitrat self-debugging date: 08-07-2010 04:36 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Jacques Pitrat seems to have many of the same ideas that I've had (only better, and he's implemented them!)--

A Step toward an Artificial Scientist

  • The overall structure seems good - difficult problems are attacked by 4 different levels (all problems here are constraint-based; constraints are used to pare the tree of possible solutions, and these trees are tested combinatorially):
    • First level tries to solve the problem semi-directly, by writing a program to solve combinatorial problems.
    • Second level monitors lower-level performance and decides which hypotheses to test (which branch to pursue on the tree) and/or which rules to apply to the tree.
    • Third level directs the second level, and restarts the whole process if a snag or inconsistency is found.
    • Fourth level gauges the interest of a given problem and looks for new problems to solve within a family, so as to improve the skill of the 3 lower levels.
    • This makes sense, but why 4? Seems like in humans we only need 2 - the actor and the critic, bootstrapping forever.
    • Also includes a "Zeus" module that periodically checks for infinite loops of the other programs, and recompiles with trace instructions if an infinite loop is found within a subroutine.
  • Author claims that the system is highly efficient - it codes constraints and expert knowledge using a higher level language/syntax that is then converted to hundreds of thousands of lines of C code. The active search program runs runtime-generated C programs to evaluate and find solutions, wow!
  • This must have taken a decade or more to create! Very impressive. (seems it took 2 decades, at least according to http://tunes.org/wiki/jacques_20pitrat.html)
    • Despite all this work, he is not nearly done - it has no "learning" module.
    • Quote: In this paper, I do not describe some parts of the system which still need to be developed. For instance, the system performs experiments, analyzes them and finds surprising results; from these results, it is possible to learn some improvements, but the learning module, which would be able to find them, is not yet written. In that case, only a part of the system has been implemented: on how to find interesting data, but still not on how to use them.
  • Only seems to deal with symbolic problems - e.g. magic squares, magic cubes, self-referential integer series. Alas, no statistical problems.
  • The whole CAIA system can effectively be used as a tool for finding problems of arbitrary difficulty with arbitrary number of solutions from a set of problem families or meta-families.
  • Has hypothesis based testing and backtracking; does not have problem reformulation or re-projection.
  • There is mention of ALICE, but not the chatbot A.L.I.C.E - some constraint-satisfaction AI program from the 70's.
  • Has a C source version of MALICE (his version of ALICE) available on the website. Amazingly, there is no Makefile - just gcc *.c -rdynamic -ldl -o malice.
  • See also his 1995 Paper: AI Systems Are Dumb Because AI Researchers Are Too Clever images/815_1.pdf

Artificial beings - his book.

ref: Friston-2010.02 tags: free energy minimization life learning large theories date: 06-08-2010 13:59 gmt revision:2 [1] [0] [head]

My letter to a friend regarding images/817_1.pdf The free-energy principle: a unified brain theory? PMID-20068583 -- like all critics, i feel the world will benefit from my criticism ;-)

Hey, I did read that paper on the plane, and wrote down some comments, but haven't had a chance to actually send them until now. err..anyway.. might as well send them since I did bother writing stuff down:

I thought the paper was interesting, but rather specious, especially the way the author makes 'surprise' something to be minimized. This is blatantly false! Humans and other mammals (at least) like being surprised (in the normal meaning of the word). He says things like: "This is where free energy comes in: free energy is an upper bound on surprise, which means that if agents minimize free energy, they implicitly minimize surprise" -- a huge logical jump, and not one that I'm willing to accept. I feel like this author is trying to capitalize on some recent developments, like variational bayes and ensemble learning, without fully understanding them or having the mathematical chops (like Hayen) to flesh it out. So far as I understand, large theories (as this proposes to be) are useful in that they permit derivation of particular update equations; Variational Bayes for example takes the Kullback-Leibler divergence & a factorization of the posterior to create EM update equations. So, even if the free energy idea is valid, the author uses it at such a level as to make no useful, mathy predictions.

One area where I agree with him is that the nervous system creates an internal model of the world, for the purpose of prediction. Yes, maybe this allows 'surprise' to be minimized. But animals minimize surprise not because of free energy, but rather for the much more quotidian reason that surprise can be dangerous.

Finally, i wholly reject the idea that value and surprise can be equated, or are even similar. They seem orthogonal to me! Value is assigned to things that help an animal survive and multiply; surprise is what its nervous system does not expect. All these things make sense when cast against the theories of evolution and selection. Perhaps, perhaps selection is a consequence of decreasing free energy - this intuitively and somewhat amorphously/mystically makes sense (the aggregate consequence of life on earth is somehow order, harmony and other 'goodstuff' (but this is an anthropocentric view)) - but if so the author should be able to make a more coherent / mathematical prediction of observed phenomena. E.g. why animals locally violate the second law of thermodynamics.

Despite my critique, thanks for sending the article, made me think. Maybe you don't want to read it now and I saved you some time ;-)

ref: work-0 tags: machine learning manifold detection subspace segregation linearization spectral clustering date: 10-29-2009 05:16 gmt revision:5 [4] [3] [2] [1] [0] [head]

An interesting field in ML is nonlinear dimensionality reduction - data may appear to be in a high-dimensional space, but mostly lies along a nonlinear lower-dimensional subspace or manifold. (Linear subspaces are easily discovered with PCA or SVD(*)). Dimensionality reduction projects high-dimensional data into a low-dimensional space with minimum information loss -> maximal reconstruction accuracy; nonlinear dim reduction does this (surprise!) using nonlinear mappings. These techniques set out to find the manifold(s):

  • Spectral Clustering
  • Locally Linear Embedding
    • related: The manifold ways of perception
      • Would be interesting to run nonlinear dimensionality reduction algorithms on our data! What sort of space does the motor system inhabit? Would it help with prediction? Am quite sure people have looked at Kohonen maps for this purpose.
    • Random irrelevant thought: I haven't been watching TV lately, but when I do, I find it difficult to recognize otherwise recognizable actors. In real life, I find no difficulty recognizing people, even some whom I don't know personally - is this a data thing (little training data), or a mapping thing (not enough time training my TV-not-eyes facial recognition)?
  • A Global Geometric Framework for Nonlinear Dimensionality Reduction (the Isomap method):
    • map the points into a graph by connecting each point with a certain number of its neighbors or all neighbors within a certain radius.
    • estimate geodesic distances between all points in the graph by finding the shortest graph connection distance
    • use MDS (multidimensional scaling) to embed the original data into a lower-dimensional Euclidean space while preserving as much of the original geometry as possible.
      • Doesn't look like a terribly fast algorithm!
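
The first two steps above can be sketched directly; the final MDS step needs an eigensolver, so it is omitted here. The half-circle data and neighbor count below are my own toy illustration, not anything from the paper:

```python
import heapq, math

def knn_graph(points, k=4):
    """Step 1: connect each point to its k nearest Euclidean neighbors."""
    n = len(points)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[1:k + 1]
        for j in nbrs:
            d = math.dist(points[i], points[j])
            graph[i].append((j, d))
            graph[j].append((i, d))          # keep edges symmetric
    return graph

def geodesics(graph, src):
    """Step 2: shortest graph path (Dijkstra) approximates geodesic distance."""
    dist = {v: math.inf for v in graph}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist[u]:
            continue
        for v, w in graph[u]:
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# Points on a half-circle arc: a curved 1-D manifold embedded in 2-D.  The
# geodesic between the arc's endpoints (~pi) exceeds their straight-line
# distance (2.0) - that discrepancy is what the MDS step would then unfold.
pts = [(math.cos(i * math.pi / 20), math.sin(i * math.pi / 20)) for i in range(21)]
geo = geodesics(knn_graph(pts, k=2), 0)
print(round(geo[20], 3), round(math.dist(pts[0], pts[20]), 3))
```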

(*) SVD maps into 'concept space', an interesting interpretation as per Leskovec's lecture presentation.

ref: work-0 tags: machine learning reinforcement genetic algorithms date: 10-26-2009 04:49 gmt revision:1 [0] [head]

I just had dinner with Jesse, and we had a good/productive discussion/brainstorm about algorithms, learning, and neurobio. Two things worth repeating, one simpler than the other:

1. Gradient descent / Newton-Raphson like techniques should be tried with genetic algorithms. As of my current understanding, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of fitness with respect to each of the organism's genes, and used that to direct mutation, rather than random selection of the mutated element? What if you looked before mating and crossover? Seems like this would speed up the algorithm greatly (though it might get it stuck in local minima, too). Not sure if this has been done before - if it has, edit this to indicate where!
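
A minimal sketch of the idea: keep the usual select-and-mutate loop, but estimate the partial derivative of fitness wrt each gene by finite differences and mutate along that gradient. The sphere fitness, population sizes, and step sizes are all invented toy choices:

```python
import random

def fitness(genes):
    """Toy fitness to maximize: negated sphere function (optimum at all zeros)."""
    return -sum(g * g for g in genes)

def grad_estimate(genes, eps=1e-3):
    """Finite-difference partial derivative of fitness wrt each gene."""
    base = fitness(genes)
    grad = []
    for i in range(len(genes)):
        probe = list(genes)
        probe[i] += eps
        grad.append((fitness(probe) - base) / eps)
    return grad

def mutate_directed(genes, step=0.1):
    """Mutate along the estimated gradient instead of perturbing a random gene."""
    g = grad_estimate(genes)
    return [x + step * d + random.gauss(0, 0.01) for x, d in zip(genes, g)]

random.seed(1)
pop = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
for _ in range(50):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:10]                        # selection pressure
    pop = survivors + [mutate_directed(random.choice(survivors))
                       for _ in range(10)]      # gradient-directed offspring
best = max(pop, key=fitness)
print(fitness(best))
```

On a smooth unimodal fitness like this the gradient-directed mutations converge quickly; the local-minima worry in the text would show up on a rugged fitness landscape, where the random-noise term is what keeps exploration alive.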

2. Most supervised machine learning algorithms seem to rely on one single, externally applied objective function which they then attempt to optimize. (Rather this is what convex programming is. Unsupervised learning of course exists, like PCA, ICA, and other means of learning correlative structure) There are a great many ways to do optimization, but all are exactly that - optimization, search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I have arrived at is that there is no real utility function in the world, (Corollary #1: life is not an optimization problem (**)) -- we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in non-linear systems.)

Put another way, perhaps part of learning is to structure internal valuation / utility functions to set up reinforcement learning problems, where the reinforcement signal comes according to satisfaction of sub-goals (= local utility functions). Or, the gradient signal comes by evaluating partial derivatives of actions wrt these local utility functions. Creating these goals is natural but not always easy, which is one reason (of very many!) sports are so great - the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, they seem so similar that perhaps they are the same thing?)

To my limited knowledge, there has been some recent work on the creation of sub-goals in reinforcement learning. One paper I read used a system to look for states that had a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (e.g. rewarded the agent when this state was reached). I'm not talking about these sorts of sub-goals. In these systems, there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's (or agent's) task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks - make a utility function, and a behavioral policy, based on small continuous (possibly irrelevant?) rewards in the environment.

Why would I want to do this? The project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my pet project, kicadocaml, to finally get some people to use it (the attention economy :-) In the course of thinking about how to do this, I've realized that a substantial problem is simply determining which board layouts are good, and which are not. I have a rough aesthetic idea + some heuristics that I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not - but how to code these up? And what if these aren't the best rules, anyway? If I just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it - boring!

Well, I've stated my sub-goal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible).

(**) Corollary #2 - There is no god. nod, Dawkins.

ref: -0 tags: chess evolution machine learning 2004 partial derivative date: 10-26-2009 04:07 gmt revision:2 [1] [0] [head]

A Self-learning Evolutionary Chess Program

  • The evolved program is able to perform at near master level!
  • Used object networks (neural networks that can be moved about according to the symmetries of the problem space). Paul Werbos apparently invented these, too.
  • Approached the problem by assigning values to having pieces at particular places on the board (PVT, positional value tables). The value of a move was the resulting global valuation (sum of own piece values - opponent's piece values) + PVT. They used these valuations to look a set number of moves into the future, using an alpha-beta search.
    • Used 4 plies (search depth) during normal genetic evolution; 6 when pawns would be upgraded.
  • The neural networks looked at the first 2 rows, the last two rows, and a 4x4 square in the middle of the board - areas known to matter in real games. (The main author is a master-level chess player and chess teacher).
  • The outputs of the three neural networks were added to the material and PVT values to assess a hypothetical board position.
  • Genetic selection operated on the PVT values, neural network weights, piece valuation, and biases of the neural networks. These were initialized semi-randomly; PVT values were initialized based on open-source programs.
  • Performed 50 generations of 20 players each. The top 10 players from each generation survived.
  • Gary Kasparov was consulted in this research. Cool!
  • I wonder what would happen if you allowed the program to propose (genetically or otherwise) alternate algorithmic structures. What they describe is purely a search through weight space - what about a genetic search through algorithmic structure space? Too difficult of a search?
  • I mean, that's what humans (the authors) did while designing this program/algorithm. The lead author, as mentioned, is already a very good chess player, and hence he could imbue the initial program with a lot of good 'filters', 'kernels', or 'glasses' for looking at the chess board. And how did he arrive at these ideas? Practice (raw data) and communication (other people's kernels, extracted from more raw data and validated). And how does he play? By using his experience and knowledge to predict probable moves into the future, evaluating their value, and selecting the best. And how does he evaluate his algorithm? The same way! By using his knowledge of both chess and computer science to simulate hypothetical designs in his head, seeing how he thinks they will perform, and selecting the best one.
  • The problem with present algorithms is that they have no sense of artistic beauty - no love of symmetry, whether it be simple geometric symmetry (beautiful people have symmetric faces) or more fractal (fractional-dimensioned) symmetry, e.g. music, fractals (duh), human art. I think symmetry can enormously cut down the dimension of the search space in learning, hence is frequently worthy of its own search.
    • Algorithms do presently have a good sense of parsimony, at least, through the AIC / regularization / SVD / bayes net's priors / etc. Parsimony can be beauty, too.
  • Another notable discrepancy is that humans can reason in a concrete way - they actively search for the thing that is causing the problem, the thing that is contributing greatly to either good or bad results. They do this by the scientific method, sorta - hold all other things constant, perturb some section of the system, measure the output. This is the same as taking a partial derivative. Such derivatives are used heavily/exclusively in training neural networks - weights are changed based on the partial derivative of that weight wrt the output-referenced error. So reasoning is similar to non-parallel backprop? Or a really slow way of taking partial derivatives? Maybe. The goal of both is to assign valuation/causation to a given weight/subsystem.
  • Human reasoning involves dual valuation pathways - internal, based on a model of the world, and external, which of course involves experimentation and memory (and perhaps scholarly journal papers etc). The mammalian cortex-basal ganglia-thalamus loop seems designed for running these sorts of simulations because it is the dual of the problem of selecting appropriate behaviors. (there! I said it!) In internal simulation, you take world state, apply forward transform with perturbation, then evaluate the result - see if your perturbation (partial derivative) yields information. In motor behavior, you take the body state, apply forward transformation with perturbation (muscle contraction), and evaluate the result. Same thing. Of course you don't have to do this too much, as the cortex will remember the input-perturbation-result.
  • Understanding seems to be related to this input-transform-evaluate cycle, too, except here what is changing is the forward transform, and the output is compared to known output - does a given kernel (concept) predict the output/observed data?
  • Now what would happen if you applied this input-transform-evaluate to itself, e.g. you allowed the system to evaluate itself. Nothing? Recursion? (recursion is a very beautiful concept.) Some degree of awareness?
  • Surely someone has thought of this before, and tried to simulate it on a computer. Wasn't AI research all about this in the 70's-80's? People have said that their big problem was that AI was then entirely/mostly symbolic and insufficiently probabilistic or data-intensive; the 90's-21st century seems to have solved that. This field is unfamiliar to me, it'll take some sussing about before I can grok the academic landscape.
    • Even more surely, someone is doing it right now! This is the way the world advances. Same thing happened to me with GPGPU stuff, which I was doing in 2003. Now everyone is up to that shiznit.
  • It seems that machine-learning is transitioning from informing my personal philosophy, to becoming my philosophy. Good/bad? Feel free to edit this entry!
  • It's getting late and I'm tired -> rant ends.
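
The evaluation + search scheme described above (material + PVT, fixed-depth alpha-beta) can be sketched generically. The piece values, the 1-D 8-square 'board', and the one-move generator below are hypothetical stand-ins, not the paper's networks or actual tables:

```python
# Hypothetical piece values and a positional value table (PVT) for pawns on
# a 1-D 8-square "board" -- stand-ins for the paper's per-piece 8x8 tables.
PIECE_VALUE = {'P': 1, 'N': 3, 'R': 5}
PVT = {'P': [0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 0.0]}

def evaluate(board):
    """Material difference plus positional bonus, as in the paper's
    (sum of own piece values - opponent's piece values) + PVT scheme."""
    score = 0.0
    for side, piece, square in board:        # side: +1 = us, -1 = opponent
        score += side * (PIECE_VALUE[piece] + PVT.get(piece, [0.0] * 8)[square])
    return score

def alphabeta(node, depth, alpha, beta, maximizing, children):
    """Fixed-depth alpha-beta search over hypothetical game states."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        best = float('-inf')
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False, children))
            alpha = max(alpha, best)
            if beta <= alpha:
                break                        # prune: opponent will avoid this line
        return best
    best = float('inf')
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True, children))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

def children(board):
    """Toy move generator: our pawn may advance one square."""
    kids = []
    for i, (side, piece, sq) in enumerate(board):
        if side == +1 and piece == 'P' and sq < 7:
            kids.append(board[:i] + ((side, piece, sq + 1),) + board[i + 1:])
    return kids

start = ((+1, 'P', 1), (-1, 'N', 0))         # our pawn vs. their knight
best = alphabeta(start, 1, float('-inf'), float('inf'), True, children)
print(evaluate(start), best)                 # advancing the pawn improves the PVT bonus
```

In the paper the genetic algorithm would be tuning PIECE_VALUE, the PVT entries, and the network weights feeding into evaluate(); the search machinery stays fixed.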

ref: work-0 tags: Cohen Singer SLIPPER machine learning hypothesis generation date: 10-25-2009 18:42 gmt revision:2 [1] [0] [head]


  • "One disadvantage of boosting is that improvements in accuracy are often obtained at the expense of comprehensibility."
  • SLIPPER = simple learner with iterative pruning to produce error reduction.
  • Inner loop: the weak learner splits the training data, grows a single rule using one subset of the data, and then prunes the rule using the other subset.
  • They use a confidence-rated prediction based boosting algorithm, which allows the algorithm to abstain from examples not covered by the rule.
    • the sign of h(x) - the weak learner's hypothesis - is interpreted as the predicted label, and the magnitude |h(x)| is the confidence in the prediction.
  • SLIPPER only handles two-class problems now, but can be extended...
  • It is better, though not dramatically so, than C5.0rules (a commercial version of Quinlan's decision tree algorithms).
  • see also the excellent overview at http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/msri.ps
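
The confidence-rated scheme is easy to sketch: each rule outputs a confidence on examples it covers and 0 (abstain) elsewhere, example weights are multiplied by exp(-y*h(x)), and the final label is the sign of the summed confidences. The toy data, the threshold-rule family, and the round count below are invented for illustration - this is generic confidence-rated boosting, not SLIPPER's actual rule grower/pruner:

```python
import math

# Toy 1-D data; labels in {-1, +1}.  (Hypothetical, for illustration only.)
X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
Y = [-1, -1, -1, +1, +1, +1]

def make_rule(threshold, confidence):
    """A SLIPPER-style rule: predict +1 with some confidence if x > threshold,
    abstain (output 0) otherwise; uncovered examples fall to the default class."""
    return lambda x: confidence if x > threshold else 0.0

def boost(candidates, X, Y, rounds=3):
    """Confidence-rated boosting: each round pick the rule minimizing the
    normalizer Z, then reweight examples by exp(-y * h(x))."""
    n = len(X)
    w = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        def Z(rule):
            return sum(wi * math.exp(-yi * rule(xi))
                       for wi, xi, yi in zip(w, X, Y))
        rule = min(candidates, key=Z)
        z = Z(rule)
        w = [wi * math.exp(-yi * rule(xi)) / z
             for wi, xi, yi in zip(w, X, Y)]
        chosen.append(rule)
    return chosen

rules = [make_rule(t, c) for t in (1.0, 2.0, 3.0, 4.0) for c in (0.5, 1.0, 2.0)]
ensemble = boost(rules, X, Y)
score = lambda x: sum(r(x) for r in ensemble)   # sign = label, |score| = confidence
preds = [1 if score(x) > 0 else -1 for x in X]
print(preds)
```

The abstention is what keeps the ensemble comprehensible: each rule only speaks about the region it covers, and everything else defaults.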

ref: life-0 tags: IQ intelligence Flynn effect genetics facebook social utopia data machine learning date: 10-02-2009 14:19 gmt revision:1 [0] [head]


My theory on the Flynn effect - human intelligence IS increasing, and this is NOT stopping. Look at it from a ML perspective: there is more free time to get data, the data (and world) has almost unlimited complexity, the data is much higher quality and much easier to get (the vast internet & world!(travel)), there is (hopefully) more fuel to process that data (food!). Therefore, we are getting more complex, sophisticated, and intelligent. Also, the idea that less-intelligent people having more kids will somehow 'dilute' our genetic IQ is bullshit - intelligence is mostly a product of environment and education, and is tailored to the tasks we need to do; it is not (or only very weakly, except at the extremes) tied to the wetware. Besides, things are changing far too fast for genetics to follow.

Regarding social media, like facebook and others, you could posit that social intelligence is increasing, along arguments similar to the above: social data is seemingly more prevalent, more available, and people spend more time examining it. Yet this feels like a weaker argument, as people have always been socializing, talking, etc., and I'm not sure whether any of these social media have really increased it. Regardless, people enjoy it - that's the important part.

My utopia for today :-)

ref: work-0 tags: covariance matrix adaptation learning evolution continuous function normal gaussian statistics date: 06-30-2009 15:07 gmt revision:0 [head]


  • Details a method of sampling + covariance matrix approximation to find the extrema of a continuous (but intractable) fitness function
  • Has flavors of RLS / Kalman filtering. Indeed, I think Kalman filtering may be a more principled method for optimization?
  • Can be used in high-dimensional optimization problems like finding optimal weights for a neural network.
  • Optimum-seeking is provided by weighting the stochastic samples (generated a la a particle filter or unscented Kalman filter) by their fitness.
  • Introductory material is quite good, actually...
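
The sample-then-reweight loop can be sketched in a few lines. This is a deliberately simplified toy: diagonal variances only, whereas real CMA-ES adapts the full covariance matrix plus a separate step-size; the objective, population sizes, and generation count are my own arbitrary choices:

```python
import math, random

def fitness(x):
    """Toy objective to maximize: negated sphere, optimum at (1, -2).  (Invented.)"""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)

random.seed(2)
dim, lam, mu = 2, 20, 5            # dimensions, offspring lambda, parents mu
mean = [0.0, 0.0]
sigma = [1.0, 1.0]                 # per-dimension std devs (diagonal covariance)
for gen in range(40):
    samples = [[random.gauss(m, s) for m, s in zip(mean, sigma)]
               for _ in range(lam)]
    samples.sort(key=fitness, reverse=True)
    elite = samples[:mu]
    # log-rank weights: better samples count more (the "weighting by fitness")
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(w)
    w = [wi / total for wi in w]
    new_mean = [sum(wi * e[d] for wi, e in zip(w, elite)) for d in range(dim)]
    # re-estimate spread from the elite samples around the *old* mean
    sigma = [math.sqrt(sum(wi * (e[d] - mean[d]) ** 2
                           for wi, e in zip(w, elite))) + 1e-12
             for d in range(dim)]
    mean = new_mean
print(mean, fitness(mean))
```

The Kalman-filter flavor mentioned above is visible here: each generation is a predict (sample from the current Gaussian) / update (reweight by fitness and re-estimate the Gaussian) cycle.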

ref: life-0 tags: NYTimes genius talent skill learning date: 06-27-2009 18:36 gmt revision:1 [0] [head]

http://www.nytimes.com/2009/05/01/opinion/01brooks.html?_r=1 -- the 'modern view' of genius. Makes sense to me.

  • quote: "By practicing in this way, performers delay the automatizing process. The mind wants to turn deliberate, newly learned skills into unconscious, automatically performed skills. But the mind is sloppy and will settle for good enough. By practicing slowly, by breaking skills down into tiny parts and repeating, the strenuous student forces the brain to internalize a better pattern of performance." -- exactly!!
  • quote: The primary trait she possesses is not some mysterious genius. It’s the ability to develop a deliberate, strenuous and boring practice routine.
  • It's not who you are, it's what you do. (law of the cortex: you get good at what you do).
  • The subconscious / the ability to push skills into the subconscious should not be neglected. Insight apparently is mostly subconscious, and rapid decisions are too - the rational/conscious brain is simply too slow and deliberate to form realtime behavior & reactions, but as the above quote highlights, it is also too 'lazy' and accepting to carefully hone a true skill. That requires attention.
  • From the guardian -- "Sometimes an overload of facts is the mark of a dull and pedestrian mind, the antithesis of intelligence."
    • also: "Intelligence is a matter of output, not scores on a test." We know genius & talent by its output.

ref: Boyd-2004.08 tags: basal ganglia learning implicit explicit lesion stroke date: 05-05-2009 23:14 gmt revision:1 [0] [head]

PMID-15286181[0] Providing explicit information disrupts implicit motor learning after basal ganglia stroke.

  • Evidence suggests that the BG is important for advance preparation of responses in learned sequences of actions; when given knowledge about upcoming responses, healthy subjects used the information to prepare for not only their first, but subsequent movements. Individuals with PD only used advance information to prepare for the first movement.
  • Interestingly, evidence is accumulating that in some cases conscious strategies for movement disrupt motor learning.
  • The task here was to perform a continuous tracking task where the middle third segment was constant between trials. (Performance on this segment was what was measured.)
  • As the title says, telling the subjects that the middle third does not change (explicit knowledge) impeded their performance relative to uninformed controls. This was not seen in the matched healthy subjects.
  • They looked at improvement in tracking ability, not the ability itself.


[0] Boyd LA, Winstein CJ, Providing explicit information disrupts implicit motor learning after basal ganglia stroke.Learn Mem 11:4, 388-96 (2004 Jul-Aug)

ref: Legenstein-2008.1 tags: Maass STDP reinforcement learning biofeedback Fetz synapse date: 04-09-2009 17:13 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-18846203[0] A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

  • (from abstract) The resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP.
    • This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker.
  • STDP is prevalent in the cortex; however, it requires a second signal:
    • Dopamine seems to gate STDP in corticostriatal synapses
    • ACh does the same or similar in the cortex. -- see references 8-12
  • simple learning rule they use: d/dt W_ij(t) = C_ij(t) D(t), where C_ij(t) is the eligibility trace of synapse ij and D(t) is the global reward signal.
  • Their notes on the Fetz/Baker experiments: "Adjacent neurons tended to change their firing rate in the same direction, but also differential changes of directions of firing rates of pairs of neurons are reported in [17] (when these differential changes were rewarded). For example, it was shown in Figure 9 of [17] (see also Figure 1 in [19]) that pairs of neurons that were separated by no more than a few hundred microns could be independently trained to increase or decrease their firing rates."
  • Their result is actually really simple - there is no 'control' or biofeedback - there is no visual or sensory input, no real computation by the network (at least for this simulation). One neuron is simply reinforced, hence its firing rate increases.
    • Fetz & later Schmidt's work involved feedback and precise control of firing rate; this does not.
    • This also does not address the problem that their rule may allow other synapses to forget during reinforcement.
  • They do show that exact spike times can be rewarded, which is kinda interesting ... kinda.
  • Tried a pattern classification task where all of the information was in the relative spike timings.
    • Had to run the pattern through the network 1000 times. That's a bit unrealistic (?).
      • The problem with all these algorithms is that they require so many presentations for gradient descent (or similar) to work, whereas biological systems can and do learn after one or a few presentations.
  • Next tried to train neurons to classify spoken input
    • Audio stimuli were processed through a cochlear model
    • Maass previously has been able to train a network to perform speaker-independent classification.
    • Neuron model does, roughly, seem to discriminate between "one" and "two"... after 2000 trials (each with a presentation of 10 of the same digit utterance). I'm still not all that impressed. Feels like gradient descent / linear regression as per the original LSM.
  • A great many derivations in the Methods section... too much to follow.
  • Should read refs:
    • PMID-16907616[1] Gradient learning in spiking neural networks by dynamic perturbation of conductances.
    • PMID-17220510[2] Solving the distal reward problem through linkage of STDP and dopamine signaling.
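
The flavor of the rule - an eligibility trace per synapse, gated by a global reward signal - can be sketched in a few lines. Everything below (the rates, the time constant, the crude 'post fires if any input fires' neuron, the differential reward mimicking Fetz & Baker's pair conditioning) is an invented toy, not the paper's actual spiking model:

```python
import random

random.seed(3)
dt, tau_c, lr = 1.0, 50.0, 0.005   # ms, trace time constant, learning rate (assumed)
w = [0.5, 0.5]                     # two synapses onto one postsynaptic cell
c = [0.0, 0.0]                     # eligibility traces C_ij

for t in range(20000):
    pre = [random.random() < 0.05 for _ in range(2)]   # Poisson-ish inputs
    post = pre[0] or pre[1]        # crude postsynaptic response
    for i in range(2):
        c[i] -= c[i] * dt / tau_c                      # trace decays
        if pre[i] and post:
            c[i] += 1.0            # pre/post coincidence tags the synapse
    # Differential reinforcement, a la Fetz & Baker: reward activity driven by
    # input 0, punish activity driven by input 1 alone.
    if pre[0] and not pre[1]:
        D = 1.0
    elif pre[1] and not pre[0]:
        D = -1.0
    else:
        D = 0.0
    for i in range(2):
        w[i] += lr * c[i] * D      # dW_ij/dt = C_ij(t) * D(t)
        w[i] = min(max(w[i], 0.0), 1.0)
print(w)
```

The credit assignment works because at reward time the rewarded synapse's trace has, on average, a slightly larger value than the other's - which is also why so many trials are needed: the per-trial signal is tiny relative to the trace noise.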


[0] Legenstein R, Pecevski D, Maass W, A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.PLoS Comput Biol 4:10, e1000180 (2008 Oct)
[1] Fiete IR, Seung HS, Gradient learning in spiking neural networks by dynamic perturbation of conductances.Phys Rev Lett 97:4, 048104 (2006 Jul 28)
[2] Izhikevich EM, Solving the distal reward problem through linkage of STDP and dopamine signaling.Cereb Cortex 17:10, 2443-52 (2007 Oct)

ref: Shadmehr-1997.01 tags: Shadmehr human long term memory learning motor M1 cortex date: 03-25-2009 15:29 gmt revision:2 [1] [0] [head]

PMID-8987766[0] Functional Stages in the Formation of Human Long-Term Motor Memory

  • We demonstrate that two motor maps may be learned and retained, but only if the training sessions in the tasks are separated by an interval of ~5 hr.
  • Analysis of the after-effects suggests that with a short temporal distance, learning of the second task leads to an unlearning of the internal model for the first.
  • many many citations!


[0] Shadmehr R, Brashers-Krug T, Functional stages in the formation of human long-term motor memory.J Neurosci 17:1, 409-19 (1997 Jan 1)

ref: Mölle-2009.03 tags: sleep spindles learning ripples LFP hippocampus neocortex synchrony SWS REM date: 03-25-2009 15:05 gmt revision:2 [1] [0] [head]

PMID-19245368[0] The influence of learning on sleep slow oscillations and associated spindles and ripples in humans and rats

  • Here we examined whether slow oscillations also group learning-induced increases in spindle and ripple activity, thereby providing time-frames of facilitated hippocampus-to-neocortical information transfer underlying the conversion of temporary into long-term memories.
  • No apparent grouping effect between slow oscillations and learning-induced spindles and ripples in rats.
  • Stronger effect of learning on spindles (neocortex) and ripples (hippocampus) ; less or little effect of learning on slow waves in the neocortex.
  • They have a good plot showing their time-series analysis.


[0] Mölle M, Eschenko O, Gais S, Sara SJ, Born J, The influence of learning on sleep slow oscillations and associated spindles and ripples in humans and rats.Eur J Neurosci 29:5, 1071-81 (2009 Mar)

ref: BrashersKrug-1996.07 tags: motor learning sleep offline consolidation Bizzi Shadmehr date: 03-24-2009 15:39 gmt revision:1 [0] [head]

PMID-8717039[0] Consolidation in human motor memory.

  • While practice produces speed and accuracy improvements, significant improvements (~20%) also occur 24 hours later, following a period of sleep. Why is this? We can answer it with the recording system!


[0] Brashers-Krug T, Shadmehr R, Bizzi E, Consolidation in human motor memory.Nature 382:6588, 252-5 (1996 Jul 18)

ref: Rasch-2009.04 tags: REM learning procedural memory sleep spindles date: 03-23-2009 18:32 gmt revision:3 [2] [1] [0] [head]

PMID-18836440[0] Pharmacological REM sleep suppression paradoxically improves rather than impairs skill memory

  • suppressed REM sleep with SSRIs or a norepinephrine reuptake inhibitor
    • yet tested the subjects after a long wash-out: 32 hours, including 2 nights sleep.
  • did not impair word-pair recognition, and improved finger tapping accuracy.
  • sleep spindles are a feature of non-REM sleep.
  • REM sleep is characterized by an absence of serotonin and norepinephrine; SSRIs and SNRIs increase the levels of these two neurotransmitters, respectively, at the synaptic cleft.
  • clinical studies of depressed patients show no impairment of skill performance during long-term treatment with these drugs, despite marked REM suppression
  • did mirror-tracing and finger-tapping tasks.
  • the SSRI suppressed REM sleep; the SNRI almost completely eliminated REM.
  • treatment increased accuracy on the finger-tapping task! esp. for the SNRI.
    • increase in accuracy was positively correlated to the change in spindle density.
  • For the mirror task, there were notable improvements after sleep, but no significant difference between placebo, SSRI, and SNRI groups.
  • paired-word retention task has been shown dependent on SWS; it was not affected by pharmacology.
  • They suggest that perhaps the SSRI/SNRI simply suppressed the typical measures of REM sleep, and that other factors critical for the associated consolidation were unaffected (e.g. high cholinergic activity).
  • result is consistent with [1]


[0] Rasch B, Pommer J, Diekelmann S, Born J, Pharmacological REM sleep suppression paradoxically improves rather than impairs skill memory.Nat Neurosci no Volume no Issue no Pages (2008 Oct 5)
[1] Tamaki M, Matsuoka T, Nittono H, Hori T, Fast sleep spindle (13-15 hz) activity correlates with sleep-dependent improvement in visuomotor performance.Sleep 31:2, 204-11 (2008 Feb 1)

ref: Maquet-2001.11 tags: sleep learning memory Maquet date: 03-20-2009 18:38 gmt revision:1 [0] [head]

PMID-11691982[0] The Role of Sleep in Learning and Memory

  • 8 years ago; presumably much has changed?
  • NREM = SWS; REM = PS (paradoxical sleep)
  • nice table in there! looks as though he was careful in background research on this one; plenty of references.
  • "indeed, stress can also lead to an increase in REM sleep." -- but this may only be related to the presence of new material.
    • however, there is no increase in REM sleep if there is no material to learn.
  • reminder that theta rhythm is seen in the hippocampus in both exploratory activity and in REM sleep.
    • anticipated the presence of replay in the hippocampus
  • spindles allow the entry of Ca2+, which facilitates LTP (?).
  • I should check up on songbird learning (mentioned in the review!).
    • Young zebra finches have to establish the correspondence between vocal production (motor output) and the resulting auditory feedback (sensory).
    • This cannot be done during waking because the bird song arises from a tightly time-coded sequence of activity; during sleep, however, motor output can be compared to sensory feedback (so as to capture an inverse model?)
  • PGO (ponto-geniculo-occipital) waves occur immediately before REM sleep. PGO waves are more common in rats after aversive training.
  • ACh increases cortical plasticity in adult mammals; REM sleep is characterized by a high level of ACh and 5-HT (serotonin).
  • sleep may not be necessary for recall-based learning; it just may be a good time for it. Sharp waves and ripples are observed in both quiet waking and SWS.
  • Learning to reach in a force field is consolidated in 5 hours after training. [1]
  • Again mentions the fact that antidepressant drugs, which drastically reduce the amount of REM sleep, do not adversely affect memory.


[0] Maquet P, The role of sleep in learning and memory.Science 294:5544, 1048-52 (2001 Nov 2)
[1] Shadmehr R, Brashers-Krug T, Functional stages in the formation of human long-term motor memory.J Neurosci 17:1, 409-19 (1997 Jan 1)

ref: Matsuzaka-2007.02 tags: skill learning M1 motor control practice cortex date: 03-20-2009 18:31 gmt revision:1 [0] [head]

PMID-17182912[0] Skill Representation in the Primary Motor Cortex After Long-Term Practice

  • The acquisition of motor skills can lead to profound changes in the functional organization of the primary motor cortex (M1) - yes.
  • 2 task modes: random target acquisition, and one of 2 repeating sequences (predictable, repeating mode)
  • 2 years of training -> 40% of units were differentially active during the two task modes
  • variations in movement types in the two classes did not fully explain the difference in activity between the 2 tasks
    • M1 neurons are more influenced by the task than by the actual kinematics.


ref: Eschenko-2006.12 tags: sleep spindle learning rats date: 03-20-2009 00:40 gmt revision:1 [0] [head]

PMID-17167082[0] Elevated sleep spindle density after learning or after retrieval in rats.

  • sleep spindles = 12–15 Hz oscillations superimposed on slow waves (<1 Hz)
    • they say these 'promote', but in fact they may just be effects of some lower-level synchronization / ensemble depolarization.
  • used an odor-response-reward task.
  • spindles reliably appear 1 hour after sleep begins.
  • hippocampal ripples are temporally related to cortical spindles and both are grouped by slow oscillations.
  • showed that pure exploration of novel environments (without the odorant pairing) does not change sleep spindle occurrence frequency.


[0] Eschenko O, Mölle M, Born J, Sara SJ, Elevated sleep spindle density after learning or after retrieval in rats.J Neurosci 26:50, 12914-20 (2006 Dec 13)

ref: Stickgold-2001.11 tags: review dream sleep REM NREM SWS learning memory replay date: 03-19-2009 17:09 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

PMID-11691983[0] Sleep, Learning, and Dreams: Off-line Memory Reprocessing

  • sleep can be broadly divided into REM (rapid eye movement) and NREM (non-rapid eye movement) sleep, with the REM-NREM cycle lasting 90 minutes in humans.
  • REM seems involved in proper binocular wiring in the visual cortex, development of problem solving skills, and discrimination tasks.
    • REM sleep seems as important as visual experience for wiring binocular vision.
  • REM seems critical for learning procedural memories, but not declarative (by the authors claim that the tasks used in declarative tests are too simple).
    • Depriving rats of REM sleep can impair procedural learning at test points up to a week later.
    • SWS may be better for consolidation of declarative memory.
  • Strongest evidence comes from a visual texture discrimination task, where improvements are only seen after REM sleep.
    • REM has also been shown to have an effect in learning of complex logic games, foreign language acquisition, and after intensive studying.
    • Anagram solving is stronger after being woken up from REM sleep. (!)
  • REM (hypothetically) involves NC -> hippocampus; SWS involves hippocampus -> NC (hence declarative memory). (Buzaki 1996).
    • This may use theta waves, which enhance LTP in the hippocampus; the slow large depolarizations in SWS may facilitate LTP in the cortex.
  • Replay in the rat hippocampus:
    • replay occurs within layer CA1 during SWS for a half hour or so after learning, and in REM after 24 hours.
    • replay shifts from being in-phase with the theta wave activity (e.g. helping LTP) to being out of phase (coincident with troughs, possibly used to 'erase' memories from the hippocampus?); this is in accord with memories becoming hippocampally independent.
  • ACh levels are at waking levels or higher, and levels of NE (norepinephrine) & 5-HT go to near zero.
  • DLPFC (dorsolateral prefrontal cortex) is inhibited during REM sleep - presumably, this results in an inability to allocate attentional resources.
  • ACC (anterior cingulate cortex), MFC (medial frontal cortex), and the amygdala are highly active in REM sleep.
  • if you block molecular correlates of learning (the PKA pathway, zif-268 gene expression) during REM, learning is impaired.
  • In the context of a multilevel system of sleep-dependent memory reprocessing, dreams represent the conscious awareness of complex brain systems involved in the reprocessing of emotions and memories during sleep.
    • the whole section on dreaming is really interesting!


[0] Stickgold R, Hobson JA, Fosse R, Fosse M, Sleep, learning, and dreams: off-line memory reprocessing.Science 294:5544, 1052-7 (2001 Nov 2)

ref: Seidler-2006.11 tags: basal ganglia learning fMRI adaptation date: 03-11-2009 21:34 gmt revision:4 [3] [2] [1] [0] [head]

PMID-16794848[0] Bilateral basal ganglia activation associated with sensorimotor adaptation.

  • shows that the basal ganglia are highly active during the initial stages of sensorimotor adaptation (cursor rotation).
    • specifically: "We observed activation in the right globus pallidus and putamen, along with the right prefrontal, premotor and parietal cortex," to support spatial cognitive processes of adaptation.. and .. "activation in the left globus pallidus and caudate nucleus, along with the left premotor and supplementary motor cortex, which may support the sensorimotor processes of adaptation"
  • human subjects in a 3T MRI scanner; BOLD signal.

ref: -0 tags: alopex machine learning artificial neural networks date: 03-09-2009 22:12 gmt revision:0 [head]

Alopex: A Correlation-Based Learning Algorithm for Feed-Forward and Recurrent Neural Networks (1994)

  • read the abstract! rather than using the gradient error estimate as in backpropagation, it uses the correlation between changes in network weights and changes in the error + gaussian noise.
    • backpropagation requires calculation of the derivatives of the transfer function from one neuron to the output. This is very non-local information.
    • one alternative is somewhat empirical: compute the derivatives wrt the weights through perturbations.
    • all these algorithms are solutions to the optimization problem: minimize an error measure, E, wrt the network weights.
  • all network weights are updated synchronously.
  • can be used to train both feedforward and recurrent networks.
  • algorithm apparently has a long history, especially in visual research.
  • the algorithm is quite simple! easy to understand.
    • use stochastic weight changes with an annealing schedule.
  • this is pre-pub: tables and figures at the end.
  • looks like it has comparable or faster convergence than backpropagation.
  • not sure how it will scale to problems with hundreds of neurons, though they looked at an encoding task with 32 outputs.
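The update rule is simple enough to sketch. A minimal version of the correlation-based idea, minimizing a toy quadratic (step size, annealing schedule, and the objective are my choices, not the paper's exact constants):

```python
import numpy as np

def alopex_minimize(f, w0, delta=0.01, T0=0.1, iters=2000, seed=0):
    # Correlation-based update: if the last weight change correlated with an
    # increase in error, step the other way, with probability set by the
    # temperature T. All weights are updated synchronously.
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    prev_dw = rng.choice([-delta, delta], size=w.shape)
    prev_E = f(w)
    w = w + prev_dw
    T = T0
    for _ in range(iters):
        E = f(w)
        C = prev_dw * (E - prev_E)                # correlation of dw and dE
        p = 1.0 / (1.0 + np.exp(-C / T))          # prob of stepping -delta
        dw = np.where(rng.random(w.shape) < p, -delta, delta)
        T = max(float(np.mean(np.abs(C))), 1e-8)  # crude annealing on |C|
        prev_dw, prev_E = dw, E
        w = w + dw
    return w
```

Note the only global information needed is the scalar error change, which is what makes the rule attractive for recurrent networks.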

ref: Diedrichsen-2005.1 tags: Shadmehr error learning basal ganglia cerebellum motor cortex date: 03-09-2009 19:26 gmt revision:0 [head]

PMID-16251440[0] Neural correlates of reach errors.

  • Abstract:
  • Reach errors may be broadly classified into errors arising from unpredictable changes in target location, called target errors, and errors arising from miscalibration of internal models (e.g., when prisms alter visual feedback or a force field alters limb dynamics), called execution errors.
    • Execution errors may be caused by miscalibration of dynamics (e.g., when a force field alters limb dynamics) or by miscalibration of kinematics (e.g., when prisms alter visual feedback).
  • Although all types of errors lead to similar on-line corrections, we found that the motor system showed strong trial-by-trial adaptation in response to random execution errors but not in response to random target errors.
  • We used functional magnetic resonance imaging and a compatible robot to study brain regions involved in processing each kind of error.
  • Both kinematic and dynamic execution errors activated regions along the central and the postcentral sulci and in lobules V, VI, and VIII of the cerebellum, making these areas possible sites of plastic changes in internal models for reaching.
    • Only activity related to kinematic errors extended into parietal area 5.
    • These results are inconsistent with the idea that kinematics and dynamics of reaching are computed in separate neural entities.
  • In contrast, only target errors caused increased activity in the striatum and the posterior superior parietal lobule.
  • The cerebellum and motor cortex were as strongly activated as with execution errors. These findings indicate a neural and behavioral dissociation between errors that lead to switching of behavioral goals and errors that lead to adaptation of internal models of limb dynamics and kinematics.


ref: Mehta-2007.01 tags: hippocampus visual cortex wilson replay sleep learning states date: 03-09-2009 18:53 gmt revision:1 [0] [head]

PMID-17189946[0] Cortico-hippocampal interaction during up-down states and memory consolidation.

  • (from the associated review) Good pictorial description of how the hippocampus may impinge order upon the cortex:
    • During sleep the cortex is spontaneously and randomly active. Hippocampal activity is similarly disorganized.
    • During waking, the mouse/rat moves about in the environment, activating a sequence of place cells. The weights of the associated place cells are modified to reflect this sequence.
    • When the rat falls back to sleep, the hippocampus is still not random, and replays a compressed copy of the day's events to the cortex, which can then (and with other help, eg. ACh), learn/consolidate it.
  • see [1].


ref: Nishida-2007.04 tags: sleep spindle learning nap NREM date: 03-06-2009 17:56 gmt revision:1 [0] [head]

PMID-17406665[0] Daytime naps, motor memory consolidation and regionally specific sleep spindles.

  • asked subjects to learn a motor task with their non-dominant hand, and then tested them 8 hours later.
  • subjects that were allowed a 60-90 minute siesta improved their performance significantly relative to controls and relative to previous performance.
  • when they subtracted EEG activity of the non-learning hemisphere from the learning hemisphere, spindle activity was strongly correlated with offline memory improvement.


ref: KAli-2004.03 tags: hippocampus memory model Dayan replay learning memory date: 03-06-2009 17:53 gmt revision:1 [0] [head]

PMID-14983183[0] Off-line replay maintains declarative memories in a model of hippocampal-neocortical interactions

  • (i'm skimming the article)
  • The neocortex acts as a probabilistic generative model. unsupervised learning extracts categories, tendencies and correlations from the statistics of the inputs into the [synaptic weights].
  • Their hypothesis is that hippocampal replay is required for maintenance of episodic memories; their model and simulations support this.
  • quote: "However, the computational goal of episodic learning is storing individual events rather than discovering statistical structure, seemingly rendering consolidation inappropriate. If initial hippocampal storage of the episode already ensures that it can later be recalled episodically, then, barring practical advantages such as storage capacity (or perhaps efficiency), there seems little point in duplicating this capacity in neocortex." makes sense!


ref: Brown-2007.09 tags: motor force field learning vision date: 02-20-2009 00:28 gmt revision:1 [0] [head]

PMID-17855611 Motor Force Field Learning Influences Visual Processing of Target Motion

  • as you can see from the title - this is an interesting result.
  • learning to compensate for forces applied to the hand influenced how participants predicted target motion for interception.
  • subjects were trained on a robotic manipulandum that applied different force fields; they had to use the manipulandum to hit an accelerating target.
  • There were 3 force fields: rightward, leftward, and null. The target accelerated left to right. Subjects with the rightward force field hit more targets than those with the null field, who in turn hit more than those with the leftward field. Hence motor knowledge of the environment (associated accelerations, as if there were wind or water current...) influenced how motion was perceived and acted upon.
    • perhaps there is a simple explanation for this (rather than their evolutionary information-sharing hypothesis): there exists a network that serves to convert visual-spatial coordinates into motor plans, and later muscle activations. The presence of a force field initially only affects the motor/muscle control parts of the ctx, but as training continues, the changes are propagated earlier into the system - to the visual system (or at least the visual-planning system). But this is a complicated system, and it's hard to predict how and where adaptation occurs.

ref: Tamaki-2008.02 tags: sleep spindle NREM motor learning date: 02-18-2009 17:44 gmt revision:0 [head]

PMID-18274267[0] Fast sleep spindle (13-15 hz) activity correlates with sleep-dependent improvement in visuomotor performance.

  • mirror-tracing task performance improves following a night's sleep.
  • the improvement is correlated with the fast-spindle activity.
  • spindles were detected from EEG recordings with a 10-16 Hz Butterworth filter in Matlab. Spindles had to be >= 15 uV in amplitude and >= 0.5 s in duration.
    • slow spindles = 10-13 Hz, predominant in the frontal regions.
    • fast spindles = >13 Hz, predominant in the parietal regions.
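The stated criteria are concrete enough to sketch. A rough Python version (the Hilbert-envelope step and treating 15 uV as an envelope threshold are my simplifications, not necessarily the authors' exact procedure):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def detect_spindles(eeg_uv, fs, band=(10.0, 16.0), amp_uv=15.0, min_dur_s=0.5):
    # Band-pass in the spindle band, then threshold the Hilbert envelope;
    # keep only supra-threshold stretches of sufficient duration.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, eeg_uv)))
    above = np.append(env >= amp_uv, False)   # sentinel to close a final run
    events, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / fs >= min_dur_s:
                events.append((start / fs, i / fs))  # (onset_s, offset_s)
            start = None
    return events
```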


ref: Morin-2008.08 tags: sleep spindles NREM motor learning date: 02-18-2009 17:35 gmt revision:2 [1] [0] [head]

PMID-18714787[0] Motor sequence learning increases sleep spindles and fast frequencies in post-training sleep.

  • as you can read in the title, it is the motor learning that increases the spindles. They did not look for causality in the opposite direction.
  • Task was finger-tap motor sequence learning, with control. Subjects had to type on a computer keyboard using the nondominant hand. No visual feedback was given during non-training performance (e.g. during practice).
  • Beta frequencies are greater in sleep after motor learning, though this is not correlated with actual consolidation.
  • Other studies have shown that spindles are also more frequent after spatial or verbal learning.
  • observed no effect of SWS on motor sequence learning.


ref: Song-2009.01 tags: sleep motor learning consolidation attention date: 02-18-2009 17:28 gmt revision:1 [0] [head]

PMID-18951924[0] Consciousness and the consolidation of motor learning

  • Not all consolidation occurs during sleep; in some instances consolidation only occurs during the day; in other times, neither daytime or sleep consolidates a memory.
  • Attention is an important factor that may determine if sleep or daytime replay plays a role in consolidation.
  • In a tapping task, after a night of sleep performance is faster and more accurate. Without the sleep, but with the same 12-hour interval, the same improvement is absent.
  • Evidence suggests though we experience the sensation of 'voluntary' movement, the conscious wish to move is more an afterthought than the cause.
    • Source: Libet et al 1983. (Subjects could accurately time events, and reported that the will to move preceded actual movement. However, the cortical potentials associated with movement preceded conscious awareness).
    • nonetheless, studies indicate that conscious awareness can affect movements, and how they are consolidated.
  • people with no declarative memory (like H.M.) can still remember procedural skills.
  • Consolidation = the process by which a fragile memory acquired via practice or exposure is consolidated into a more permanent, stable long-term form. If it occurs in the hours after practice, then it is 'off-line'; likewise for sleep.
    • Consolidation also includes stabilization, or making the memories robust to interference from new memories (retroactive interference).
    • This seems to be dependent on sleep, specifically NREM.
    • In studies where attention was broken using a tone counting task, neither over-night nor over-day enhancements were found to occur for motor sequence learning.
    • Another interesting effect is the development of explicit memory over the course of a night's sleep. Sleep seems to encourage conscious awareness of implicit patterns. -- probably through replay and integration.
  • Regarding "thinking too much" about sports: "As in the studies cited above, motor learning may initially rely on more explicit and prefrontal areas, but after extended practice and expertise, shift to more dorsal areas, but thinking about the movement can shift activity back to the less skilled explicit areas. Although many explanations may be derived, one could argue that these athletes show that even when years of practice has given the implicit system an exquisitely fine tuned memory for a movement, the explicit system can interfere at the time of performance and erase all evidence of implicit memory."
  • Well-written throughout, especially the conclusion paragraph.


ref: Peters-2008.05 tags: Schaal reinforcement learning policy gradient motor primitives date: 02-17-2009 18:49 gmt revision:4 [3] [2] [1] [0] [head]

PMID-18482830[0] Reinforcement learning of motor skills with policy gradients

  • they say that the only way to deal with reinforcement or general-type learning in a high-dimensional policy space defined by parameterized motor primitives is policy gradient methods.
  • article is rather difficult to follow; they do not always provide enough details (for me) to understand exactly what their equations mean. Perhaps this is related to their criticism that others' papers are 'ad-hoc' and not 'statistically motivated'.
  • nonetheless, it seems interesting.
  • their previous paper - Reinforcement learning for Humanoid Robotics - may be slightly easier to understand.
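For reference, the core likelihood-ratio ('REINFORCE') estimator that the policy-gradient family builds on fits in a few lines. This one-parameter Gaussian policy and toy reward are illustrative, not taken from the paper:

```python
import numpy as np

def reinforce_gradient(theta, episodes=5000, sigma=0.1, seed=0):
    # Monte-Carlo estimate of E[ grad log pi(a|theta) * reward ] for a
    # one-step Gaussian policy a ~ N(theta, sigma^2).
    rng = np.random.default_rng(seed)
    a = rng.normal(theta, sigma, size=episodes)   # sample actions from policy
    r = -(a - 2.0) ** 2                           # toy reward, peaked at a = 2
    return float(np.mean((a - theta) / sigma**2 * r))
```

At theta = 0 the true gradient of expected reward is +4, pushing theta toward 2; subtracting a baseline from r (which the paper discusses) would cut the estimator's variance.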


ref: Mamassian-2008.06 tags: overconfidence human motor learning date: 02-17-2009 17:51 gmt revision:0 [head]

PMID-18578851 Overconfidence in an objective anticipatory motor task.

  • Participants were asked to press a key in synchrony with a predictable visual event and were rewarded if they succeeded and sometimes penalized if they were too quick or too slow.
  • If they had used their own motor uncertainty in anticipating the timing of the visual stimulus, they would have maximized their gain.
  • However, they instead displayed an overconfidence in the sense that they underestimated the magnitude of their uncertainty and the cost of their error.
  • Therefore, overconfidence is not limited to subjective ratings in cognitive tasks, but rather appears to be a general characteristic of human decision making. interesting! but is overconfidence really so bad?
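The 'use your own motor uncertainty to maximize gain' idea can be made concrete with a toy expected-gain calculation; all numbers (timing window, noise, penalties) are made up for illustration, not from the paper:

```python
import numpy as np

def expected_gain(aim, sigma=0.05, reward=1.0, early_cost=2.0, late_cost=0.5,
                  window=(0.0, 0.1)):
    # Response times are Gaussian around the aimed time; integrate the gain
    # numerically: reward inside the window, penalties for early/late misses.
    t = np.linspace(aim - 5 * sigma, aim + 5 * sigma, 2001)
    p = np.exp(-0.5 * ((t - aim) / sigma) ** 2)
    p /= p.sum()
    gain = np.where((t >= window[0]) & (t <= window[1]), reward,
                    np.where(t < window[0], -early_cost, -late_cost))
    return float(np.sum(p * gain))

# with a harsher penalty for early responses, the optimal aim shifts late --
# an observer using their true uncertainty would aim here, not at the center
aims = np.linspace(-0.05, 0.15, 81)
best_aim = aims[np.argmax([expected_gain(a) for a in aims])]
```

Underestimating sigma is exactly what makes subjects aim too close to the window center.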

ref: notes-0 tags: Barto Hierarchal Reinforcement Learning date: 02-17-2009 05:38 gmt revision:1 [0] [head]

Recent Advances in Hierarchical Reinforcement Learning

  • RL with good function-approximation methods for evaluating the value function or policy function solve many problems yet...
  • RL is bedeviled by the curse of dimensionality: the number of parameters grows exponentially with the size of a compact encoding of state.
  • Recent research has tackled the problem by exploiting temporal abstraction - decisions are not required at each step, but rather invoke the activity of temporally extended sub-policies. This is somewhat similar to a macro or subroutine in programming.
  • This is fundamentally similar to adding detailed domain-specific knowledge to the controller / policy.
  • Ron Parr seems to have made significant advances in this field with 'hierarchies of abstract machines'.
    • I'm still looking for a cognitive (predictive) extension to these RL methods ... these all are about extension through programmer knowledge.
  • They also talk about concurrent RL, where agents can pursue multiple actions (or options) at the same time, and assess value of each upon completion.
  • Next are partially observable Markov decision processes (POMDPs), where you have to estimate the present state (belief state) as well as a policy. It is known that an optimal solution to this task is intractable. They propose using hierarchical suffix memory as a solution; I can't really see what these are about.
    • It is also possible to attack the problem using hierarchical POMDPs, which break the task into higher and lower level 'tasks'. Little mention is given to the even harder problem of breaking sequences up into tasks.
  • Good review altogether, reasonable balance between depth and length.
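The temporal-abstraction idea can be sketched as SMDP Q-learning over options in a toy corridor task; the task, constants, and the single macro option are my illustration, not from the review:

```python
import numpy as np

# Corridor of 11 states; reward 1 on reaching state 10. Options: primitive
# left, primitive right, and a macro "run right until the goal" -- a
# temporally extended sub-policy invoked as a single decision.
GOAL, GAMMA = 10, 0.9

def run_option(s, o):
    # returns (next state, cumulative discounted reward, elapsed steps)
    if o == 0:                      # primitive left
        s2 = max(s - 1, 0)
        return s2, (1.0 if s2 == GOAL else 0.0), 1
    if o == 1:                      # primitive right
        s2 = min(s + 1, GOAL)
        return s2, (1.0 if s2 == GOAL else 0.0), 1
    k = GOAL - s                    # macro: right until the goal, k steps
    return GOAL, GAMMA ** (k - 1), k

def smdp_q_learning(episodes=300, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((GOAL + 1, 3))
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            o = rng.integers(3)     # explore with a uniform behavior policy
            s2, r, k = run_option(s, o)
            # SMDP backup: discount by gamma^k for a k-step option
            Q[s, o] += alpha * (r + (GAMMA ** k) * np.max(Q[s2]) - Q[s, o])
            s = s2
    return Q
```

The macro collapses a 10-step credit-assignment chain into one backup, which is the point of temporal abstraction.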

ref: Vasilaki-2009.02 tags: associative learning prefrontal cortex model hebbian date: 02-17-2009 03:37 gmt revision:2 [1] [0] [head]

PMID-19153762 Learning flexible sensori-motor mappings in a complex network.

  • Were looking at a task, presented to monkeys over 10 years ago, where two images were presented to the monkeys, and they had to associate left and rightward saccades with both.
  • The associations between saccade direction and image was periodically reversed. Unlike humans, who probably could very quickly change the association, the monkeys required on the order of 30 trials to learn the new association.
  • Interestingly, whenever the monkeys made a mistake, they effectively forgot previous pairings. That is, after an error, the monkeys were as likely to make another error as they were to choose correctly, independent of the number of correct trials preceding the error. Strange!
  • They implement and test reward-modulated hebbian learning (RAH), where:
    • The synaptic weights are changed based on the presynaptic activity, the postsynaptic activity minus the probability of both presynaptic and postsynaptic activity. This 'minus' effect seems similar to that of TD learning?
    • The synaptic weights are soft-bounded,
    • There is a stop-learning criteria, where the weights are not positively updated if the total neuron activity is strongly positive or strongly negative. This allows the network to ultimately obtain perfection (at some point the weights are no longer changed upon reward), and explains some of the asymmetry of the reward / punishment.
  • Their model perhaps does not scale well for large / very complicated tasks... given the presence of only a single reward signal. And the lack of attention / recall? Still, it fits the experimental data quite well.
  • They also note that for all the problems they study, adding more layers to the network does not significantly affect learning - neither the rate nor the eventual performance.
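A minimal sketch of one reward-gated Hebbian step with soft-bounded weights, loosely following the rule summarized above (the exact factors in the paper differ; this is my simplification):

```python
import numpy as np

def rah_update(w, pre, post, p_post, reward, lr=0.1, w_max=1.0):
    # Hebbian term gated by reward: pre x (post minus expected post),
    # with a soft bound that shrinks updates as weights approach +/- w_max.
    dw = lr * reward * np.outer(np.asarray(post) - np.asarray(p_post), pre)
    return np.clip(w + dw * (w_max - np.abs(w)) / w_max, -w_max, w_max)
```

The soft bound is one way to get the stop-learning behavior: repeated rewarded updates asymptote rather than grow without limit.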

ref: Pearlmutter-2009.06 tags: sleep network stability learning memory date: 02-05-2009 19:21 gmt revision:1 [0] [head]

PMID-19191602 A New Hypothesis for Sleep: Tuning for Criticality.

  • Their hypothesis: in the course of learning, the brain's networks move closer to instability, as the process of learning and information storage requires that the network move closer to instability.
    • That is, a perfectly stable network stores no information: output is the same independent of input; a highly unstable network can potentially store a lot of information, or be a very selective or critical system: output is highly sensitive to input.
  • Sleep serves to restore the stability of the network by exposing it to a variety of inputs, checking for runaway activity, and adjusting accordingly. (inhibition / glia? how?)
  • Say that when sleep is not possible, an emergency mechanism must come into play, namely tiredness, to prevent runaway behavior.
  • (From wikipedia:) a potentially serious side-effect of many antipsychotics is that they tend to lower an individual's seizure threshold. Recall that removal of all dopamine can inhibit REM sleep; it's all somehow consistent, but unclear how maintaining network stability and being able to move are related.
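The stability intuition can be illustrated with a linear recurrent network, where the spectral radius of the weight matrix separates decaying from runaway activity (purely illustrative; not their model):

```python
import numpy as np

def final_activity_norm(W, x0, steps=50):
    # iterate x <- W x; spectral radius < 1 decays, > 1 runs away
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = W @ x
    return float(np.linalg.norm(x))

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 50))
W /= np.max(np.abs(np.linalg.eigvals(W)))   # normalize spectral radius to 1
x0 = rng.standard_normal(50)
stable = final_activity_norm(0.5 * W, x0)   # sub-critical: activity dies out
unstable = final_activity_norm(1.5 * W, x0) # super-critical: runaway
```

Their hypothesis amounts to learning pushing the effective gain toward (and past) the critical radius of 1, with sleep nudging it back.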

ref: Kakade-2002.07 tags: dopamine reward reinforcement learning Kakade Dayan date: 12-09-2008 21:27 gmt revision:1 [0] [head]

PMID-12371511[0] Dopamine: generalization and bonuses

  • suggest that some anomalies of dopamine activity are related to generalization and novelty. In terms of novelty, dopamine may be shaping exploration.
  • review results that DA activity signal a global prediction error for summed future reward in conditioning tasks.
    • (figure: A = pre-training; B = post-training; C = catch trial.)
    • this type of model is essentially TD(0); it does not involve 'eligibility traces', but still is capable of learning.
    • remind us that these cells have been found, but there are many other different types of responses of dopamine cells.
  • storage of these predictions involves the basolateral nuclei of the amygdala and the orbitofrontal cortex. (but how do these structures learn their expectations ... ?)
  • dopamine release is associated with motor effects that are species specific, like approach behaviors, that can be irrelevant or detrimental to the delivery of reward.
  • bonuses, for the authors = fictitious quantities added to rewards or values to ensure appropriate exploration.
  • resolution of DA activity ~ 50ms.
  • Romo & Schultz have found that there are phasic increases in DA activity to both rewarded and non-rewarded events/stimuli - something that they explain as 'generalization'. But - maybe it is something else? like a startle / get ready to move response?
  • They suggest that it is a matter of intermediate states where the monkey is uncertain as to what to do / what will happen. hum, not sure about this.
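The TD(0) account they review can be sketched in a few lines: with a serial-compound state representation, the prediction error moves over training from the time of reward to the (unpredicted) CS. Timing constants here are arbitrary:

```python
import numpy as np

def td0_deltas(n_trials=500, T=25, cs_t=5, rew_t=20, alpha=0.1, gamma=1.0):
    # V[t]: learned value of each within-trial time step. States before the
    # CS never learn (CS timing is unpredictable across trials), so after
    # training the prediction error sits at the transition into the CS state.
    V = np.zeros(T + 1)
    for _ in range(n_trials):
        for t in range(cs_t, T):
            r = 1.0 if t == rew_t else 0.0
            V[t] += alpha * (r + gamma * V[t + 1] - V[t])
    deltas = np.zeros(T)
    for t in range(T):               # read out errors with learning frozen
        r = 1.0 if t == rew_t else 0.0
        deltas[t] = r + gamma * V[t + 1] - V[t]
    return deltas

d = td0_deltas()   # spike at the CS transition (t = cs_t - 1), ~0 at reward
```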


ref: notes-0 tags: policy gradient reinforcement learning aibo walk optimization date: 12-09-2008 17:46 gmt revision:0 [head]

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

  • simple, easy to understand policy gradient method! many papers cite this on google scholar.
  • compare to {651}
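The method is roughly: evaluate a batch of randomly perturbed parameter vectors, estimate a per-dimension gradient from the grouped average scores, and take a fixed-size step. A sketch of that style of finite-difference policy gradient (constants and the gating detail are illustrative, not the paper's exact values):

```python
import numpy as np

def fd_policy_gradient_step(score, theta, eps=0.05, n_policies=15,
                            step_size=0.02, rng=None):
    # Perturb each parameter by -eps, 0, or +eps at random, score all
    # perturbed policies, and estimate the gradient dimension by dimension.
    rng = rng or np.random.default_rng()
    perturb = rng.choice([-eps, 0.0, eps], size=(n_policies, theta.size))
    scores = np.array([score(theta + p) for p in perturb])
    grad = np.zeros_like(theta)
    for d in range(theta.size):
        groups = [scores[perturb[:, d] == s] for s in (eps, 0.0, -eps)]
        avg = [g.mean() if g.size else 0.0 for g in groups]
        if avg[1] >= avg[0] and avg[1] >= avg[2]:
            grad[d] = 0.0            # unperturbed group did best: hold still
        else:
            grad[d] = avg[0] - avg[2]
    norm = np.linalg.norm(grad)
    return theta if norm == 0 else theta + step_size * grad / norm
```

The appeal is that score() can be a physical measurement (walk speed of the robot) with no model of the dynamics.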

ref: Karni-1998.02 tags: motor learning skill acquisition fMRI date: 10-08-2008 21:05 gmt revision:1 [0] [head]

PMID-9448252[0] The acquisition of skilled motor performance: Fast and slow experience-driven changes in primary motor cortex

  • a few minutes of daily practice on a sequential finger opposition task induced large, incremental performance gains over a few weeks of training
  • performance was lateralized
  • limited training experience can be sufficient to trigger performance gains that require time to become evident.
  • learning is characterized by two stages:
    • “fast” learning, an initial, within-session improvement phase, followed by a period of consolidation of several hours' duration
      • possibly this is due to synaptic plasticity.
    • and then “slow” learning, consisting of delayed, incremental gains in performance emerging after continued practice
      • In many instances, most gains in performance evolved in a latent manner not during, but rather a minimum of 6–8 hr after training, that is, between sessions
      • this is thought to correspond to the reorganization of M1 & other cortical structures.
  • long-term training results in highly specific skilled motor performance, paralleled by the emergence of a specific, more extensive representation of a trained sequence of movements in the contralateral primary motor cortex. this is seen when imaging for activation using fMRI.
  • why is there the marked difference between declarative learning, which often only takes one presentation to learn, and procedural memory, which takes several sessions to learn? Hypothetically, they require different neural substrates.
  • pretty good series of references...


ref: -0 tags: blind seeing tongue plasticity learning date: 10-08-2008 17:49 gmt revision:1 [0] [head]

“Seeing” through the tongue: cross-modal plasticity in the congenitally blind

  • tested their tongue display unit on sighted and blind volunteers; the blind volunteers showed increased PET signal in their occipital lobe, while the sighted (blindfolded) volunteers did not, though both achieved the same levels of performance on an orientation discrimination task after one week of intensive training.
  • TDU unit has 144 contacts.
  • spatial learning with this is apparently robust and rapid with people!

ref: Daw-2006.04 tags: reinforcement learning reward dopamine striatum date: 10-07-2008 22:36 gmt revision:1 [0] [head]

PMID-16563737[0] The computational neurobiology of learning and reward

  • I'm sure I read this, but cannot find it in m8ta anymore.
  • short, concise review article.
  • review evidence for actor-critic architectures in the prefrontal cortex.
  • cool: "Perhaps most impressively, a trial-by-trial regression analysis of dopamine responses in a task with varying reward magnitudes showed that the response dependence on the magnitude history has the same form as that expected from TD learning". trial by trial is where it's at! article: Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal


ref: Schultz-2000.12 tags: review reward dopamine VTA basal ganglia reinforcement learning date: 10-07-2008 22:35 gmt revision:1 [0] [head]

PMID-11257908[0] Multiple Reward Signals in the Brain

  • deals with regions in the brain in which reward-related activity has been found, and specifically what the activity looks like.
  • despite the 2000 date, the review feels somewhat dated?
  • similar to [1] except much shorter.


ref: Schultz-2000.03 tags: review orbitofrontal cortex basal ganglia dopamine reward reinforcement learning striatum date: 10-07-2008 03:53 gmt revision:1 [0] [head]

PMID-10731222[0] Reward processing in primate orbitofrontal cortex and basal ganglia

  • Orbitofrontal neurons showed three principal forms of reward-related activity during the performance of delayed response tasks,
    • responses to reward-predicting instructions,
    • activations during the expectation period immediately preceding reward and
    • responses following reward
    • above, reward-predicting stimulus in a dopamine neuron. Left: the animal received a small quantity of apple juice at irregular intervals without performing in any behavioral task. Right: the animal performed in an operant lever-pressing task in which it released a touch-sensitive resting key and touched a small lever in reaction to an auditory trigger signal. The dopamine neuron lost its response to the primary reward and responded to the reward-predicting sound.
  • for the other figures, read the excellent paper!


ref: Buonomano-1998.01 tags: cortical plasticity learning review LTD LTP date: 10-07-2008 03:27 gmt revision:1 [0] [head]

PMID-9530495[0] Cortical plasticity: from synapses to maps

  • focuses on synaptic plasticity as the underlying mechanism of behavior-dependent cortical maps/representations.
  • "within limits, the cortex can allocate cortical area in a use-dependent manner"
  • synaptic plasticity -> STDP via NMDA, etc.
    • demonstrated with intracellular recordings of cat M1 & simultaneous stimulation of the ventrolateral thalamus & intracellular depolarization. Facilitation was short-lasting and not present in all neurons.
    • demonstrated in rat auditory cortex / recording in layer 2/3 , stimulate layer 2/3 & White matter/6.
    • review of the Ca2+ hypothesis of LTP/LTD balance: if the Ca2+ influx is below a threshold, LTD occurs; if it is above a higher threshold, LTP occurs.
      • not sure how long LTD has been demonstrated -- 15 min?
  • cellular conditioning = direct induction of plastic changes in the selective responses of individual neurons in vivo as a result of short-term conditioning protocols. this is what we are interested in, for now.
    • this review does not explicitly deal with BG-DA / ACh reinforcement, only timing dependent plasticity, in visual and auditory cortex.
  • cortical plasticity:
    • talk about the revealing/unmasking of hidden responses when sections of cortex are deafferented or digits were amputated.
    • talk about training-based approaches: training increases cortical representation of a sensory modality / skill/ etc. The cortex can differentially 'allocate' area in a use-dependent manner throughout life.
    • cortical map changes are not reflected by changes in thalamic somatotopy.
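The two-threshold Ca2+ rule above can be sketched as a toy function (the thresholds theta_d < theta_p and the gain are illustrative values of mine, not from the review):

```python
def ca_plasticity(ca, theta_d=0.3, theta_p=0.6, eta=1.0):
    """Weight change as a function of Ca2+ influx: no change below theta_d,
    LTD (negative) for moderate influx, LTP (positive) above theta_p."""
    if ca < theta_d:
        return 0.0                      # sub-threshold: no change
    elif ca < theta_p:
        return -eta * (ca - theta_d)    # moderate influx: LTD
    else:
        return +eta * (ca - theta_p)    # high influx: LTP
```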


ref: Recanzone-1993.01 tags: plasticity cortex learning auditory owl monkeys SUA date: 10-06-2008 22:46 gmt revision:1 [0] [head]

PMID-8423485[0] Plasticity in the frequency representation of primary auditory cortex following discrimination training in adult owl monkeys

  • Measured tonotopic organization (hence plasticity) in the owl monkey auditory cortex following training on a frequency discrimination task.
  • improvement in performance correlates with an improvement in neuronal tuning.
  • two controls:
    • monkeys that were engaged in a tactile discrimination task
    • monkeys that received the same auditory stimuli but had no reason to attend to it
  • lots of delicious behavior graphs


ref: Nakahara-2001.07 tags: basal ganglia model cerebral cortex motor learning date: 10-05-2008 02:38 gmt revision:0 [head]

PMID-11506661[0] Parallel cortico-basal ganglia mechanisms for acquisition and execution of visuomotor sequences - a computational approach.

  • Interesting model of parallel motor/visual learning, the motor through the posterior BG (the middle posterior part of the putamen) and supplementary motor areas, and the visual through the dorsolateral prefrontal cortex and the anterior BG (caudate head and rostral putamen).
  • visual tasks are learned quicker due to the simplicity of their transform.
  • require a 'coordinator' to adjust control of the visual and motor loops.
  • basal ganglia-thalamacortical loops are highly topographic; motor, oculomotor, prefrontal and limbic loops have been found.
  • pre-SMA, not the SMA, is connected to the prefrontal cortex.
  • pre-SMA receives connections from the rostral cingulate motor area.
  • used actor-critic architecture, where the critic learns to predict cumulative future rewards from state and the actor produces movements to maximize reward (motor) or transformations (sensory). visual and motor networks are actors in visual and motor representations, respectively.
  • used TD learning, where TD error is encoded via SNc.
  • more later, not finished writing (need dinner!)
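Their actor-critic/TD setup can be sketched in minimal tabular form (the chain task, learning rates, and softmax exploration are toy choices of mine; the TD error delta plays the SNc dopamine role and trains both critic and actor):

```python
import numpy as np

# Toy task: a 4-state chain; action 1 advances toward a rewarded terminal state.
n_states, n_actions = 4, 2
V = np.zeros(n_states)                   # critic: predicted future reward
pref = np.zeros((n_states, n_actions))   # actor: action preferences
alpha, beta, gamma = 0.1, 0.1, 0.9
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for episode in range(2000):
    s = 0
    while s < n_states - 1:
        a = rng.choice(n_actions, p=softmax(pref[s]))
        s_next = s + 1 if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]   # TD error, the dopamine-like signal
        V[s] += alpha * delta                  # critic update
        pref[s, a] += beta * delta             # actor update
        s = s_next
# After training, the actor prefers the advancing action in every non-terminal state.
```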


ref: Hikosaka-2002.04 tags: motor learning SMA basal ganglia M1 dopamine preSMA review date: 10-05-2008 02:06 gmt revision:1 [0] [head]

PMID-12015240[0] Central mechanisms of motor skill learning

  • review article.
  • neurons in the SMA become active at particular transitions in sequential movements; neurons in the pre-SMA may be active specifically at certain rank orders in a sequence.
    • Many neurons in the preSMA were activated during learning of new sequences
  • motor skill learning is associated with coactivation of frontal and partietal cortices.
  • With practice, accuracy of performance was acquired earlier than speed of performance. interesting...
  • Striatum:
    • Reversible blockade of the anterior striatum (associative region) leads to deficits in learning new sequences
    • blockade of the posterior striatum (motor region) leads to disruptions in the execution of learned sequences
  • Cerebellum: In contrast, blockade of the dorsal part of the dentate nucleus (which is connected with M1) does not affect learning new sequences, but disrupts the performance of learned sequences. They conclude from this that long-term memories for motor skills may be stored in the cerebellum.
  • Doya proposed that learning in the basal ganglia and cerebellum may be guided by error signals, as opposed to the cerebral cortex.


ref: Graybiel-1994.09 tags: basal ganglia graybiel expert systems motor learning date: 10-03-2008 22:18 gmt revision:2 [1] [0] [head]

PMID-8091209[0] The basal ganglia and adaptive motor control (I couldn't find the pdf for this)

  • the basal ganglia is essentially an expert system which is trained via dopamine.


ref: Dayan-2002.1 tags: actor critic pavlovian learning basal ganglia date: 10-03-2008 19:33 gmt revision:1 [0] [head]

PMID-12383782[0] Reward, motivation, and reinforcement learning.

  • criticism of the actor-critic model in the context of extensive behavioral research.
    • the critic evaluates the average future reward of given states (for the whole task), hence solving the temporal credit problem.
  • discusses temporal credit problem, which is an issue in sequential learning problems. (and nearly all learning!)
  • heheh: "Hershberger (1986) trained chicks to expect to find food in a specific food cup. He then arranged the situation such that if they ran toward the food cup, the cup receded at twice their approach speed, whereas if they ran away from the food cup, it approached them at twice their retreat speed. As such, the chicks had to learn to run away from the distinctive food cup in order to get food. Hershberger found that the chicks were unable to learn this response in order to get the food and persisted in chasing the food away. They could, however, learn perfectly well to get the food when the cup moved away from them at only half of their approach speed." (Hershberger, W.A., 1986. An approach through the looking glass. Anim. Learn. Behav. 14, 443-451.)


ref: Kimura-1996.12 tags: putamen globus pallidus learning basal ganglia electrophysiology projection date: 10-03-2008 17:05 gmt revision:1 [0] [head]

PMID-8985875 Neural information transferred from the putamen to the globus pallidus during learned movement in the monkey.

  • study of the physiology of the projection from the striatum to the external and internal segments of the globus pallidus.
  • Identified neurons which project from the striatum to the pallidus via antidromic activation after stimulation of the GPe / GPi.
  • there were two classes of striatal neurons:
    • tonically active neurons (TANs, rate: 4-8hz)
      • TANs were never activated by antidromic stimulation. therefore, they probably do not project to the pallidus.
    • phasically active neurons (very low basal rate, high-frequency discharge in relation to behavioral tasks)
      • All PANs found projected to the globus pallidus.
      • PANs were responsive to movement or movement preparation. (or not responsive to the particular behaviors investigated)
        • the PANs that showed activity before movement initiation more frequently projected to GPi and not GPe (or both - need to look at the anatomy more).
      • PANs also show bursts of activity time-locked to the initiation of movement (e.g. time locked to a particular part of the movement).
      • no neurons with sensory response!
  • when they microstimulated in the putamen, a few pallidal neurons showed excitatory responses; most showed inhibitory/suppressive responses.

ref: Graybiel-2005.12 tags: graybiel motor_learning reinforcement_learning basal ganglia striatum thalamus cortex date: 10-03-2008 17:04 gmt revision:3 [2] [1] [0] [head]

PMID-16271465 The basal ganglia: Learning new tricks and loving it

  • learning-related changes occur significantly earlier in the striatum than the cortex in a cue-reversal task. she says that this is because the basal ganglia instruct the cortex. I rather think that they select output dimensions from that variance-generator, the cortex.
  • dopamine agonist treatment improves learning with positive reinforcers but not learning with negative reinforcers.
  • there is a strong hyperkinetic pathway that projects directly to the subthalamic nucleus from the motor cortex. this controls the output of the inhibitory pathway (GPi)
  • GABA input from the GPi to the thalamus can induce rebound spikes with precise timing. (the outputs are therefore not only inhibitory).
  • striatal neurons have up and down states. recommended action: simultaneous on-line recording of dopamine release and spike activity.
  • interesting generalization: cerebellum = supervised learning, striatum = reinforcement learning. yet yet! the cerebellum has a strong disynaptic projection to the putamen. of course, there is a continuous gradient between fully-supervised and fully-reinforcement models. the question is how to formulate both in a stable loop.
  • striosomal = striatum to the SNc
  • http://en.wikipedia.org/wiki/Substantia_nigra SNc is not a disorganized mass: the dopaminergic neurons of the pars compacta project to the striatum in a topographic map; dopaminergic neurons of the fringes (the lowest) go to the sensorimotor striatum and the highest to the associative striatum


ref: Radhakrishnan-2008.1 tags: EMG BMI Jackson motor control learning date: 10-03-2008 16:45 gmt revision:0 [head]

PMID-18667540[0] Learning a novel myoelectric-controlled interface task.

  • EMG-controlled 2D cursor control task with variable output mapping.
  • Subjects could learn non-intuitive output transforms to a high level of performance.
  • Subjects preferred, and learned better, if hand as opposed to arm muscles were used.


ref: -0 tags: differential dynamic programming machine learning date: 09-24-2008 23:39 gmt revision:2 [1] [0] [head]

excellent bibliography.

  • Jacobson, D. and Mayne, D., Differential Dynamic Programming, Elsevier, New York, 1970. in Perkins library.
  • Bertsekas, Dimitri P. Dynamic programming and optimal control Ford Library.
  • Receding horizon differential dynamic programming
    • good for high-dimensional problems. for this paper, they demonstrate control of a swimming robot.
    • webpage, including a animated gif of the swimmer
    • above is a quote from the conclusions -- very interesting!

ref: Li-2001.05 tags: Bizzi motor learning force field MIT M1 plasticity memory direction tuning transform date: 09-24-2008 22:49 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-11395017[0] Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field

  • this is concerned with memory cells, cells that 'remember' or remain permanently changed after learning the force-field.
  • In the above figure, the blue lines (or rather vertices of the blue lines) indicate the firing rate during the movement period (and 200ms before); angular position indicates the target of the movement. The force-field in this case was a curl field where force was proportional to velocity.
  • Preferred directions of the motor cortical units changed when the preferred directions of the EMGs changed
  • evidence of encoding of an internal model in the changes in tuning properties of the cells.
    • this can support both online performance and motor learning.
    • but what mechanisms allow the motor cortex to change in this way???
  • also see [1]


ref: Zhu-2003.1 tags: M1 neural adaptation motor learning date: 09-24-2008 22:17 gmt revision:0 [head]

PMID-14511525 Probing changes in neural interaction during adaptation.

  • looking at the changes in the connectivity between cells during/after motor learning.
  • convert sparse spike trains to continuous firing rates, use these as input to granger causality test
  • used the Dawn Taylor monkey task, except with push-buttons.
  • perturbed the monkey's reach trajectory with a string attached to a pneumatic cylinder.
  • their data looks pretty random. 9-17 neurons recorded. learning generally involves increases in interaction.
  • sponsored by DARPA
  • not a very good paper, alas.

ref: Maravita-2004.02 tags: tool use monkey mirror neurons response learning date: 09-24-2008 17:02 gmt revision:2 [1] [0] [head]

PMID-15588812[0] Tools for the body schema

See also PMID-8951846[1] Coding of modified body schema during tool use by macaque postcentral neurones.


ref: Fetz-2007.03 tags: hot fetz BMI biofeedback operant training learning date: 09-07-2008 18:56 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

PMID-17234689[0] Volitional control of neural activity: implications for brain-computer interfaces (part of a symposium)

  • Limits in the degree of accuracy of control in the latter studies can be attributed to several possible factors. Some of these factors, particularly limited practice time, can be addressed with long-term implanted BCIs. YES.
  • Accurate device control under diverse behavioral conditions depends significantly on the degree to which the neural activity can be volitionally modulated. YES again.
  • neurons (50%) in somatosensory (post central) cortex fire prior to volitional movement. interesting.
  • It should also be noted that the monkeys activated some motor cortex cells for operant reward without ever making any observed movements See: Fetz & Finocchio, 1975, PMID-810359.
    • Motor cortex neurons that were reliably associated with EMG activity in particular forelimb muscles could be readily dissociated from EMG when the rewarded pattern involved cell activity and muscle suppression.
    • This may be related to switching between real and imagined movements.
  • Biofeedback worked well for activating low-threshold motor units in isolation, but not high threshold units; attempts to reverse recruitment order of motor units largely failed to demonstrate violations of the size principle.
  • This (the typical BMI decoding strategy) interposes an intermediate stage that may complicate the relationship between neural activity and the final output control of the device
    • again, in other words: "First, the complex transforms of neural activity to output parameters may complicate the degree to which neural control can be learned."
    • quote: This flexibility of internal representations (e.g. to imagine moving your arm, train the BMI on that, and rapidly directly control the arm rather than going through the intermediate/training step) underlies the ability to cognitively incorporate external prosthetic devices into the body image, and explains the rapid conceptual adaptation to artificial environments, such as virtual reality or video games.
      • There is a high flexibility of input (sensory) and output (motor) for purposes of imagining / simulating movements.
  • adaptive learning algorithms may create a moving target for the robust learning algorithm; does it not make more sense to allow the cortex to work its magic?
  • Degree of independent control of cells may be inherently constrained by ensemble interactions
    • To the extent that internal representations depend on relationships between the activities of neurons in an ensemble, processing of these representations involves corresponding constraints on the independence of those activities.
  • quote: "These factors suggest that the range and reliability of neural control in BMI might increase significantly when prolonged stable recordings are achieved and the subject can practice under consistent conditions over extended periods of time."
  • Fetz agrees that the limitation is the goddamn technology. need to fix this!
  • there is evidence of favoritism in his citations (friends with Miguel??)

humm.. this paper came out a month ago, and despite the fact that he is much older and more experienced than i, we have arrived at the same conclusions by looking at the same set of data/papers. so: that's good, i guess.


ref: bookmark-0 tags: language learning year french brain hack date: 09-03-2007 04:13 gmt revision:2 [1] [0] [head]

http://mirror.mricon.com/french/french.html -- "how i learned french in a year"

  • verbiste : verb conjugator for linux (Gnome)
  • When talking about software, it was FredBrooks in TheMythicalManMonth who said that people will always reinvent the wheel because it is intrinsically easier and more fun to write your own code than it is to read someone else's code.

ref: Francis-2005.11 tags: Joe_Francis motor_learning reaching humans delay intertrial interval date: 04-09-2007 22:48 gmt revision:1 [0] [head]

PMID-16132970[0] The Influence of the Inter-Reach-Interval on Motor Learning.

Previous studies have demonstrated changes in motor memories with the passage of time on the order of hours. We sought to further this work by determining the influence that time on the order of seconds has on motor learning by changing the duration between successive reaches (inter-reach-interval IRI). Human subjects made reaching movements to visual targets while holding onto a robotic manipulandum that presented a viscous curl field. We tested four experimental groups that differed with respect to the IRI (0.5, 5, 10 or 20 sec). The 0.5 sec IRI group performed significantly worse with respect to a learning index than the other groups over the first set of 192 reaches. Each group demonstrated significant learning during the first set. There was no significant difference with respect to the learning index between the 5, 10 or 20 sec IRI groups. During the second and third set of 192 reaches the 0.5 sec IRI group's performance became indistinguishable from the other groups indicating that fatigue did not cause the initial poor performance and that with continued training the initial deficit in performance could be overcome.


ref: Kawato-1999.12 tags: kawato inverse dynamics cerebellum motor control learning date: 04-09-2007 22:45 gmt revision:1 [0] [head]

PMID-10607637[0] Internal models for motor control and trajectory planning

  • in this review, I will discuss evidence supporting the existence of internal models.
  • fast coordinated arm movement cannot be executed under feedback control, as biological feedback loops are slow and have low gains. Hence, the brain mostly needs to control things in a pure feedforward manner.
    • visual feedback delay is about 150-200ms.
    • fast spinal reflexes still require 30-50ms; large compared to fast movements (150ms).
    • muscle intrinsic mechanical properties produce proportional (stiffness) and derivative (viscosity) gains without delay.
    • inverse models are required for fast robotics, too. http://www.erato.atr.co.jp/DB/
  • talk about switching the external force field to gauge the nature of the internal model - these types of experiments verily prove that feedforward / model-based control is happening. has anyone shown what happens neuronally during the course of this learning? I guess it might be in my data.


ref: BrashersKrug-1996.07 tags: consolidation motor learning Shadmher Bizzi date: 04-09-2007 14:35 gmt revision:2 [1] [0] [head]

PMID-8717039[0] Consolidation in human motor memory

  • tested interference between the learning of two motor skills
    • no interference if the delay between practice on each task was > 4 hours
    • this implies that some memory consolidation occurs within those 4 hours, consistent with previous work implicating the medial temporal lobe as an important region for memory encoding.
  • found with MIT open course ware -- there are a lot of good papers referenced there.


ref: AnguianoRodriguez-2007.02 tags: serotonin learning dopamine date: 03-12-2007 02:30 gmt revision:0 [head]

PMID-17126827 Striatal serotonin depletion facilitates rat egocentric learning via dopamine modulation. facilitates - they get better! (more awake than controls? inability to forget?)

ref: Shapovalova-2006.1 tags: dopamine learning neocortex rats russians D2 date: 03-12-2007 01:58 gmt revision:0 [head]

PMID-17216714 Motor and cognitive functions of the neostriatum during bilateral blocking of its dopamine receptors

  • systemic application of D1 selective blockers reduced learning in rats
    • probably this effect is not neostriatal:
  • local application of the same blocker on the cortex did not markedly affect learning, though it did affect initiation errors
  • D2 antagonist (raclopride) locally applied to the striatum blocked learning.

ref: Afanasev-2004.03 tags: striatum learning reinforcement electrophysiology putamen russians date: 02-05-2007 17:33 gmt revision:3 [2] [1] [0] [head]

PMID-15151178[0] Sequential Rearrangements of the Ensemble Activity of Putamen Neurons in the Monkey Brain as a Correlate of Continuous Behavior

  • recorded 6-7 neurons in the putamen during alternative spatial selection
  • used discriminant analysis (what's that?) to analyze re-arrangements in spike activity
  • dynamics of re-arrangement were dependent on reinforcement, and mostly in the contralateral striatum


ref: bookmark-0 tags: book information_theory machine_learning bayes probability neural_networks mackay date: 0-0-2007 0:0 revision:0 [head]

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html -- free! (but i liked the book, so I bought it :)

ref: Brown-2001.11 tags: Huntingtons motor_learning intentional implicit cognitive deficits date: 0-0-2007 0:0 revision:0 [head]

PMID-11673321 http://brain.oxfordjournals.org/cgi/content/full/124/11/2188 :

  • 16 genetically-confirmed Huntington's patients (and matched controls) trained on a task using trial and error learning (intentional), and implicit learning (unintentional).
  • the task setup was simple: they had to press one of four keys arranged in a cross (with center) either in response to commands or while guessing a sequence of a few keys.
  • Within the random, commanded task there was a sequence that could/should be noticed.
  • Huntington's patients performed worse on the intentional learning segment, but comparably on the implicit learning / implicit sequence awareness, though the latter test seems rather weak to me.

ref: bookmark-0 tags: machine_learning todorov motor_control date: 0-0-2007 0:0 revision:0 [head]

Iterative Linear Quadratic regulator design for nonlinear biological movement systems

  • paper for an international conference on informatics in control/automation/robotics

ref: bookmark-0 tags: Unscented sigma_point kalman filter speech processing machine_learning SDRE control UKF date: 0-0-2007 0:0 revision:0 [head]

ref: bookmark-0 tags: STDP hebbian learning dopamine reward robot model ISO date: 0-0-2007 0:0 revision:0 [head]


  • idea: have a gating signal for the hebbian learning.
    • pure hebbian learning is unstable; it will lead to endless amplification.
  • method: use a bank of resonators that are sub-critically damped.
  • application: a simple 2-d robot that learns to seek food. not super interesting, but still good.
  • Uses ISO learning - Isotropic sequence order learning.
  • somewhat related: runbot!
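The gating idea reduces to a "three-factor" rule: the pre*post Hebbian correlation is only consolidated when a gating/reward signal is present. A minimal numerical sketch (signal shapes and constants are toy choices of mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
eta, T = 0.01, 1000
pre = rng.random(T)
post = 0.8 * pre + 0.2 * rng.random(T)        # post correlates with pre
gate = (np.arange(T) % 4 == 0).astype(float)  # third factor, on 25% of steps

w_pure  = eta * np.sum(pre * post)            # plain Hebbian accumulation
w_gated = eta * np.sum(gate * pre * post)     # gated Hebbian accumulation

# w_pure grows ~4x faster here; with no saturation it diverges as T grows,
# whereas the gate restricts potentiation to rewarded moments.
```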

ref: bookmark-0 tags: motor learning control Wolpert Ghahramani date: 0-0-2007 0:0 revision:0 [head]


  • the curse of dimensionality: there are about 600 muscles in the human body; 2^600 >> the # of atoms in the universe! we must structure this control problem.
  • there are about 200,000 alpha motor neurons.
  • damage to parietal cortex can lead to an inability to maintain state estimates of the limb (and other objects?)
  • damage to parietal cortex can lead to an inability to mentally simulate movement with the affected hand.
  • damage to the left parietal cortex can lead to a relative inability to determine whether viewed movements are one's own or not.
  • state prediction can reduce the effect of delays in sensorimotor feedback loops.
    • example: soleus and gastrocnemius tighten before lifting a heavy load with the arms.
  • the primate CNS models both the expected sensory feedback and represents the likelihood of the sensory feedback given the context. e.g. if people think that they are moving, they will compensate for non-existent coriolis forces.
  • how are we able to learn a variety of contexts?
    • when subjects try to learn two different dynamics (e.g. forward and reverse on sideskates), interference occurs when they are presented in rapid succession, but not when they are separated by several hours.
  • has a good list of refs.
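The 2^600 figure is easy to check against the standard ~10^80 estimate for atoms in the observable universe:

```python
import math

# Number of decimal digits in 2^600: even crude binary on/off control of
# ~600 muscles gives ~1e180 configurations, vs ~1e80 atoms in the universe.
digits = int(600 * math.log10(2)) + 1
# digits == 181
```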

ref: bookmark-0 tags: ISO learning reflex inverse controller Porr date: 0-0-2007 0:0 revision:0 [head]

ISO learning approximates a solution to the inverse controller problem in an unsupervised behavioral paradigm http://hardm.ath.cx/pdf/isolearning2002.pdf

  1. robot/actor whatever has a reflex after the presentation of a reward.
  2. the ISO learning mechanism learns to expect its own reflex -> anticipate actions, react at an appropriate time.
    1. a fixed reflex loop prevents arbitrariness by defining an initial behavioral goal.
  3. iso means isotropic: all inputs are the same, and all can be used for learning.
  4. learning is proportional to the derivative of the output.
  • the central advantage of an (ideal) feed-forward controller is that it acts without the feedback-induced delay. The fatally damaging sluggishness of feedback systems makes this a highly desirable feature.
  • see figure 4 in the local paper. this basically looks like the cerebellum.. sorta. the controller takes predictive signal, and with this prior information, is able to learn the correct response to the disturbance.
  • they also include an interesting comparison to Sutton & Barto's reinforcement learning:
    • in ISO learning, the weights stabilize if a particular input condition is achieved;
    • in reinforcement learning, the weights are stabilized when a certain output condition is reached.
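A minimal numerical sketch of the ISO rule (the filter, timings, and rates are my own toy choices, not Porr's): the weight change is proportional to the filtered early input times the derivative of the output, so a predictive input that reliably precedes the reflex-triggering input gains weight:

```python
import numpy as np

# Two inputs: x1 fires at t=1.0s of each 5s trial and predicts x0 at t=1.3s;
# x0 drives a fixed 'reflex' with weight w0 = 1.
dt, mu = 0.01, 0.05
T = int(50 / dt)

def lowpass(x, tau=0.1):
    # crude stand-in for the paper's damped resonators: a leaky integrator
    u = np.zeros_like(x)
    for t in range(1, len(x)):
        u[t] = u[t-1] + dt * (x[t] - u[t-1]) / tau
    return u

x0, x1 = np.zeros(T), np.zeros(T)
for start in range(0, T, int(5 / dt)):
    x1[start + int(1.0 / dt)] = 1.0
    x0[start + int(1.3 / dt)] = 1.0

u0, u1 = lowpass(x0), lowpass(x1)
w0, w1, v_prev = 1.0, 0.0, 0.0
for t in range(T):
    v = w0 * u0[t] + w1 * u1[t]
    w1 += mu * u1[t] * (v - v_prev)   # ISO rule: input times derivative of output
    v_prev = v
# w1 ends positive: u1 is still active when the reflex output rises.
```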

ref: Schaal-2005.12 tags: schaal motor learning review date: 0-0-2007 0:0 revision:0 [head]

PMID-16271466 Computational Motor control in humans and robots

ref: Schaal-1998.11 tags: schaal local learning PLS partial least squares function approximation date: 0-0-2007 0:0 revision:0 [head]

PMID-9804671 Constructive incremental learning from only local information

ref: Nakanishi-2005.01 tags: schaal adaptive control function approximation error learning date: 0-0-2007 0:0 revision:0 [head]

PMID-15649663 Composite adaptive control with locally weighted statistical learning.

  • idea: want error-tracking plus locally-weighted piecewise-linear function approximation (though I didn't read it in much depth; it is complicated)

ref: Flash-2001.12 tags: Flash Sejnowski 2001 computational motor control learning PRR date: 0-0-2007 0:0 revision:0 [head]

PMID-11741014 Computational approaches to motor control. Tamar Flash and Terry Sejnowski.

  • PRR = parietal reach region
  • essential controversies (to them):
    • the question of motor variables that are coded by neural populations.
    • equilibrium point control vs. inverse dynamics (the latter is obviously better/more correct)

ref: Stefani-1995.09 tags: electrophysiology dopamine basal_ganglia motor learning date: 0-0-2007 0:0 revision:0 [head]

PMID-8539419 Electrophysiology of dopamine D-1 receptors in the basal ganglia: old facts and new perspectives.

  • D1 is inhibitory (modulatory) on striatal neurons.
  • D1 cloned in 1990
  • D1 stimulates adenyl cyclase. (cAMP)
  • D1 activity shown to be necessary, but not sufficient, to generate long-term depression in striatal slices.
  • SKF 38393 was designed as a selective D1 receptor agonist; it has been available since the late 70's; it has nanomolar affinity for D1-R. SKF 38393 inhibits action potential discharge in striatal neurons as measured through responses to intracellular current depolarizations.
  • striatal cells project to the substantia nigra.
  • alternate hypothesis: D1 activation on the striatonigral afferents to the ventral tegmental area (VTA) promotes GABA release.
    • recall that the VTA projects to the frontal/prefrontal cortex (PFC) via the mesocortical dopaminergic pathway. http://grad.uchc.edu/phdfaculty/antic.html There, DA synapses onto spines of distal dendrites in juxtaposition with glutamatergic synapses. This guy posits that these DA synapses are involved in the pathology of schizophrenia, and he uses optical techniques to measure the DA/Glu synapses.
    • VTA is just below the red nucleus in rats.
  • some people report that SKF 38393 potentiated depolarizing membrane responses to exogenous NMDA (agonist, excitotoxin).
  • they prefer the magnesium-dependent LTD pathway.
    • D1 receptor antagonist SCH 23390 prevented the generation of LTD in striatum. (Calabresi et al 1992).
    • in DA-depleted slices, LTD could be restored by the co-administration of D1 and D2 agonists.

ref: bookmark-0 tags: machine_learning algorithm meta_algorithm date: 0-0-2006 0:0 revision:0 [head]

Boost learning or AdaBoost - the idea is to update the discrete distribution used in training any algorithm to emphasize those points that were misclassified in the previous fit of the classifier. Sensitive to outliers, though relatively resistant to overfitting.
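The reweighting step above can be sketched in a few lines (a toy version with 1-D threshold stumps as the weak learner; the stump search and data are my own, not from any particular paper):

```python
import numpy as np

def adaboost_stumps(X, y, rounds=3):
    """AdaBoost with 1-D threshold stumps; labels y must be in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)              # discrete distribution over samples
    ensemble = []
    for _ in range(rounds):
        best = None
        for thr in X:                    # exhaustive weak-learner search
            for s in (1.0, -1.0):
                pred = s * np.sign(X - thr + 1e-12)
                err = D[pred != y].sum() # weighted (not raw) error
                if best is None or err < best[0]:
                    best = (err, thr, s, pred)
        err, thr, s, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, thr, s))
        D *= np.exp(-alpha * y * pred)   # emphasize misclassified points
        D /= D.sum()
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(a * s * np.sign(X - t + 1e-12) for a, t, s in ensemble))

X = np.arange(8.0)
y = np.array([-1, -1, 1, 1, -1, -1, 1, 1])   # no single stump separates this
ens = adaboost_stumps(X, y, rounds=3)
print((predict(ens, X) == y).all())  # → True
```

Note how the distribution update `exp(-alpha * y * pred)` is exactly the "emphasize the misclassified points" step: correctly classified points (y * pred = +1) are down-weighted, errors up-weighted.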

ref: bookmark-0 tags: neural_networks machine_learning matlab toolbox supervised_learning PCA perceptron SOM EM date: 0-0-2006 0:0 revision:0 [head]

http://www.ncrg.aston.ac.uk/netlab/index.php n.b. kinda old. (or does that just mean well established?)

ref: bookmark-0 tags: spiking neuron models learning SRM spike response model date: 0-0-2006 0:0 revision:0 [head]


ref: bookmark-0 tags: Bayes Baysian_networks probability probabalistic_networks Kalman ICA PCA HMM Dynamic_programming inference learning date: 0-0-2006 0:0 revision:0 [head]

http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html very, very good! many references, well explained too.

ref: bookmark-0 tags: machine_learning date: 0-0-2006 0:0 revision:0 [head]


A related machine learning classifier, the relevance vector machine (RVM), has recently been introduced; unlike the SVM, it incorporates probabilistic output (probability of membership) through Bayesian inference. Its decision function depends on fewer input variables than the SVM's, possibly allowing better classification for small data sets with high dimensionality.

  • input data here is a number of glaucoma-correlated parameters.
  • "SVM is a machine classification method that directly minimizes the classification error without requiring a statistical data model. SVM uses a kernel function to find a hyperplane that maximizes the distance (margin) between two classes (or more?). The resultant model is sparse, depending only on a few training samples (support vectors)."
  • The RVM has the same functional form as the SVM, within a Bayesian framework. This classifier is a sparse Bayesian model that provides probabilistic predictions (e.g. probability of glaucoma based on the training samples) through Bayesian inference.
    • RVM outputs probabilities of membership rather than point estimates like SVM
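The point-estimate vs. probability distinction can be illustrated in two lines. An RVM-style classifier passes the latent decision value through a sigmoid to get p(class | x), whereas an SVM just takes the sign (a minimal sketch; the `decision` values here are hypothetical, not from a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical decision-function values from a trained kernel machine
decision = np.array([-2.0, -0.3, 0.1, 1.8])

# SVM-style point estimates: hard class labels from the sign
svm_labels = np.sign(decision)

# RVM-style probabilistic output: p(class | x) = sigmoid(latent value)
rvm_probs = sigmoid(decision)

print(svm_labels)           # → [-1. -1.  1.  1.]
print(rvm_probs.round(2))   # → [0.12 0.43 0.52 0.86]
```

The borderline samples (decision values -0.3 and 0.1) get the same hard labels from the SVM as confident ones do, while the RVM-style output flags them as near 0.5, i.e. uncertain.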

ref: bookmark-0 tags: smith predictor motor control wolpert cerebellum machine_learning prediction date: 0-0-2006 0:0 revision:0 [head]


  • quote in reference to models in which the cerebellum works as a Smith predictor, e.g. feedforward prediction of the behavior of the limbs, eyes, trunk: "Motor performance based on the use of such internal models would be degraded if the model was unavailable or inaccurate." These theories could therefore account for dysmetria, tremor, and dyssynergia, and perhaps also for increased reaction times.
  • note the difference between the inverse model (transforms an end target into a motor plan) and the forward model (is used on-line in a tight feedback loop).
  • The difficulty becomes one of detecting mismatches between a rapid prediction of the outcome of a movement and the real feedback that arrives later in time (duh! :)
  • good set of notes on simple simulated smith predictor performance.
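A minimal simulated Smith predictor along the same lines (my own toy, assuming a first-order plant, a perfect internal forward model, and a proportional controller): the controller acts on the model's undelayed prediction, corrected by the delayed mismatch between plant and model, so the feedback delay drops out of the loop.

```python
import numpy as np

# discrete first-order plant y[t+1] = a*y[t] + b*u[t]; output observed with delay d
a, b, d = 0.9, 0.1, 5
T, r = 60, 1.0            # simulation length, step reference
k = 2.0                   # proportional controller gain

y = np.zeros(T + 1)       # true plant state
ym = np.zeros(T + 1)      # internal forward-model state (perfect copy here)
u = np.zeros(T)

for t in range(T):
    y_delayed = y[t - d] if t >= d else 0.0
    ym_delayed = ym[t - d] if t >= d else 0.0
    # Smith predictor: control on the model's current prediction, plus the
    # (delayed) plant/model mismatch as a slow correction term
    feedback = ym[t] + (y_delayed - ym_delayed)
    u[t] = k * (r - feedback)
    y[t + 1] = a * y[t] + b * u[t]
    ym[t + 1] = a * ym[t] + b * u[t]

print(round(y[-1], 3))  # → 0.667 (P-control offset: r*bk/(1 - a + bk) = 2/3)
```

With a perfect model the mismatch term is zero and the loop behaves as if there were no delay; degrading `ym`'s parameters (the "unavailable or inaccurate" model in the quote above) reintroduces delayed, oscillation-prone feedback.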

ref: bookmark-0 tags: machine_learning classification entropy information date: 0-0-2006 0:0 revision:0 [head]

http://iridia.ulb.ac.be/~lazy/ -- Lazy Learning.

ref: abstract-0 tags: tlh24 error signals in the cortex and basal ganglia reinforcement_learning gradient_descent motor_learning date: 0-0-2006 0:0 revision:0 [head]

Title: Error signals in the cortex and basal ganglia.

Abstract: Numerous studies have found correlations between measures of neural activity, from single unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned, and they are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allow mappings to produce complex behaviors from abstract goals or, similarly, simple percepts from complex sensory information. Much theoretical work has proved the suitability of this organization both to generate behavior and to extract relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; descriptions of the basis functions themselves are more descriptive of the brain's environment. Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include components of both gradient descent and competitive elimination, two large classes of algorithms explored extensively in the field of machine learning. The former method requires the existence of a vectorial error signal, while the latter is less restrictive and requires at least a scalar evaluator. We will look for the existence of candidate error or evaluator signals in the cortex and basal ganglia during force-field learning, where the motor error is task-relevant and explicitly provided to the subject.
By simultaneously recording large populations of neurons from multiple brain areas we can probe the existence of error or evaluator signals by measuring the stochastic relationship and predictive ability of neural activity to the provided error signal. From this data we will also be able to track dependence of neural tuning trajectory on trial-by-trial success; if the cortex operates under minimization principles, then tuning change will have a temporal relationship to reward. The overarching goal of this research is to look for one aspect of motor learning – the error signal – with the hope of using this data to better understand the normal function of the cortex and basal ganglia, and how this normal function is related to the symptoms caused by disease and lesions of the brain.