m8ta
[0] Bar-Gad I, Morris G, Bergman H, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog Neurobiol 71:6, 439-73 (2003 Dec)

[0] Shuler MG, Bear MF, Reward timing in the primary visual cortex. Science 311:5767, 1606-9 (2006 Mar 17)

[0] Sergio LE, Kalaska JF, Systematic changes in directional tuning of motor cortex cell activity with hand location in the workspace during generation of static isometric forces in constant spatial directions. J Neurophysiol 78:2, 1170-4 (1997 Aug)

[0] Atallah HE, Lopez-Paniagua D, Rudy JW, O'Reilly RC, Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat Neurosci 10:1, 126-31 (2007 Jan)

[0] Loewenstein Y, Seung HS, Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc Natl Acad Sci U S A 103:41, 15224-9 (2006 Oct 10)

[0] Peters J, Schaal S, Reinforcement learning of motor skills with policy gradients. Neural Netw 21:4, 682-97 (2008 May)

[0] Kakade S, Dayan P, Dopamine: generalization and bonuses. Neural Netw 15:4-6, 549-59 (2002 Jun-Jul)

[0] Daw ND, Doya K, The computational neurobiology of learning and reward. Curr Opin Neurobiol 16:2, 199-204 (2006 Apr)

[0] Schultz W, Multiple reward signals in the brain. Nat Rev Neurosci 1:3, 199-207 (2000 Dec)
[1] Schultz W, Tremblay L, Hollerman JR, Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb Cortex 10:3, 272-84 (2000 Mar)

[0] Graybiel AM, The basal ganglia: learning new tricks and loving it. Curr Opin Neurobiol 15:6, 638-44 (2005 Dec)

[0] Li CS, Padoa-Schioppa C, Bizzi E, Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field. Neuron 30:2, 593-607 (2001 May)
[1] Caminiti R, Johnson PB, Urbano A, Making arm movements within different parts of space: dynamic aspects in the primate motor cortex. J Neurosci 10:7, 2039-58 (1990 Jul)

[0] Boline J, Ashe J, On the relations between single cell activity in the motor cortex and the direction and magnitude of three-dimensional dynamic isometric force. Exp Brain Res 167:2, 148-59 (2005 Nov)

[0] Maier MA, Bennett KM, Hepp-Reymond MC, Lemon RN, Contribution of the monkey corticomotoneuronal system to the control of force in precision grip. J Neurophysiol 69:3, 772-85 (1993 Mar)
[1] Smith AM, Hepp-Reymond MC, Wyss UR, Relation of activity in precentral cortical neurons to force and rate of force change during isometric contractions of finger muscles. Exp Brain Res 23:3, 315-32 (1975 Sep 29)

[0] Hepp-Reymond M, Kirkpatrick-Tanner M, Gabernet L, Qi HX, Weber B, Context-dependent force coding in motor and premotor cortical areas. Exp Brain Res 128:1-2, 123-33 (1999 Sep)

[0] Kalaska JF, Cohen DA, Hyde ML, Prud'homme M, A comparison of movement direction-related versus load direction-related activity in primate motor cortex, using a two-dimensional reaching task. J Neurosci 9:6, 2080-102 (1989 Jun)

[0] Georgopoulos AP, Ashe J, Smyrnis N, Taira M, The motor cortex and the coding of force. Science 256:5064, 1692-5 (1992 Jun 19)

[0] Sergio LE, Kalaska JF, Systematic changes in motor cortex cell activity with arm posture during directional isometric force generation. J Neurophysiol 89:1, 212-28 (2003 Jan)

[0] Taira M, Boline J, Smyrnis N, Georgopoulos AP, Ashe J, On the relations between single cell activity in the motor cortex and the direction and magnitude of three-dimensional static isometric force. Exp Brain Res 109:3, 367-76 (1996 Jun)

[0] Ashe J, Force and the motor cortex. Behav Brain Res 87:2, 255-69 (1997 Sep)

[0] Ostry DJ, Feldman AG, A critical evaluation of the force control hypothesis in motor control. Exp Brain Res 153:3, 275-88 (2003 Dec)

[0] Afanas'ev SV, Tolkunov BF, Rogatskaya NB, Orlov AA, Filatova EV, Sequential rearrangements of the ensemble activity of putamen neurons in the monkey brain as a correlate of continuous behavior. Neurosci Behav Physiol 34:3, 251-8 (2004 Mar)

{1407}
ref: -0 tags: tissue probe neural insertion force damage wound speed date: 06-02-2018 00:03 gmt

PMID-21896383 Effect of Insertion Speed on Tissue Response and Insertion Mechanics of a Chronically Implanted Silicon-Based Neural Probe

  • Two speeds, 10 μm/s and 100 μm/s, monitored out to 6 weeks.
  • Once the probes were fully advanced into the brain, we observed a decline in the compression force over time.
    • However, the compression force never decreased to zero.
    • This may indicate that chronically implanted probes experience a constant compression force when inserted in the brain, which may push the probe out of the brain over time if there is nothing to keep it in a fixed position.
      • Yet ... the Utah probe seems fine, up to many months in humans.
    • This may be a drawback for flexible probes [24], [25]. The approach to reduce tissue damage by reducing micromotion by not tethering the probe to the skull can also have this disadvantage [26]. Furthermore, the upward movement may lead to the inability of the contacts to record signals from the same neurons over long periods of time.
  • We did not observe a difference in initial insertion force, amount of dimpling, or the rest force after a 3-min rest period, but the force at the end of the insertion was significantly higher when inserting at 100 μm/s compared to 10 μm/s.
  • No significant difference in histological response observed between the two speeds.

{1406}
ref: -0 tags: insertion speed needle neural electrodes force damage injury cassanova date: 06-01-2018 23:51 gmt

Effect of Needle Insertion Speed on Tissue Injury, Stress, and Backflow Distribution for Convection-Enhanced Delivery in the Rat Brain

  • Tissue damage, evaluated as the size of the hole left by the needle after retraction, bleeding, and tissue fracturing, was found to increase for increasing insertion speeds and was higher within white matter regions.
    • A statistically significant difference in hole areas with respect to insertion speed was found.
  • While there are no previous needle insertion speed studies with which to directly compare, previous electrode insertion studies have noted greater brain surface dimpling and insertion forces with increasing insertion speed [43–45]. These higher deformation and force measures may indicate greater brain tissue damage which is in agreement with the present study.
  • There are also studies which have found that fast insertion of sharp tip electrodes produced less blood vessel rupture and bleeding [28,29].
    • These differences in rate dependent damage may be due to differences in tip geometry (diameter and tip) or tissue region, since these electrode studies focus mainly on the cortex [28,29].
    • In the present study, hole measurements were small in the cortex, and no substantial bleeding was observed in the cortex except when it was produced during dura mater removal.
    • Any hemorrhage was observed primarily in white matter regions of the external capsule and the CPu.

{1405}
ref: -0 tags: insertion speed neural electrodes force damage date: 06-01-2018 23:38 gmt

In vivo evaluation of needle force and friction stress during insertion at varying insertion speed into the brain

  • Targeted at CED procedures, but probably applicable elsewhere.
  • Used a blunted 32 ga hypodermic needle filled with cyanoacrylate (CA) glue.
  • Sprague-Dawley rats.
  • Increased insertion speed corresponds with increased force, unlike cardiac tissue.
  • Greater surface dimpling before failure results in larger regions of deformed tissue and more energy storage before needle penetration.
  • In this study (blunt needle) dimpling increased with insertion speed, indicating that more energy was transferred over a larger region, increasing the potential for injury.
  • However, friction stresses likely decrease with insertion speed, since larger tissue holes were measured at higher insertion speeds, indicating lower frictional stresses.
    • Rapid deformation results in greater pressurization of fluid filled spaces if fluid does not have time to redistribute, making the tissue effectively stiffer. This may occur in compacted tissues below or surrounding the needle and result in increasing needle forces with increasing needle speed.

{1333}
ref: -0 tags: deep reinforcement learning date: 04-12-2016 17:19 gmt

Prioritized experience replay

  • In general, experience replay can reduce the amount of experience required to learn, and replace it with more computation and more memory – which are often cheaper resources than the RL agent’s interactions with its environment.
  • Transitions (between states) may be more or less
    • surprising (does the system in question have a model of the environment? It does have a model of the state & action expected reward, as it's Q-learning),
    • redundant, or
    • task-relevant
  • Some sundry neuroscience links:
    • Sequences associated with rewards appear to be replayed more frequently (Atherton et al., 2015; Ólafsdóttir et al., 2015; Foster & Wilson, 2006). Experiences with high magnitude TD error also appear to be replayed more often (Singer & Frank, 2009 PMID-20064396 ; McNamara et al., 2014).
  • They pose a useful example where the task is to learn (effectively) a random series of bits -- 'Blind Cliffwalk'. By choosing the replayed experiences properly (via an oracle), you can get an exponential speedup in learning.
  • Prioritized replay introduces bias because it changes [the sampled state-action] distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to (even if the policy and state distribution are fixed). We can correct this bias by using importance-sampling (IS) weights.
    • These weights are the inverse of the priority weights, but don't matter so much at the beginning, when things are more stochastic; they anneal the controlling exponent.
  • There are two ways of selecting (weighting) the priority weights (the proportional variant is sketched after this list):
    • Direct, proportional to the TD-error encountered when visiting a sequence.
    • Ranked, where errors and sequences are stored in a data structure ordered based on error and sampled ∝ 1/rank.
  • Somewhat illuminating is how deep TD or Q-learning is unable to even scratch the surface of Tetris or Montezuma's Revenge.
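
A minimal numpy sketch of the proportional variant with IS weights (the α/β values, the ε floor, and the list-based toy buffer are my assumptions for illustration, not the paper's implementation):

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay with importance-sampling (IS) weights."""
    def __init__(self, capacity=10000, alpha=0.6, beta=0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.data, self.prio = [], []

    def add(self, transition, td_error):
        # priority p_i = (|TD error| + eps)^alpha; eps keeps p_i > 0
        self.data.append(transition)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)
        if len(self.data) > self.capacity:
            self.data.pop(0); self.prio.pop(0)

    def sample(self, k):
        p = np.asarray(self.prio)
        P = p / p.sum()                      # sampling distribution over the buffer
        idx = np.random.choice(len(self.data), size=k, p=P)
        # IS weights undo the sampling bias; beta is annealed toward 1 in the paper
        w = (len(self.data) * P[idx]) ** (-self.beta)
        w /= w.max()                         # normalize for stability
        return idx, [self.data[i] for i in idx], w

    def update_priorities(self, idx, td_errors):
        for i, d in zip(idx, td_errors):
            self.prio[i] = (abs(d) + 1e-6) ** self.alpha
```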

{1218}
ref: -0 tags: silicon electrode histology Michigan tip shape shear force date: 04-24-2013 20:02 gmt

PMID-1601445 Factors influencing the biocompatibility of insertable silicon microshafts in cerebral cortex.

  • Relatively early assessment of tissue reaction to silicon electrodes.
  • Noted 'severe' reaction at electrode tip; recommend recording along the shaft, Michigan style.
  • Noted microhematoma formation.
  • Recommend fast insertion.
  • Bending of the shafts (i.e. when they exert lateral force) causes lateral tissue damage.
    • Problem with fast insertion is that it may cause the needle to bend a bit -- resulting in a lateral 'kill zone'.
    • Ultimate speed must be a compromise.
  • Advocate a shearing blade tip or chisel point to sever microtubules, rather than a conical tip pushing them to an annular ring that can grab onto the sides of the needle.
  • Good paper, reviews the relevant cellular anatomy...

{1202}
ref: -0 tags: saccarose sugar sweet electrode implantation force germany date: 01-24-2013 21:46 gmt

PMID-22254391 Chronic intracortical implantation of saccharose-coated flexible shaft electrodes into the cortex of rats.

  • measured forces of about 6 mN inserting the 75 μm diameter saccharose-coated electrode.
    • Individual wires were 40 μm in diameter.
  • Limited longitudinal histology or electrophysiology.

{1169}
ref: -0 tags: artificial intelligence projection episodic memory reinforcement learning date: 08-15-2012 19:16 gmt

Projective simulation for artificial intelligence

  • Agent learns based on memory 'clips' which are combined using some pseudo-Bayesian method to trigger actions.
    • These clips are learned from experience / observation.
    • Quote: "..more complex behavior seems to arise when an agent is able to “think for a while” before it “decides what to do next.” This means the agent somehow evaluates a given situation in the light of previous experience, whereby the type of evaluation is different from the execution of a simple reflex circuit"
    • Quote: "Learning is achieved by evaluating past experience, for example by simple reinforcement learning".
  • The forward exploration of learned action-stimulus patterns is seemingly a general problem-solving strategy (my generalization).
  • Pretty simple task:
    • Robot can only move left / right; shows a symbol to indicate which way it (might?) be going.

{58}
ref: bookmark-0 tags: basal ganglia dopamine reinforcement learning Graybeil date: 03-06-2012 18:14 gmt

PMID-16271465 The basal ganglia: learning new tricks and loving it

  • BG analogous to the anterior forebrain pathway (AFP), which is necessary for song learning in young birds. Requires lots of practice and feedback. Studies suggest e.g. that neural activity in the AFP is correlated with song variability, and that the AFP can adjust ongoing activity in effector motor pathways.
    • LMAN = presumed homolog of cortex that receives basal ganglia outflow. Blockade of outflow from LMAN to RA creates stereotyped singing.
  • To see accurately what is happening, it's necessary to record simultaneously, or in close temporal contiguity, striatal and cortical neurons during learning.
    • Pasupathy and Miller showed that changes occur earlier in the striatum than in the cortex during learning.
  • She cites lots of papers -- there has been a good bit of work on this, and the theories are coming together. I should be careful not to dismiss or negatively weight things.
  • Person and Perkel [48] report that in songbirds, the analogous GPi-to-thalamus pathway induces IPSPs as well as rebound spikes with highly selective timing.
  • Reference Levesque and Parent PMID-16087877, who find elaborate column-like arrays of striatonigral terminations in the SNr, not in the dopamine-containing SNpc.

{1144}
ref: -0 tags: dopamine reinforcement learning funneling reduction basal ganglia striatum DBS date: 02-28-2012 01:29 gmt

PMID-15242667 Anatomical funneling, sparse connectivity and redundancy reduction in the neural networks of the basal ganglia

  • Major attributes of the BG:
    • Numerical reduction in the number of neurons across layers of the 'feed forward' (wrong!) network,
    • lateral inhibitory connections within the layers
    • modulatory effects of dopamine and acetylcholine.
  • Stochastic decision making task in monkeys.
  • Dopamine and ACh deliver different messages. DA much more specific.
  • Output nuclei of BG show uncorrelated activity.
    • They see this as a means of compression -- more likely it is a training signal.
  • Striatum:
    • each striatal projection neuron receives ~5300 cortico-striatal synapses; the dendritic field of one such neuron contains ~4e5 axons.
    • Say that a typical striatal neuron is spherical (?).
    • Striatal dendritic tree is very dense, whereas the pallidal dendritic tree is sparse, with 4 main dendrites and ~13 tips.
    • A striatal axon provides 240 synapses in the pallidum and makes 10 contacts with one pallidal neuron on average.
  • I don't necessarily disagree with the information-compression hypothesis, but I don't necessarily agree either.
    • Learning seems a more likely hypothesis; could be that we fail to see many effects due to the transient nature of the signals, but I cannot do a thorough literature search on this.

PMID-15233923 Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons.

  • Same task as above.
  • both ACh (putatively, TANs in this study) and DA neurons respond to reward related events.
  • dopamine neurons' response reflects mismatch between expectation and outcome in the positive domain
  • TANs are invariant to reward predictability.
  • TANs are synchronized; most DA neurons are not.
  • Striatum displays the densest staining in the CNS for dopamine (Lavoie et al 1989) and ACh (Holt et al 1997)
    • Depression of striatal acetylcholine can be used to treat PD (Pisani et al 2003).
    • Might be a DA/ ACh balance problem (Barbeau 1962).
  • Deficit of either DA or ACh has been shown to disrupt reward-related learning processes. (Kitabatake et al 2003, Matsumoto 1999, Knowlton et al 1996).
  • Upon reward, dopaminergic neurons increase firing rate, whereas ACh neurons pause.
  • Primates show overshoot -- for a probabilistic relative reward, they saturate anything above 0.8 probability to 1. Rats and pigeons do not show this effect (figure 2 F).

{1076}
ref: Heimer-2006.01 tags: STN DBS synchrony basal ganglia reinforcement learning beta date: 02-22-2012 17:07 gmt

PMID-17017503[0] Synchronizing activity of basal ganglia and pathophysiology of Parkinson's disease.

  • They worry that increased synchrony may be an epiphenomenon of tremor, or of independent oscillators with similar frequencies.
  • Modeling using actor/critic models of the BG.
  • Dopamine depletion, as in PD, results in correlated pallidal activity and reduced information capacity.
  • Other studies have found that DBS desynchronizes activity -- [1] or [2].
  • Biochemical and metabolic studies show that GPe activity does not change in Parkinsonism.
  • Pallidal neurons in normal monkeys do not show correlated discharge (Raz et al 2000, Bar-Gad et al 2003a).
  • Reinforcement driven dimensionality reduction (RDDR) (Bar-Gad et al 2003b).
  • DA activity, through action on D1 and D2 receptors on the 2 different types of MSN, affects the temporal difference learning scheme in which DA represents the difference between expectation and reality.
    • These neurons have a static 5-10 Hz firing rate, which can be modulated up or down. (Morris et al 2004).
  • "The model suggests that the chronic dopamine depletion in the striatum of PD patients is perceived as encoding a continuous state where reality is worse than predictions." Interesting theory.
    • Alternately, abnormal DA replacement leads to random organization of the cortico-striatal network, eventually leading to dyskinesia.
  • Recent human studies have found oscillatory neuronal correlation only in tremulous patients and raised the hypothesis that increased neuronal synchronization in parkinsonism is an epiphenomenon of the tremor of independent oscillators with the same frequency (Levy et al 2000).
    • Hum. might be.
  • In rhesus and green monkey PD models, a major fraction of the primate pallidal cells develop both oscillatory and non-oscillatory pair-wise correlation
  • Our theoretical analysis of coherence functions revealed that small differences between oscillation frequencies result in non-significant coherence in recording sessions longer than 10 minutes. (toy demonstration below.)
  • Their theory: current DBS methods overcome this, probably by imposing a null spatio-temporal firing pattern in the basal ganglia, "enabling the thalamo-cortical circuits to ignore and compensate for the problematic BG."
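
A quick numerical check of that point (all parameters are arbitrary assumptions; two oscillators 0.05 Hz apart look incoherent over a long Welch-averaged recording):

```python
import numpy as np
from scipy.signal import coherence

fs, T = 1000.0, 600.0                      # 10-minute "recording" at 1 kHz
t = np.arange(0, T, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10.00 * t) + 0.5 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 10.05 * t) + 0.5 * rng.standard_normal(t.size)
f, Cxy = coherence(x, y, fs=fs, nperseg=4096)
# relative phase drifts across segments, so averaged coherence at 10 Hz is ~0
print(Cxy[np.argmin(np.abs(f - 10.0))])
```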

____References____

[0] Heimer G, Rivlin M, Israel Z, Bergman H, Synchronizing activity of basal ganglia and pathophysiology of Parkinson's disease. J Neural Transm Suppl 70, 17-20 (2006)
[1] Kühn AA, Williams D, Kupsch A, Limousin P, Hariz M, Schneider GH, Yarrow K, Brown P, Event-related beta desynchronization in human subthalamic nucleus correlates with motor performance. Brain 127:Pt 4, 735-46 (2004 Apr)
[2] Goldberg JA, Boraud T, Maraton S, Haber SN, Vaadia E, Bergman H, Enhanced synchrony among primary motor cortex neurons in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine primate model of Parkinson's disease. J Neurosci 22:11, 4639-53 (2002 Jun 1)

{88}
ref: Fellows-2006.04 tags: parkinsons subthalamic nucleus thalamus DBS STN force velocity overshoot grasp date: 02-22-2012 14:51 gmt

PMID-16549385[0] The effect of subthalamic nucleus deep brain stimulation on precision grip abnormalities in Parkinson's disease

  • Deep Brain stimulation improves mobility/dexterity and dyskinesia of patients in general, via an increase in rate and decrease in reaction time, but it does not let the patient match force output to the object being manipulated (that is, the force is too large).
  • The excessive levels of grip force present in the stimulation 'off' state, and present from the early stages of the disease, however, were even more marked with STN stimulation on.
    • STN DBS may worsen the ability to match force characteristics to task requirements. (position control is improved?).
    • quite fascinating.

See also PMID-19266149[1] Distal and proximal prehension is differentially affected by Parkinson's disease: The effect of conscious and subconscious load cues

  • asked PD and control patients to lift heavy and light objects.
  • While controls were able to normalize lift velocity with the help of both conscious and subconscious load cues, the PD patients could use neither form of cue, and retained a pathological overshoot in lift velocity.
  • Hence force control is markedly affected in PD, which is consistent with the Piper rhythm (usually present during isometric contraction) being absent.

____References____

[0] Fellows SJ, Kronenbürger M, Allert N, Coenen VA, Fromm C, Noth J, Weiss PH, The effect of subthalamic nucleus deep brain stimulation on precision grip abnormalities in Parkinson's disease. Parkinsonism Relat Disord 12:3, 149-54 (2006 Apr)
[1] Weiss PH, Dafotakis M, Metten L, Noth J, Distal and proximal prehension is differentially affected by Parkinson's disease. The effect of conscious and subconscious load cues. J Neurol 256:3, 450-6 (2009 Mar)

{843}
ref: Zaghloul-2009.03 tags: DBS STN reinforcement learning humans unexpected reward Baltuch date: 01-26-2012 18:19 gmt

PMID-19286561[0] Human Substantia Nigra Neurons Encode Unexpected Financial Rewards

  • direct, concise.
  • 15 neurons in 11 patients -- we have far more!

____References____

[0] Zaghloul KA, Blanco JA, Weidemann CT, McGill K, Jaggi JL, Baltuch GH, Kahana MJ, Human substantia nigra neurons encode unexpected financial rewards. Science 323:5920, 1496-9 (2009 Mar 13)

{1085}
ref: Parush-2011.01 tags: basal ganglia reinforcement learning hypothesis frontiers israel date: 01-24-2012 04:05 gmt

PMID-21603228[0] Dopaminergic Balance between Reward Maximization and Policy Complexity.

  • model complexity discounting is an implicit thing.
    • the basal ganglia aim at optimization of independent gain and cost functions. Unlike previously suggested single-variable maximization processes, this multi-dimensional optimization process leads naturally to a softmax-like behavioral policy
  • In order for this to work:
    • dopamine directly affects striatal excitability and thus provides a pseudo-temperature signal that modulates the tradeoff between gain and cost. (toy sketch below.)
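
A toy sketch of the dopamine-as-pseudo-temperature idea (mapping the DA level directly onto the softmax inverse temperature is my illustrative assumption, not the paper's exact formulation):

```python
import numpy as np

def softmax_policy(q_values, dopamine):
    """High DA -> sharp, gain-maximizing policy; low DA -> flat, cheap/exploratory."""
    z = dopamine * np.asarray(q_values)   # DA plays the role of inverse temperature
    z -= z.max()                          # for numerical stability
    p = np.exp(z)
    return p / p.sum()

print(softmax_policy([1.0, 0.5, 0.1], dopamine=0.5))  # nearly uniform
print(softmax_policy([1.0, 0.5, 0.1], dopamine=5.0))  # nearly deterministic
```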

____References____

[0] Parush N, Tishby N, Bergman H, Dopaminergic Balance between Reward Maximization and Policy Complexity. Front Syst Neurosci 5, 22 (2011)

{255}
ref: BarGad-2003.12 tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus date: 01-16-2012 19:18 gmt

PMID-15013228[0] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)

  • long paper! looks like they used latex.
  • they focus on a 'new model' for the basal ganglia: reinforcement driven dimensionality reduction (RDDR)
  • in order to make sense of the system - according to them - any model must ignore huge amounts of information about the studied areas.
  • ventral striatum = nucleus accumbens!
  • striatum is broken into two rough parts: ventral and dorsal
    • dorsal striatum: the caudate and putamen.
    • ventral striatum: the nucleus accumbens, medial and ventral portions of the caudate and putamen, and striatal cells of the olfactory tubercle (!) and anterior perforated substance.
  • ~90% of neurons in the striatum are medium spiny neurons
    • dendrites fill 0.5mm^3
    • cells have up and down states.
      • the states are controlled by intrinsic connections
      • project to GPe GPi & SNr (primarily), using GABA.
  • 1-2% of neurons in the striatum are tonically active neurons (TANs)
    • use acetylcholine (among others)
    • fewer spines
    • more sensitive to input
    • TANs encode information relevant to reinforcement or incentive behavior

____References____

[0] Bar-Gad I, Morris G, Bergman H, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Prog Neurobiol 71:6, 439-73 (2003 Dec)

{905}
ref: Wyler-1979.09 tags: operant control reinforcement schedule Wyler Robbins date: 01-07-2012 22:09 gmt

PMID-114271[0] Operant control of precentral neurons: the role of reinforcement schedules.

  • Tried 3 different rewarding schedules:
    • Reward when the ISI was within a window 30-60ms
    • Differential reward, +2 or +3 when ISI was 45-60ms, +1 when 30-45ms
    • Nonspecific, constant applesauce reward.
  • No change in the mode of the ISI was observed, independent of reward schedules.

____References____

[0] Wyler AR, Robbins CA, Operant control of precentral neurons: the role of reinforcement schedules. Brain Res 173:2, 341-3 (1979 Sep 14)

{959}
ref: -0 tags: Evarts force pyramidal tract M1 movement monkeys conduction velocity tuning date: 01-03-2012 03:25 gmt

PMID-4966614 Relation of pyramidal tract activity to force exerted during voluntary movement.

  • One of the pioneering studies of electrophysiology in awake behaving animals: single electrode, juice reward, head posting. Many followed.
  • {960} looked at conduction velocity, which we largely ignore now -- most highly myelinated axons are silent during motor quiescence and show phasic activity during movement.
    • Lower conduction velocity PTNs show + and - FR modulations. Again from [5].
  • [6] showed that PTN activity preceded EMG activity, implying that it was efferent output rather than afferent feedback that was controlling the firing rate, as expected.
  • task: wrist flexion & extension under load.
  • task was practiced in the monkey's home cage for a period of three months; monkeys carried out 3000 trials or more of the task (must have had strong wrists!)
  • Head-fixed the monkeys for about 10 days prior to unit recordings; "The monkeys learned to be quite cooperative in reentering the chair in the morning, since entrance to the chair was rewarded by the fruit juice of their choice (grape, apple, or orange). Indeed, some monkeys continued to work even in the presence of free water!"
    • Maybe I should give mango some Hawaiian punch as well?
  • Measured antidromic responses with a permanent electrode in the ipsilateral medullary pyramid.
  • Used glass insulated platinum-iridium electrodes [11]
  • traces are clean, very clean. I wonder if good insulation (in this case, glass) has anything to do with it?
  • controlled for displacement by varying the direction of load; PTNs seem to directly control muscles.
    • Fire during acceleration and movement for no load
    • Fire during load and co-contraction when loaded.
  • FR also related to δF/δt: FR higher during a low but rising force than a high but falling force.
  • more than 100 PTNs recorded from the precentral gyrus, but only 31 had a clear and consistent relation to performance on the task.
    • 16 units on extension loads, 7 units flexion loads
    • It was only one joint, after all...
  • Cells responding to the same movement (flexion or extension) were often found on the same vertical electrode track.
  • Very little response to joint position.
  • Very clean modulations -- neurons are almost silent if there is no force production; FR goes up to 50-80 Hz.
  • Prior to the experiment Evarts expected a position-tuning model, but saw clear evidence of force tuning.
  • Group 1 muscle afferents have now been shown to project to the motor cortex of both monkey [1] and cat [9]. Makes sense: if the cortex is to control force, it needs feedback regarding its production.
  • Caveats: many muscles were involved in the study, mainly due to postural effects, and having one or two controls poorly delineates what is going on in the motor cortex.
    • Plus, all the muscles controlling the fingers come into play -- the manipulandum must be gripped firmly, esp. to resist extension loads.

{788}
ref: -0 tags: reinforcement learning basis function policy specialization date: 01-03-2012 02:37 gmt

To read:

{630}
ref: Shuler-2006.03 tags: reward V1 visual cortex timing reinforcement surprising date: 01-03-2012 02:33 gmt

PMID-16543459[0] Reward Timing in the Primary Visual Cortex

  • the responses of a substantial fraction of neurons in the primary visual cortex evolve from those that relate solely to the physical attributes of the stimuli to those that accurately predict the timing of reward.. wow!
  • rats. they put goggles on the rats to deliver full-field retinal illumination for 400 ms (isn't this cheating? full field?)
  • recorded from deep layers of V1
  • sensory processing does not seem to be reliable, stable, and reproducible...
  • rewarded only half of the trials, to see if the plasticity was a result of reward delivery or association of stimuli and reward.
  • after 5-7 sessions of training, neurons began to respond to the poststimulus reward time.
  • this was actually independent of reward delivery - only dependent on the time.
  • reward-related activity was only driven by the dominant eye.
  • individual neurons predict reward time quite accurately. (wha?)
  • responses continued even if the animal was no longer doing the task.
  • is this an artifact? of something else? what's going on? they suggest that it could be caused by subthreshold activity due to recurrent connections amplified by dopamine.

____References____

[0] Shuler MG, Bear MF, Reward timing in the primary visual cortex. Science 311:5767, 1606-9 (2006 Mar 17)

{336}
ref: Sergio-1997.08 tags: M1 force tuning kinematics dynamics Kalaska date: 01-03-2012 02:31 gmt

PMID-9307146[0] Systematic changes in directional tuning of motor cortex cell activity with hand location in the workspace during generation of static isometric forces in constant spatial directions.

  • The discharge rate of all proximal-arm M1 cells was affected by both hand location and by the direction of static force. w/ interaction between force direction and hand location.
    • this is consistent with cortical units controlling muscle activity directly or through the spinal cord.
  • conclusion: M1 controls muscles directly and contributes to the transformation from extrinsic coordinates to muscle activations while coordinating limb movements.

____References____

[0] Sergio LE, Kalaska JF, Systematic changes in directional tuning of motor cortex cell activity with hand location in the workspace during generation of static isometric forces in constant spatial directions. J Neurophysiol 78:2, 1170-4 (1997 Aug)

{623}
ref: Shulgina-1986.09 tags: reinforcement learning review date: 01-03-2012 02:31 gmt

Reinforcement learning in the cortex (a web scour/crawl):

  • http://www.springerlink.com/content/v211201413228x34/
    • short/long interspike intervals via pain reinforcement in immobilized rabbits.
  • PMID-3748636 Increased regularity of activity of cortical neurons in learning due to disinhibitory effect of reinforcement.
    • more rabbit shocking.
  • http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6T0F-3S1PT00-P
    • applied glutamate & noradrenaline; both responses are complex.
  • Reinforcement learning in populations of spiking neurons
    • the result: reinforcement learning can function effectively in large populations of neurons if there is a trace of the population activity in addition to the reinforcement signal. this trace must be per-synapse or perhaps per-neuron (as has been anticipated for some time). very important result, helps with the 'specificity' problem.
    • in human terms, the standard reinforcement learning approach is analogous to having a class of students write an exam and being informed by the teacher on the next day whether the majority of students passed or not.
    • this learning method is slow and achieves limited fidelity; in contrast, behavioral reinforcement learning can be reliable and fast. (perhaps this is a result of already-existing maps and or activity in the cortex?)
    • reinforcement learning is almost the opposite of backpropagation: in backprop, an error signal is computed per neuron, while in reinforcement learning the error is only computed for the entire system. They posit that there must be a middle ground (need something less than one neuron to compute the training/error signal per neuron, otherwise the system would not be very efficient...)
    • points out a good if obvious point: to learn from trial and error different responses to a given stimulus must be explored, and, for this, randomness in the neural activities provides a convenient mechanism.
    • they use the running mean as an eligibility trace per synapse. then change in weight = eta * eligibility trace(t), evaluated at the ends of trials. (a toy version is sketched at the end of this list.)
    • implemented an asymmetric rule that updates the synapses only slightly if the output is reliable and correct.
    • also needed a population signal or fed-back version of the previous neural behavior. Then individual reinforcement is a product of the reinforcement signal * the population signal * the eligibility trace (the last per synapse). Roughly, if the population signal is different than the eligibility trace, and the behavior is wrong, then that synapse should be reinforced, and vice-versa.
  • PMID-17444757 Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity.
    • seems to give about the same result as above, except with STDP: reinforcement-modulated STDP with an eligibility trace stored at each synapse permits learning even if a reward signal is delayed.
    • network can learn XOR problem with firing-rate or temporally coded input.
    • they want someone to look for reward-moduled STDP. paper came out June 2007.
  • PMID: Metaplasticity: the plasticity of synaptic plasticity (1996, Mark Bear)
    • there is such thing as metaplasticity! (plasticity of plasticity, or control over how effective NMDAR are..)
    • he has several other papers on this topic after this..
  • PMID-2682404 Reward or reinforcement: what's the difference? (1989)
    • reward = certain environmental stimuli have the effect of eliciting approach responses. ventral striatum / nucleus accumbens is instrumental for this.
    • reinforcement = the tendency of certain stimuli to strengthen stimulus-response tendencies. dorsolateral striatum is used here.
  • PMID-9463469 Rapid plasticity of human cortical movement representation induced by practice.
    • used TMS to evoke isolated and directionally consistent thumb movements.
    • then asked the volunteers to practice moving their thumbs in an opposite direction
    • after 5-30 minutes of practice, TMS evoked a response in the practiced direction. wow! this may be short-term memory or the first step in skill learning.
  • PMID-12736341 Learning input correlations through nonlinear temporally asymmetric Hebbian plasticity.
    • temporally asymmetric plasticity is apparently required for a stable network (aka no epilepsy?), and can be optimized to represent the temporal structure of input correlations.
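
A toy rate-based paraphrase of the population-signal rule above, with a Fetz-style "fire unit 0 above its baseline" reward (all constants and the task are my assumptions; running means stand in for the traces):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, eta, tau = 8, 4, 0.02, 0.9
w = 0.1 * rng.standard_normal((n_out, n_in))
mean_y = np.zeros(n_out)                  # running mean of activity = the trace

for step in range(5000):
    x = rng.random(n_in)
    y = w @ x + 0.2 * rng.standard_normal(n_out)   # noise provides exploration
    R = 1.0 if y[0] > mean_y[0] else -1.0          # global scalar reinforcement
    # per-synapse: reward x (deviation of activity from its trace) x input
    w += eta * R * np.outer(y - mean_y, x)
    mean_y = tau * mean_y + (1 - tau) * y

print(w[0])   # only the rewarded unit's synapses grow -- specificity via the trace
```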

{5}
ref: bookmark-0 tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning date: 12-31-2011 19:30 gmt

hunch.net interesting posts:

  • debugging your brain - how to discover what you don't understand. a very intelligent viewpoint, worth rereading + the comments. look at the data, stupid
    • quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.
  • automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.
  • The design of an optimal research environment
    • Quote: Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
  • More recently: http://hunch.net/?p=2016
    • Problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.
  • The unrealized potential of the research lab Quote: Muthu Muthukrishnan says “it’s the incentives”. In particular, people who invent something within a research lab have little personal incentive in seeing it’s potential realized so they fail to pursue it as vigorously as they might in a startup setting.
    • The motivation (money!) is just not there.

{612}
ref: Atallah-2007.01 tags: striatum skill motor learning VTA substantia nigra basal ganglia reinforcement learning date: 12-31-2011 18:59 gmt

PMID-17187065[0] Separate neural substrates for skill learning and performance in the ventral and dorsal striatum.

  • good paper. via SCLin's blog. slightly confusing anatomical terminology.
  • tested in rats, which have an anatomically different basal ganglia system from primates.
  • Rats had to choose a direction in a Y maze based on olfactory cues. Normal rats figure it out in 60 trials.
  • ventral striatum (nucleus accumbens here in rats) connects to the ventral prefrontal cortices (for example, the orbitofrontal cortex)
    • in primates, includes the medial caudate, which has been shown in fMRI to respond to reward prediction error. Neural activity in the caudate is attenuated when a monkey reaches optimal performance.
  • dorsal parts of the striatum (according to web: caudate, putamen, globus pallidus in primates) connect to the dorsal prefrontal and motor cortices
    • (according to them:) this corresponds to the putamen in primates. Activity in the putamen reflects performance but not learning.
    • activity in the putamen is highest after successful learning & accurate performance.
  • used muscimol (GABAa agonist, silences neural activity) and AP-5 (blocks NMDA based plasticity), in each of the target areas.
  • dorsal striatum is involved in performance but not learning
    • Injection of muscimol during acquisition did not impair test performance
    • Injection of muscimol during test phase did impair performance
    • Injection of AP-5 during acquisition had no effect.
    • in acquisition sessions, muscimol blocked instrumental response (performance); but muscimol only has a small effect when it was injected after rats perfected the task.
      • Idea: consistent behavior creates a stimulus-response association in extrastriatal brain areas, e.g. cerebral cortex. That is, the basal ganglia provide the reinforcement signal, and the cortex learns the association through feedback-driven behavior? Not part of the habit system, but makes an important contribution to goal-directed behavior.
      • This is consistent with the observation that behavior is initially goal driven but is later habitual.
    • Actually, other studies show that plasticity in the dorsal striatum may be detrimental to instrumental learning.
    • The number of neurons that fire just before the execution of a response is larger in the putamen than the caudate.
  • ventral striatum is involved in learning and performance.
    • Injection of AP-5 or muscimol during acquisition (learning behavior) impairs test performance.
    • Injection of AP-5 during test performance has no effect, but muscimol impairs performance.
  • Their data support an actor-director-critic architecture of the striatum (cf. the minimal actor-critic sketch below):
    • Actor = dorsal striatum; involved in performance, but not in learning them.
    • Director = ventral striatum; quote "it somehow learns the relevant task demands and directs the dorsal striatum to perform the appropriate action plans, but, crucially, it does not train the dorsal striatum"
      • ventral striatum acts through the orbitofrontal cortex, which maintains representations of task-reward contingencies.
      • ventral striatum might also influence action selection through its projections to the substantia nigra.
    • Critic = dopaminergic inputs from the ventral tegmental area and substantia nigra.
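
For contrast with their three-part scheme, a minimal tabular actor-critic (the standard two-part architecture; the toy ring-world and all constants are my assumptions):

```python
import numpy as np

n_states, n_actions = 5, 2
V = np.zeros(n_states)                    # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))   # actor: action preferences
alpha_v, alpha_p, gamma = 0.1, 0.1, 0.95
rng = np.random.default_rng(0)

def step(s, a):                           # ring world; reward at the last state
    s2 = (s + (1 if a == 1 else -1)) % n_states
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 0
for t in range(5000):
    p = np.exp(prefs[s]); p /= p.sum()    # softmax over preferences
    a = rng.choice(n_actions, p=p)
    s2, r = step(s, a)
    delta = r + gamma * V[s2] - V[s]      # TD error: the dopamine-like critic signal
    V[s] += alpha_v * delta
    prefs[s, a] += alpha_p * delta        # actor trained by the same scalar
    s = s2
```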

____References____

[0] Atallah HE, Lopez-Paniagua D, Rudy JW, O'Reilly RC, Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat Neurosci 10:1, 126-31 (2007 Jan)

{964}
ref: OLDS-1954.12 tags: Olds Milner operant conditioning electrical reinforcement wireheading BMI date: 12-29-2011 05:09 gmt

PMID-13233369[0] Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain.

  • The original electrical reinforcement experiment!
  • tested out various areas for reinforcement; septal forebrain area was the best.
  • later work: 1956 Olds, J. Runway and maze behavior controlled by basomedial forebrain stimulation in the rat. J. Comp. Physiol. Psychol. 49:507-12.

____References____

[0] Olds J, Milner P, Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. J Comp Physiol Psychol 47:6, 419-27 (1954 Dec)

{194}
ref: Schultz-1998.07 tags: dopamine reward reinforcement_learning review date: 12-07-2011 04:16 gmt

PMID-9658025[0] Predictive reward signal of dopamine neurons.

  • hot article.
  • reasons why midbrain Da is involved in reward: lesions, receptor blocking, electrical self-stimulation, and drugs of abuse.
  • DA neurons show phasic responses to both primary reward and reward-predicting stimuli.
  • All responses to rewards and reward-predicting stimuli depend on event predictability.
  • Just think of the MFB work with the rats... and how powerful it is.
  • most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal (e.g. parkinsons, huntingtons) -> implying that DA has two uses: the labeling of reward, and the tonic enabling of postsynaptic neurons.
    • I just anticipated this, which is good :)
    • It is still a mystery how the neurons in the midbrain determine when to fire - the pathways between reward and behavior must be very carefully segregated, otherwise we would be able to self-stimulate.
      • the pure expectation part of it is bound to play a part in this - if we know that a certain event will be rewarding, then the expectation will diminish DA release.
  • predictive eye movements ameliorate behavioral performance through advance focusing. (interesting)
  • predictions are used in industry:
    • Internal Model Control is used in industry to predict future system states before they actually occur. for example, the fly-by-wire technique in aviation makes decisions to do particular maneuvers based on predictable forthcoming states of the plane. (Like a human)
  • if you learn a reaction/reflex based on a conditioned stimulus, the presentation of that stimulus sets the internal state to that motivated to achieve the primary reward. there is a transfer back in time, which, generally, is what neural systems are for.
  • animals avoid foods that fail to influence important plasma/brain parameters, for example foods lacking essential amino acids like histidine, threonine, or methionine. In the case of food, the appearance/structure would be used to predict the slower plasma effects, and hence influence motivation to eat it. (of course!)
  • midbrain groups:
    • A8 = dorsal to lateral substantia nigra
    • A9 = pars compacta of substantia nigra, SNc
    • A10 = VTA, medial to substantia nigra.
  • The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons.

____References____

[0] Schultz W, Predictive reward signal of dopamine neurons. J Neurophysiol 80:1, 1-27 (1998 Jul)

{323}
ref: Loewenstein-2006.1 tags: reinforcement learning operant conditioning neural networks theory date: 12-07-2011 03:36 gmt

PMID-17008410[0] Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity

  • The probability of choosing an alternative in a long sequence of repeated choices is proportional to the total reward derived from that alternative, a phenomenon known as Herrnstein's matching law.
  • We hypothesize that there are forms of synaptic plasticity driven by the covariance between reward and neural activity, and prove mathematically that matching (of choice allocation to reward) is a generic outcome of such plasticity. (toy simulation below.)
    • models for learning that are based on the covariance between reward and choice are common in economics and are used phenomenologically to explain human behavior.
  • this model can be tested experimentally by making reward contingent not on the choices, but rather directly on neural activity.
  • Maximization is shown to be a generic outcome of synaptic plasticity driven by the sum of the covariances between reward and all past neural activities.
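
A toy simulation of the matching claim on concurrent baited (variable-interval-like) schedules; the softmax readout, baiting rates, and learning rate are my assumptions, and the update is covariance-based in the paper's sense (reward times choice fluctuation):

```python
import numpy as np

rng = np.random.default_rng(0)
bait_p = np.array([0.10, 0.05])   # rewards are baited and persist until collected
baited = np.zeros(2, bool)
w, eta = np.zeros(2), 0.1
counts, incomes = np.zeros(2), np.zeros(2)

for _ in range(200000):
    baited |= rng.random(2) < bait_p
    p = np.exp(w) / np.exp(w).sum()
    c = rng.choice(2, p=p)
    r = float(baited[c]); baited[c] = False
    # dw_i proportional to reward x (choice_i - expected choice_i)
    w += eta * r * ((np.arange(2) == c) - p)
    counts[c] += 1; incomes[c] += r

print("choice fractions:", counts / counts.sum())
print("income fractions:", incomes / incomes.sum())   # matching: roughly equal
```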

____References____

[0] Loewenstein Y, Seung HS, Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc Natl Acad Sci U S A 103:41, 15224-9 (2006 Oct 10)

{914}
ref: Gandolfo-2000.02 tags: Gandolfo Bizzi dynamic environment force fields learning motor control MIT M1 date: 12-02-2011 00:10 gmt

PMID-10681435 Cortical correlates of learning in monkey adapting to a new dynamical environment.

{795}
ref: work-0 tags: machine learning reinforcement genetic algorithms date: 10-26-2009 04:49 gmt

I just had dinner with Jesse, and we had a good/productive discussion/brainstorm about algorithms, learning, and neurobio. Two things worth repeating, one simpler than the other:

1. Gradient descent / Newton-Raphson like techniques should be tried with genetic algorithms. As of my current understanding, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of each of the organism's genes, and used that to direct mutation, rather than random selection of the mutated element? What if you looked before mating and crossover? Seems like this would speed up the algorithm greatly (though it might get it stuck in local minima, too). Not sure if this has been done before - if it has, edit this to indicate where!
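
A toy sketch of the idea, with finite-difference partials standing in for true gene-wise derivatives (the fitness function, rates, and population size are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(g):
    return -np.sum((g - 1.7) ** 2)        # toy objective; optimum at all-1.7

def directed_mutate(g, eps=1e-3, step=0.1):
    grad = np.zeros_like(g)
    for i in range(g.size):               # partial derivative of each "gene"
        d = np.zeros_like(g); d[i] = eps
        grad[i] = (fitness(g + d) - fitness(g - d)) / (2 * eps)
    # mutate along the gradient, keeping a little randomness for exploration
    return g + step * grad + 0.01 * rng.standard_normal(g.size)

pop = [rng.standard_normal(5) for _ in range(20)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                    # selection (elitism)
    pop = parents + [directed_mutate(p.copy()) for p in parents]

print(max(fitness(g) for g in pop))       # climbs toward 0
```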

2. Most supervised machine learning algorithms seem to rely on one single, externally applied objective function which they then attempt to optimize. (Rather, this is what convex programming is. Unsupervised learning of course exists, like PCA, ICA, and other means of learning correlative structure.) There are a great many ways to do optimization, but all are exactly that - optimization, search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I have arrived at is that there is no real utility function in the world (Corollary #1: life is not an optimization problem (**)) -- we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in non-linear systems.)

Put another way, perhaps part of learning is to structure internal valuation / utility functions to set up reinforcement learning problems where the reinforcement signal comes according to satisfaction of sub-goals (= local utility functions). Or, the gradient signal comes from evaluating partial derivatives of actions with respect to these sub-goals. Creating these goals is natural but not always easy, which is one reason (of very many!) why sports are so great - the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, they seem so similar that perhaps they are the same thing?)

To my limited knowledge, there has been some recent work on the creation of sub-goals in reinforcement learning. One paper I read used a system to look for states that had a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (e.g. rewarded the agent when this state was reached). I'm not talking about these sorts of sub-goals. In these systems, there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's (or agent's) task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks - make a utility function, and a behavioral policy, based on small continuous (possibly irrelevant?) rewards in the environment.

Why would I want to do this? The pet project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my pet project, kicadocaml, to finally get some people to use it (the attention economy :-) In the course of thinking about how to do this, I've realized that a substantial problem is simply determining what board layouts are good, and what are not. I have a rough aesthetic idea + some heuristics that I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not - but, how to code these up? And what if these aren't the best rules, anyway? If i just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it - boring!

Well, I've stated my sub-goal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible).

(**) Corollary #2 - There is no god. nod, Dawkins.

{715}
ref: Legenstein-2008.1 tags: Maass STDP reinforcement learning biofeedback Fetz synapse date: 04-09-2009 17:13 gmt

PMID-18846203[0] A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

  • (from abstract) The resulting learning theory predicts that even difficult credit-assignment problems, where it is very hard to tell which synaptic weights should be modified in order to increase the global reward for the system, can be solved in a self-organizing manner through reward-modulated STDP.
    • This yields an explanation for a fundamental experimental result on biofeedback in monkeys by Fetz and Baker.
  • STDP is prevalent in the cortex; however, it requires a second signal:
    • Dopamine seems to gate STDP in corticostriatal synapses
    • ACh does the same or similar in the cortex. -- see references 8-12
  • simple learning rule they use: dW_ij(t)/dt = C_ij(t) · D(t), where C_ij is the synapse's eligibility trace (driven by STDP pairings) and D(t) is the global reward signal. (a toy implementation follows this list.)
  • Their notes on the Fetz/Baker experiments: "Adjacent neurons tended to change their firing rate in the same direction, but also differential changes of directions of firing rates of pairs of neurons are reported in [17] (when these differential changes were rewarded). For example, it was shown in Figure 9 of [17] (see also Figure 1 in [19]) that pairs of neurons that were separated by no more than a few hundred microns could be independently trained to increase or decrease their firing rates."
  • Their result is actually really simple - there is no 'control' or biofeedback - there is no visual or sensory input, no real computation by the network (at least for this simulation). One neuron is simply reinforced, hence its firing rate increases.
    • Fetz & later Schmidt's work involved feedback and precise control of firing rate; this does not.
    • This also does not address the problem that their rule may allow other synapses to forget during reinforcement.
  • They do show that exact spike times can be rewarded, which is kinda interesting ... kinda.
  • Tried a pattern classification task where all of the information was in the relative spike timings.
    • Had to run the pattern through the network 1000 times. That's a bit unrealistic (?).
      • The problem with all these algorithms is that they require so many presentations for gradient descent (or similar) to work, whereas biological systems can and do learn after one or a few presentations.
  • Next tried to train neurons to classify spoken input
    • Audio stimului was processed through a cochlear model
    • Maass previously has been able to train a network to perform speaker-independent classification.
    • Neuron model does, roughly, seem to discriminate between "one" and "two"... after 2000 trials (each with a presentation of 10 of the same digit utterance). I'm still not all that impressed. Feels like gradient descent / linear regression as per the original LSM.
  • A great many derivations in the Methods section... too much to follow.
  • Should read refs:
    • PMID-16907616[1] Gradient learning in spiking neural networks by dynamic perturbation of conductances.
    • PMID-17220510[2] Solving the distal reward problem through linkage of STDP and dopamine signaling.
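
A toy implementation of the dW/dt = C·D rule above (spike statistics, time constants, and the reward contingency are my inventions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, T = 20, 2000
w = 0.5 * np.ones(n_pre)
elig = np.zeros(n_pre)                     # C_ij: per-synapse eligibility traces
tau_e, a_plus, a_minus, eta = 200.0, 0.01, 0.012, 0.5
last_pre = np.full(n_pre, -1e9); last_post = -1e9

for t in range(T):
    pre = rng.random(n_pre) < 0.02                   # presynaptic spikes
    last_pre[pre] = t
    post = rng.random() < min(0.9, float(w @ pre) * 0.5 + 0.01)
    if post:
        last_post = t
        elig += a_plus * np.exp(-(t - last_pre) / 20.0)    # pre-before-post: + trace
    elig[pre] -= a_minus * np.exp(-(t - last_post) / 20.0) # post-before-pre: - trace
    elig *= np.exp(-1.0 / tau_e)                     # traces decay
    D = 1.0 if (post and t % 100 < 50) else 0.0      # arbitrary reward contingency
    w = np.clip(w + eta * elig * D, 0.0, 1.0)        # dW = eta * C * D
```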

____References____

[0] Legenstein R, Pecevski D, Maass W, A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Comput Biol 4:10, e1000180 (2008 Oct)
[1] Fiete IR, Seung HS, Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys Rev Lett 97:4, 048104 (2006 Jul 28)
[2] Izhikevich EM, Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17:10, 2443-52 (2007 Oct)

{686}
ref: Brown-2007.09 tags: motor force field learning vision date: 02-20-2009 00:28 gmt

PMID-17855611 Motor Force Field Learning Influences Visual Processing of Target Motion

  • as you can see from the title - this is an interesting result.
  • learning to compensate for forces applied to the hand influenced how participants predicted target motion for interception.
  • subjects were trained on a robotic manipulandum that applied different force fields; they had to use the manipulandum to hit an accelerating target.
  • There were 3 force fields: rightward, leftward, and null. The target accelerated left to right. Subjects with the rightward force field hit more targets than those with the null field, who in turn hit more than those with the leftward field. Hence motor knowledge of the environment (associated accelerations, as if there were wind or water current...) influenced how motion was perceived and acted upon.
    • perhaps there is a simple explanation for this (rather than their evolutionary information-sharing hypothesis): there exists a network that serves to convert visual-spatial coordinates into motor plans, and later muscle activations. The presence of a force field initially only affects the motor/muscle control parts of the ctx, but as training continues, the changes are propagated earlier into the system - to the visual system (or at least the visual-planning system). But this is a complicated system, and it's hard to predict how and where adaptation occurs.

{651}
ref: Peters-2008.05 tags: Schaal reinforcement learning policy gradient motor primitives date: 02-17-2009 18:49 gmt

PMID-18482830[0] Reinforcement learning of motor skills with policy gradients

  • they say that the only way to deal with reinforcement or general-type learning in a high-dimensional policy space defined by parameterized motor primitives is via policy gradient methods (a minimal sketch of the vanilla flavor is below).
  • article is rather difficult to follow; they do not always provide enough details (for me) to understand exactly what their equations mean. Perhaps this is related to their criticism that others' papers are 'ad hoc' and not 'statistically motivated'.
  • nonetheless, it seems interesting.
  • their previous paper - Reinforcement Learning for Humanoid Robotics - may be slightly easier to understand.
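
For reference, a minimal REINFORCE-style sketch of what 'policy gradient' means: the vanilla likelihood-ratio estimator with a running baseline, not Peters & Schaal's natural-gradient machinery. The one-parameter Gaussian policy and the toy quadratic reward are hypothetical stand-ins.

```python
import numpy as np

# Vanilla likelihood-ratio policy gradient (REINFORCE with a baseline).
# A Gaussian policy over a single action parameter is nudged toward
# actions that earn higher reward; the task itself is a stand-in.

rng = np.random.default_rng(0)
theta, sigma = 0.0, 0.5     # policy mean; fixed exploration noise
alpha, baseline = 0.05, 0.0
target = 1.3                # hidden optimum of the toy reward

for episode in range(2000):
    a = theta + sigma * rng.standard_normal()   # sample from pi(.|theta)
    r = -(a - target) ** 2                      # episode return
    baseline += 0.05 * (r - baseline)           # running-average baseline
    grad_log_pi = (a - theta) / sigma**2        # d log pi(a) / d theta
    theta += alpha * (r - baseline) * grad_log_pi

print(f"learned mean: {theta:.2f} (optimum {target})")
```

Much of the paper is about preconditioning this update (e.g. with the inverse Fisher information matrix, the 'natural' gradient) so that it behaves better than the raw estimator above.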

____References____

{674}
ref: notes-0 tags: Barto Hierarchal Reinforcement Learning date: 02-17-2009 05:38 gmt revision:1 [0] [head]

Recent Advances in Hierarchical Reinforcement Learning

  • RL with good function-approximation methods for evaluating the value or policy function solves many problems, yet...
  • RL is bedeviled by the curse of dimensionality: the number of parameters grows exponentially with the size of a compact encoding of state.
  • Recent research has tackled the problem by exploiting temporal abstraction - decisions are not required at each step, but rather invoke the activity of temporally extended sub-policies. This is somewhat similar to a macro or subroutine in programming (toy sketch after this list).
  • This is fundamentally similar to adding detailed domain-specific knowledge to the controller / policy.
  • Ron Parr seems to have made significant advances in this field with 'hierarchies of abstract machines'.
    • I'm still looking for a cognitive (predictive) extension to these RL methods ... these all are about extension through programmer knowledge.
  • They also talk about concurrent RL, where agents can pursue multiple actions (or options) at the same time, and assess value of each upon completion.
  • Next are partially observable Markov decision processes (POMDPs), where you have to estimate the present state (a belief state) as well as a policy. It is known that an optimal solution to this task is intractable. They propose using hierarchical suffix memory as a solution; I can't really see what these are about.
    • It is also possible to attack the problem using hierarchical POMDPs, which break the task into higher- and lower-level 'tasks'. Little mention is given to the even harder problem of breaking sequences up into tasks.
  • Good review altogether, reasonable balance between depth and length.
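
A toy sketch of the options idea in a hypothetical 1-D corridor world: each option bundles a sub-policy with a termination condition, and the top-level learner does an SMDP-style Q backup only when an option finishes. Everything here (world, options, constants, the simplification that reward arrives at option termination) is illustrative, not from the review.

```python
import random

# Temporal abstraction with "options" (macro-actions): the top-level
# learner makes a decision only when an option terminates, giving a
# semi-Markov decision process (SMDP).

def run_option(state, stop_at):
    """Sub-policy: step toward stop_at until reaching it."""
    steps = 0
    while state != stop_at:
        state += 1 if stop_at > state else -1   # primitive action
        steps += 1
    return state, steps                         # variable duration

stops = [0, 10]                    # option 0: go to 0; option 1: go to 10
Q = [[0.0, 0.0] for _ in range(11)]
alpha, gamma, eps = 0.2, 0.95, 0.2

for episode in range(500):
    s = 5
    while s != 10:                 # goal (reward 1) at state 10
        if random.random() < eps:
            o = random.randrange(2)
        else:
            o = 0 if Q[s][0] > Q[s][1] else 1
        s2, k = run_option(s, stops[o])
        r = 1.0 if s2 == 10 else 0.0   # simplified: reward at termination
        # SMDP Q-learning: discount the backup by the option duration k
        Q[s][o] += alpha * (r + gamma ** k * max(Q[s2]) - Q[s][o])
        s = s2

print(Q[5])   # the go-right option should come out ahead
```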

{653}
ref: Kakade-2002.07 tags: dopamine reward reinforcement learning Kakade Dayan date: 12-09-2008 21:27 gmt revision:1 [0] [head]

PMID-12371511[0] Dopamine: generalization and bonuses

  • suggest that some anomalies of dopamine activity are related to generalization and novelty. In terms of novelty, dopamine may be shaping exploration.
  • review results showing that DA activity signals a global prediction error for summed future reward in conditioning tasks.
    • (figure, not shown: A = pre-training; B = post-training; C = catch trial.)
    • this type of model is essentially TD(0); it does not involve 'eligibility traces', but is still capable of learning (toy sketch below).
    • they remind us that these cells have been found, but that dopamine cells show many other types of responses.
  • storage of these predictions involves the basolateral nuclei of the amygdala and the orbitofrontal cortex. (but how do these structures learn their expectations ... ?)
  • dopamine release is associated with motor effects that are species specific, like approach behaviors, that can be irrelevant or detrimental to the delivery of reward.
  • bonuses, for the authors = fictitious quantities added to rewards or values to ensure appropriate exploration.
  • resolution of DA activity ~ 50ms.
  • Romo & Schultz have found that there are phasic increases in DA activity to both rewarded and non-rewarded events/stimuli - something that they explain as 'generalization'. But - maybe it is something else? like a startle / get ready to move response?
  • They suggest that it is a matter of intermediate states where the monkey is uncertain as to what to do / what will happen. hum, not sure about this.
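
A toy TD(0) sketch of that prediction-error transfer: a cue at t=5 reliably precedes reward at t=15, and pre-cue states are treated as uninformative (value pinned at 0), as in the standard account. The timing and constants are illustrative.

```python
import numpy as np

# TD(0): after training, the prediction error moves from reward time
# to the cue; on a catch trial (reward omitted) it would dip to about
# -1 at the expected reward time.

T = 20
alpha, gamma = 0.1, 1.0
V = np.zeros(T + 1)               # value of each within-trial timestep
r = np.zeros(T)
r[15] = 1.0                       # reward at t=15; cue occurs at t=5

for trial in range(500):
    for t in range(4, T):
        delta = r[t] + gamma * V[t + 1] - V[t]   # prediction error
        if t >= 5:                # states before the cue never learn
            V[t] += alpha * delta

print(f"delta at cue: {gamma * V[5] - V[4]:.2f}, "
      f"at reward: {r[15] + gamma * V[16] - V[15]:.2f}")
```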

____References____

{652}
ref: notes-0 tags: policy gradient reinforcement learning aibo walk optimization date: 12-09-2008 17:46 gmt revision:0 [head]

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

  • simple, easy-to-understand policy gradient method! many papers cite this on google scholar. (sketch of the finite-difference idea below.)
  • compare to {651}
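
As I recall, the method is a finite-difference policy gradient: score a batch of slightly perturbed walk-parameter vectors on the robot, estimate the gradient from the scores, and step uphill. The sketch below uses per-dimension central differences, a simplification of their batched random perturbations; the objective function is a made-up stand-in for timing a real gait.

```python
import numpy as np

# Finite-difference policy gradient sketch. evaluate() stands in for
# timing an actual robot walk with parameter vector theta.

rng = np.random.default_rng(0)

def evaluate(theta):
    opt = np.array([0.6, -0.3, 1.1])     # hypothetical best gait params
    return -np.sum((theta - opt) ** 2)   # higher is better

theta = np.zeros(3)
eps, step = 0.1, 0.05
for it in range(300):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        up, dn = theta.copy(), theta.copy()
        up[i] += eps
        dn[i] -= eps
        grad[i] = (evaluate(up) - evaluate(dn)) / (2 * eps)
    theta += step * grad / (np.linalg.norm(grad) + 1e-9)  # normalized step

print(theta)  # approaches the optimum
```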

{631}
ref: Daw-2006.04 tags: reinforcement learning reward dopamine striatum date: 10-07-2008 22:36 gmt revision:1 [0] [head]

PMID-16563737[0] The computational neurobiology of learning and reward

  • I'm sure I read this, but cannot find it in m8ta anymore.
  • short, concise review article.
  • review evidence for actor-critic architectures in the prefrontal cortex.
  • cool: "Perhaps most impressively, a trial-by-trial regression analysis of dopamine responses in a task with varying reward magnitudes showed that the response dependence on the magnitude history has the same form as that expected from TD learning". trial by trial is where it's at! article: Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal
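
A toy sketch of the actor-critic idea the review invokes, with the TD error playing the dopamine-like role: the critic's prediction error trains both the value estimate and a softmax actor. The two-armed bandit, payoffs, and learning rates are all illustrative.

```python
import numpy as np

# Toy actor-critic on a two-armed bandit. The TD error delta (the
# dopamine-like quantity) trains both critic (value) and actor (policy).

rng = np.random.default_rng(1)
p_reward = np.array([0.2, 0.8])   # hidden payoff probability per arm
prefs = np.zeros(2)               # actor: action preferences
V = 0.0                           # critic: value of the (single) state
alpha_c, alpha_a = 0.1, 0.1

for trial in range(2000):
    pi = np.exp(prefs) / np.exp(prefs).sum()   # softmax policy
    a = rng.choice(2, p=pi)
    r = float(rng.random() < p_reward[a])
    delta = r - V                              # prediction error
    V += alpha_c * delta                       # critic update
    grad = -pi.copy()
    grad[a] += 1.0                             # d log pi(a) / d prefs
    prefs += alpha_a * delta * grad            # actor update

print(f"P(better arm) = {pi[1]:.2f}, V = {V:.2f}")
```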

____References____

{629}
ref: Schultz-2000.12 tags: review reward dopamine VTA basal ganglia reinforcement learning date: 10-07-2008 22:35 gmt revision:1 [0] [head]

PMID-11257908[0] Multiple Reward Signals in the Brain

  • deals with regions in the brain in which reward-related activity has been found, and specifically what the activity looks like.
  • despite the 2000 date, the review feels somewhat dated?
  • similar to [1] except much shorter.

____References____

{628}
ref: Schultz-2000.03 tags: review orbitofrontal cortex basal ganglia dopamine reward reinforcement learning striatum date: 10-07-2008 03:53 gmt revision:1 [0] [head]

PMID-10731222[0] Reward processing in primate orbitofrontal cortex and basal ganglia

  • Orbitofrontal neurons showed three principal forms of reward-related activity during the performance of delayed response tasks:
    • responses to reward-predicting instructions,
    • activations during the expectation period immediately preceding reward and
    • responses following reward
    • (figure, not shown: reward-predicting stimulus in a dopamine neuron. Left: the animal received a small quantity of apple juice at irregular intervals without performing in any behavioral task. Right: the animal performed in an operant lever-pressing task in which it released a touch-sensitive resting key and touched a small lever in reaction to an auditory trigger signal. The dopamine neuron lost its response to the primary reward and responded to the reward-predicting sound.)
  • for the other figures, read the excellent paper!

____References____

{67}
ref: Graybiel-2005.12 tags: graybiel motor_learning reinforcement_learning basal ganglia striatum thalamus cortex date: 10-03-2008 17:04 gmt revision:3 [2] [1] [0] [head]

PMID-16271465[] The basal ganglia: Learning new tricks and loving it

  • learning-related changes occur significantly earlier in the striatum than the cortex in a cue-reversal task. she says that this is because the basal ganglia instruct the cortex. I rather think that they select output dimensions from that variance-generator, the cortex.
  • dopamine agonist treatment improves learning with positive reinforcers but not learning with negative reinforcers.
  • there is a strong 'hyperdirect' pathway that projects directly to the subthalamic nucleus from the motor cortex. this controls output of the inhibitory pathway (GPi)
  • GABA input from the GPi to the thalamus can induce rebound spikes with precise timing. (the outputs are therefore not only inhibitory).
  • striatal neurons have up and down states. recommended action: simultaneous on-line recording of dopamine release and spike activity.
  • interesting generalization: cerebellum = supervised learning, striatum = reinforcement learning. and yet! the cerebellum has a strong disynaptic projection to the putamen. of course, there is a continuous gradient between fully-supervised and fully-reinforcement models. the question is how to formulate both in a stable loop.
  • striosomal = striatum to the SNc
  • http://en.wikipedia.org/wiki/Substantia_nigra SNc is not a disorganized mass: the dopaminergic neurons of the pars compacta project to the striatum in a topological map; dopaminergic neurons of the fringes (the lowest) go to the sensorimotor striatum and the highest to the associative striatum

____References____

{289}
ref: Li-2001.05 tags: Bizzi motor learning force field MIT M1 plasticity memory direction tuning transform date: 09-24-2008 22:49 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-11395017[0] Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field

  • this is concerned with memory cells, cells that 'remember' or remain permanently changed after learning the force-field.
  • (figure, not shown: the blue lines, or rather the vertices of the blue lines, indicate the firing rate during the movement period and the 200ms before it; angular position indicates the target of the movement.) The force field in this case was a curl field, where force was proportional and perpendicular to hand velocity (see the snippet below).
  • Preferred direction of the motor cortical units changed when the preferred direction of the EMGs changed
  • evidence of encoding of an internal model in the changes in tuning properties of the cells.
    • this can support both online performance and motor learning.
    • but what mechanisms allow the motor cortex to change in this way???
  • also see [1]
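
For reference, the curl field mentioned above maps hand velocity to a force perpendicular to it; a minimal definition (the gain value is illustrative, not the paper's):

```python
import numpy as np

def curl_field(v, b=15.0):
    """Curl field: force perpendicular to hand velocity v = (vx, vy),
    magnitude proportional to speed. Gain b (N*s/m) is illustrative."""
    vx, vy = v
    return b * np.array([vy, -vx])
```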

____References____

{596}
ref: Huesler-2000.1 tags: EMG synchronization Hepp-Raymond grip finger force isometric date: 09-07-2008 17:26 gmt revision:3 [2] [1] [0] [head]

PMID-11081826 EMG activation patterns during force production in precision grip. III. Synchronisation of single motor units.

  • synchronization observed in 78% of intrinsic finger muscles (within the hand itself) and 45% of extrinsic finger muscles.
    • force increase was not necessarily correlated to increased synchronization; rather, high synchronization occurred at low force production.
  • intrinsic muscles have higher force sensitivity & higher recruitment thresholds.
  • other articles in the series:
    • PMID-7615027 EMG activation patterns during force production in precision grip. I. Contribution of 15 finger muscles to isometric force.
    • PMID-7615028 EMG activation patterns during force production in precision grip. II. Muscular synergies in the spatial and temporal domain.

Dr. Hepp-Reymond himself seems to be a prolific researcher, judging from his pubmed search results. e.g.:

  • PMID-18272868 Absence of gamma-range corticomuscular coherence during dynamic force in a deafferented patient.
    • quote: proprioceptive information is mandatory in the genesis of gamma-band CMC (corticomuscular coherence) during the generation and control of dynamic forces.

{104}
ref: Boline-2005.11 tags: electrophysiology motor cortex force isometric Ashe 2005 date: 04-09-2007 22:39 gmt revision:3 [2] [1] [0] [head]

this seems to be the same as {339}, with a different pubmed id & different author list. bug in the system!

PMID-16193273[0] On the relations between single cell activity in the motor cortex and the direction and magnitude of three-dimensional dynamic isometric force

  • the majority of cells responded to direction,
  • few to the magnitude,
  • and ~10% to both direction & magnitude
  • control of static and dynamic motor systems is based on a common control process!
  • 2D task, monkeys, single-unit recording, regression analysis (toy sketch of this kind of tuning fit below).
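
The regression analysis in these isometric studies typically amounts to a linear (cosine-tuning) fit of firing rate against force components, with the preferred direction read off the coefficients. A sketch on synthetic data; all numbers are illustrative, not from the paper.

```python
import numpy as np

# Cosine-tuning regression sketch: model firing rate as
#   rate = b0 + bx*Fx + by*Fy
# and read the preferred direction off (bx, by).

rng = np.random.default_rng(2)
n = 200
theta = rng.uniform(0, 2 * np.pi, n)            # force directions
F = np.column_stack([np.cos(theta), np.sin(theta)])
pd_true = np.deg2rad(135)                       # synthetic cell's PD
rate = 20 + 8 * np.cos(theta - pd_true) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), F])            # [1, Fx, Fy]
b0, bx, by = np.linalg.lstsq(X, rate, rcond=None)[0]
pd_hat = np.degrees(np.arctan2(by, bx)) % 360
print(f"recovered preferred direction: {pd_hat:.1f} deg")
```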

____References____

{286}
ref: Maier-1993.03 tags: force motor control grip electrophysiology date: 04-09-2007 20:20 gmt revision:4 [3] [2] [1] [0] [head]

PMID-8463818[0] Contribution of the monkey corticomotoneuronal system to the control of force in precision grip

  • recorded 33 corticomotoneuronal cells
  • used spike-triggered averaging to find putative pyramidal tract neurons.
  • considerable trial-by-trial variability in the cells' activity-force relationship
  • and, in an earlier work: PMID-810360[1] Relation of activity in precentral cortical neurons to force and rate of force change during isometric contractions of finger muscles.

____References____

{345}
ref: HeppReymond-1999.09 tags: force motor control grip electrophysiology date: 04-09-2007 20:20 gmt revision:0 [head]

PMID-10473750[0] Context-dependent force coding in motor and premotor cortical areas.

  • here they found neurons related to dF/dt during another isometric precision grip task.

____References____

{337}
ref: Kalaska-1989.06 tags: motor control direction tuning force Kalaska date: 04-09-2007 19:59 gmt revision:2 [1] [0] [head]

PMID-2723767[0] A comparison of movement direction-related versus load direction-related activity in primate motor cortex, using a two-dimensional reaching task.

  • comparison to the Georgopoulos task:
    • "We demonstrate here that many of these cells show similar large continuously graded changes in discharge when the monkey compensates for inertial loads which pull the arm in 8 different directions"
  • the mean activity of the sample population under any condition of movement direction and load direction can be described reasonably well by a simple linear summation of the movement-related discharge without any loads, and the change in tonic activity of the population caused by the load, measured prior to movement
  • their data support the dual kinematics/dynamics encoding in the motor cortex.
    • but, to me, the data also supports direct control of the muscles.

____References____

{335}
ref: Georgopoulos-1992.06 tags: motor control force Georgopoulos date: 04-09-2007 19:56 gmt revision:1 [0] [head]

PMID-1609282[0] The motor cortex and the coding of force.

  • 2D isometric force task, which dissociated force & change in force.
  • cells are not tuned to the direction of the absolute force, but rather to the direction of both the visual cue and the change in force (dF/dt), as measured using linear regressions in an isometric force task.

____References____

{340}
ref: Sergio-2003.01 tags: M1 isometric force posture direction SUA Kalaska date: 04-09-2007 15:22 gmt revision:1 [0] [head]

PMID-12522173[0] Systematic changes in motor cortex cell activity with arm posture during directional isometric force generation.

  • isometric joystick was positioned at 5-9 different locations in a plane in the monkey's workspace.
  • discharge of all cells varied with position and force.
    • Cell directional tuning tended to shift systematically with hand location even though the direction of static force output at the hand remained constant
      • would this be true if the forces were directed in muscle coordinates?
  • "provides further evidence that MI contributes to the transformation between extrinsic and intrinsic representations of motor output during isometric force production."

____References____

{339}
ref: Taira-1996.06 tags: 3D Georgopoulos SUA M1 force motor control direction tuning date: 04-09-2007 15:16 gmt revision:1 [0] [head]

PMID-8817266[0] On the relations between single cell activity in the motor cortex and the direction and magnitude of three-dimensional static isometric force.

  • 3D isometric joystick.
  • stepwise multiple linear regression.
  • direction of force is a signal especially prominent in the motor cortex.
    • the pure directional effect was 1.8 times more prevalent in the cells than in the muscles studied (!)

____References____

{326}
ref: Ashe-1997.09 tags: motor control force direction magnitude M1 cortex date: 04-09-2007 01:10 gmt revision:0 [head]

PMID-9331494[0] Force and the motor cortex.

  • most M1 cells seem to be related to the direction of static force; fewer related to direction and magnitude; fewer yet to only magnitude.
  • dynamic forces: there is a strong correlation between the rate of change of force and motor cortex firing
    • dynamic force seems to determine firing rate more so than static force (e.g. resisting gravity)
    • I have definitely seen evidence of this with the kinarm experiments.

____References____

{290}
ref: Ostry-2003.12 tags: force motor control review cortex M1 date: 04-05-2007 15:21 gmt revision:0 [head]

PMID-14610628[0] A critical evaluation of the force control hypothesis in motor control.

  • the target of this review is the inverse dynamics model of motor control, which is very successful in robots. however, it seems that the mammalian nervous system does something a bit more complicated than this.
  • they agree that motor learning is most likely the defining feature of the cortex (I think that the critical and essential element of the cortex is not what control solution it arrives at, but rather how it learns that solution given the anatomical connections development has endowed it with).
  • they also find issue with the failure to incorporate realistic spinal reflexes into inverse-dynamics models.
  • quote: "However, we find little empirical evidence that specifically supports the inverse dynamics or forward internal model proposals per se."
  • quote: "We further conclude that the central idea of the force control hypothesis--that control levels operate through the central specification of forces--is flawed."

____References____

{197}
ref: Afanasev-2004.03 tags: striatum learning reinforcement electrophysiology putamen russians date: 02-05-2007 17:33 gmt revision:3 [2] [1] [0] [head]

PMID-15151178[0] Sequential Rearrangements of the Ensemble Activity of Putamen Neurons in the Monkey Brain as a Correlate of Continuous Behavior

  • recorded 6-7 neurons in the putamen during alternative spatial selection
  • used discriminant analysis (what's that?) to analyze re-arrangements in spike activity
  • dynamics of re-arrangement depended on reinforcement, and occurred mostly in the contralateral striatum

____References____

{103}
ref: bookmark-0 tags: Shadmehr torque forces jacobian date: 0-0-2007 0:0 revision:0 [head]

The Computational Neurobiology of Reaching and Pointing - online notes

{72}
ref: abstract-0 tags: tlh24 error signals in the cortex and basal ganglia reinforcement_learning gradient_descent motor_learning date: 0-0-2006 0:0 revision:0 [head]

Title: Error signals in the cortex and basal ganglia.

Abstract: Numerous studies have found correlations between measures of neural activity, from single unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned and are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allow mappings to procure complex behaviors from abstract goals, or similarly, complex sensory information to produce simple percepts. Much theoretical work has proved the suitability of this organization to both generate behavior and extract relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; the descriptions of the basis functions themselves are more descriptive of the brain’s environment. Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include both components of gradient descent and competitive elimination, which are two large classes of algorithms explored extensively in the field of machine learning. The former method requires the existence of a vectorial error signal, while the latter is less restrictive, and requires at least a scalar evaluator. We will look for the existence of candidate error or evaluator signals in the cortex and basal ganglia during force-field learning where the motor error is task-relevant and explicitly provided to the subject. By simultaneously recording large populations of neurons from multiple brain areas we can probe the existence of error or evaluator signals by measuring the stochastic relationship and predictive ability of neural activity to the provided error signal. From this data we will also be able to track dependence of neural tuning trajectory on trial-by-trial success; if the cortex operates under minimization principles, then tuning change will have a temporal relationship to reward. The overarching goal of this research is to look for one aspect of motor learning – the error signal – with the hope of using this data to better understand the normal function of the cortex and basal ganglia, and how this normal function is related to the symptoms caused by disease and lesions of the brain.
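
To illustrate the distinction drawn above between the two algorithm classes, here is a toy comparison on the same linear mapping problem: a delta rule that consumes the full error vector versus weight perturbation, which needs only a scalar evaluation of performance. Everything here is a made-up example, not a model of the proposed experiments.

```python
import numpy as np

# Learn y = W x two ways: (1) gradient descent using the error vector,
# (2) weight perturbation using only a scalar evaluator (negative loss).

rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 200))
Y = W_true @ X

W_vec = np.zeros((2, 3))   # trained with a vectorial error signal
W_scl = np.zeros((2, 3))   # trained with a scalar evaluator only
sigma = 0.01               # perturbation scale

def loss(W, x, y):
    return np.sum((y - W @ x) ** 2)

for it in range(4000):
    x, y = X[:, it % 200], Y[:, it % 200]
    # (1) delta rule: the full error vector drives the update
    err = y - W_vec @ x
    W_vec += 0.05 * np.outer(err, x)
    # (2) perturbation: compare scalar scores with and without noise
    dW = sigma * rng.normal(size=(2, 3))
    improvement = loss(W_scl, x, y) - loss(W_scl + dW, x, y)
    W_scl += 0.002 * improvement * dW / sigma**2

print(np.linalg.norm(W_vec - W_true), np.linalg.norm(W_scl - W_true))
```

The vector-error learner converges far faster per trial; that gap is the practical content of the distinction between the two signal types.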