m8ta
[0] Shuler MG, Bear MF, Reward timing in the primary visual cortex.Science 311:5767, 1606-9 (2006 Mar 17)

[0] Kakade S, Dayan P, Dopamine: generalization and bonuses.Neural Netw 15:4-6, 549-59 (2002 Jun-Jul)

[0] Pleger B, Blankenburg F, Ruff CC, Driver J, Dolan RJ, Reward facilitates tactile judgments and modulates hemodynamic responses in human primary somatosensory cortex.J Neurosci 28:33, 8161-8 (2008 Aug 13)

[0] Daw ND, Doya K, The computational neurobiology of learning and reward.Curr Opin Neurobiol 16:2, 199-204 (2006 Apr)

[0] Schultz W, Multiple reward signals in the brain.Nat Rev Neurosci 1:3, 199-207 (2000 Dec)

[1] Schultz W, Tremblay L, Hollerman JR, Reward processing in primate orbitofrontal cortex and basal ganglia.Cereb Cortex 10:3, 272-84 (2000 Mar)

[0] Schultz W, Tremblay L, Hollerman JR, Reward processing in primate orbitofrontal cortex and basal ganglia.Cereb Cortex 10:3, 272-84 (2000 Mar)

[0] Shidara M, Aigner TG, Richmond BJ, Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials.J Neurosci 18:7, 2613-25 (1998 Apr 1)

{1447}
ref: -2006 tags: Mark Bear reward visual cortex cholinergic date: 03-06-2019 04:54 gmt revision:1 [0] [head]

PMID-16543459 Reward timing in the primary visual cortex

  • Used 192-IgG-Saporin (saporin immunotoxin) to selectively lesion cholinergic fibers locally in V1 following a visual stimulus -> licking reward delay behavior.
  • Visual stimulus is full-field light, delivered to either the left or right eye.
    • This is scarcely a challenging task; perhaps they or others have followed up?
  • These examples illustrate that both cue 1-dominant and cue 2-dominant neurons recorded from intact animals express NRTs that appropriately reflect the new policy. Conversely, although cue 1- and cue 2-dominant neurons recorded from 192-IgG-saporin-infused animals are capable of displaying all forms of reward timing activity, they do not update their NRTs but rather persist in reporting the now outdated policy.
    • NRT = neural reaction time.
  • This needs to be controlled with recordings from other cortical areas.
  • Acquisition of reward based response is simultaneously interesting and boring -- what about the normal, discriminative and perceptual function of the cortex?
  • See also follow-up work PMID-23439124 A cholinergic mechanism for reward timing within primary visual cortex.

{1140}
ref: -0 tags: dopamine reward prediction striatum error striatum orbitofrontal reward date: 02-24-2012 21:26 gmt revision:1 [0] [head]

PMID-11105648 Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.

  • Many regions have a complex set of activations, but dopamine neurons appear more homogeneous: they report the error in reward prediction.
    • "The homogeneity of responsiveness across the population of dopamine neurons indicates that this error signal is widely broadcast to dopamine terminal regions where it could provide a teaching signal for synaptic modifications underlying the learning of goal-directed appetitive behaviors."
    • Signals are not contingent on the type of behavior needed to obtain the reward, and hence represent a relatively 'pure' reward prediction error.
  • Unlike dopamine neurons, many striatal neurons respond to predicted rewards, although at least some may reflect the relative degree of predictability in the magnitude of the responses to reward.
  • Neuronal activations in the orbitofrontal cortex appear to involve less integration of behavioral and reward-related information, but rather incorporate another aspect of reward, the relative motivational significance of different rewards.
  • Processing is hierarchical (or supposed to be so):
    • Dopamine neurons provide a relatively pure signal of an error in reward prediction,
    • Striatal neurons signal not only reward, but also behavioral contingencies,
    • Orbitofrontal neurons signal reward and incorporate relative reward preference.

{843}
ref: Zaghloul-2009.03 tags: DBS STN reinforcement learning humans unexpected reward Baltuch date: 01-26-2012 18:19 gmt revision:1 [0] [head]

PMID-19286561[0] Human Substantia Nigra Neurons Encode Unexpected Financial Rewards

  • direct, concise.
  • 15 neurons in 11 patients -- we have far more!

____References____

[0] Zaghloul KA, Blanco JA, Weidemann CT, McGill K, Jaggi JL, Baltuch GH, Kahana MJ, Human substantia nigra neurons encode unexpected financial rewards.Science 323:5920, 1496-9 (2009 Mar 13)

{1084}
ref: Bódi-2009.09 tags: dopamine L-Dopa levodopa agonist young reward novelty punisment learning date: 01-24-2012 04:05 gmt revision:1 [0] [head]

PMID-19416950[0] Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients

  • dopamine agonist administration in young patients with Parkinson's disease resulted in increased novelty seeking, enhanced reward processing, and decreased punishment processing. This may shed light on the cognitive and personality bases of the impulse control disorders, which arise as side-effects of dopamine agonist therapy in some Parkinson's disease patients.

____References____

[0] Bódi N, Kéri S, Nagy H, Moustafa A, Myers CE, Daw N, Dibó G, Takáts A, Bereczki D, Gluck MA, Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients.Brain 132:Pt 9, 2385-95 (2009 Sep)

{630}
ref: Shuler-2006.03 tags: reward V1 visual cortex timing reinforcement surprising date: 01-03-2012 02:33 gmt revision:4 [3] [2] [1] [0] [head]

PMID-16543459[0] Reward Timing in the Primary Visual Cortex

  • the responses of a substantial fraction of neurons in the primary visual cortex evolve from those that relate solely to the physical attributes of the stimuli to those that accurately predict the timing of reward. wow!
  • rats. they put goggles on the rats to deliver full-field retinal illumination for 400 ms (isn't this cheating? full field?)
  • recorded from deep layers of V1
  • sensory processing does not seem to be reliable, stable, and reproducible...
  • rewarded only half of the trials, to see if the plasticity was a result of reward delivery or association of stimuli and reward.
  • after 5-7 sessions of training, neurons began to respond to the poststimulus reward time.
  • this was actually independent of reward delivery - only dependent on the time.
  • reward-related activity was only driven by the dominant eye.
  • individual neurons predict reward time quite accurately. (wha?)
  • responses continued even if the animal was no longer doing the task.
  • is this an artifact? of something else? what's going on? they suggest that it could be caused by subthreshold activity due to recurrent connections amplified by dopamine.

____References____

{194}
ref: Schultz-1998.07 tags: dopamine reward reinforcement_learning review date: 12-07-2011 04:16 gmt revision:1 [0] [head]

PMID-9658025[0] Predictive reward signal of dopamine neurons.

  • hot article.
  • reasons why midbrain Da is involved in reward: lesions, receptor blocking, electrical self-stimulation, and drugs of abuse.
  • DA neurons show phasic response to both primary reward and reward-predicting stimuli.
  • 'All responses to rewards and reward-predicting stimuli depend on event predictability.'
  • Just think of the MFB work with the rats... and how powerful it is.
  • most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal (e.g. Parkinson's, Huntington's) -> implying that DA has two uses: the labeling of reward, and the tonic enabling of postsynaptic neurons.
    • I just anticipated this, which is good :)
    • It is still a mystery how the neurons in the midbrain determine when to fire - the pathways between reward and behavior must be very carefully segregated, otherwise we would be able to self-stimulate.
      • the pure expectation part of it is bound to play a part in this - if we know that a certain event will be rewarding, then the expectation will diminish DA release.
  • predictive eye movements ameliorate behavioral performance through advance focusing. (interesting)
  • predictions are used in industry:
    • Internal Model Control is used in industry to predict future system states before they actually occur. For example, the fly-by-wire technique in aviation makes decisions to do particular maneuvers based on predictable forthcoming states of the plane. (Like a human)
  • if you learn a reaction/reflex based on a conditioned stimulus, the presentation of that stimulus sets the internal state to that motivated to achieve the primary reward. there is a transfer back in time, which, generally, is what neural systems are for.
  • animals avoid foods that fail to influence important plasma/brain parameters, for example foods lacking essential amino acids like histidine, threonine, or methionine. In the case of food, the appearance/structure would be used to predict the slower plasma effects, and hence influence motivation to eat it. (of course!)
  • midbrain groups:
    • A8 = dorsal to lateral substantia nigra
    • A9 = pars compacta of substantia nigra, SNc
    • A10 = VTA, medial to substantia nigra.
  • The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons.

____References____

[0] Schultz W, Predictive reward signal of dopamine neurons.J Neurophysiol 80:1, 1-27 (1998 Jul)

{156}
ref: Shidara-2002.05 tags: anterior cingulate ACC 2002 reward anticipation ODC date: 12-07-2011 04:12 gmt revision:1 [0] [head]

PMID-12040201[0] Anterior cingulate: single neuronal signals related to degree of reward expectancy

  • feelings of increasing anticipation experienced as we work toward a predicted outcome may be traceable to a reward expectancy signal; in OCD, the brain may be 'hijacked' by runaway signals in the reward expectancy circuit.
    • brain imaging studies have detected abnormal activation of ACC in OCD

____References____

[0] Shidara M, Richmond BJ, Anterior cingulate: single neuronal signals related to degree of reward expectancy.Science 296:5573, 1709-11 (2002 May 31)

{689}
ref: Hilário-2007.01 tags: Rui Costa endocannabinoid habit reward striatum basal ganglia date: 03-05-2009 19:04 gmt revision:0 [head]

PMID-18958234 Endocannabinoid Signaling is Critical for Habit Formation.

  • quick review (the intro is packed with great information):
    • in goal-directed learning, behavior is highly sensitive to the incentive value of the outcome, and contingency between the action and the outcome.
    • with repetition actions become both more efficient and more automatic.
    • after extensive training, rats move from goal-directed behavior to more habitual response independent of outcome value.
      • random interval schedules favor this more than random ratio reward schedules.
        • in mice, random interval schedules promoted habit formation, whereas random ratio schedules promoted acquisition of goal-directed behaviors. does this also apply to humans? I would guess so. Might be an interesting tool to have in the toolbox.
        • interval schedules promoted the exploration of a random lever whereas ratio schedules promoted the exploitation of the reward lever.
    • the underlying circuitry supporting goal-directed behavior and habit formation is different:
      • goal directed behavior seems to require the associative BG/cortex including:
        • dorsomedial or associative striatum (medial!)
          • COMT, a dopamine-degrading enzyme (not a transporter), is more highly expressed here than DAT.
        • pre-limbic ctx
        • mediodorsal thalamus
      • habit formation requires:
        • dorsolateral or sensorimotor striatum (lateral!)
          • DAT, dopamine transporter, is highly expressed here.
        • infralimbic cortex
    • amphetamine sensitization can lead to increased spine density in medium spiny neurons in the dorsolateral striatum, while decreasing spine density in the dorsomedial striatum. (interesting!)
    • lesions of nigrostriatal input to dorsolateral striatum impair habit formation;
    • infusion of dopamine into the ventral medial prefrontal cortex favors goal-directed behavior
      • that is a rather broad statement to make ...
  • endocannabinoid release in the striatum is required for LTD induction.
  • endocannabinoid signaling is regulated by DA.
  • CB1 (the receptor implicated in addiction) is highly expressed in the dorsolateral striatum (habit!) at both excitatory and inhibitory terminals.
  • therefore, they used mice with CB1 mutations!
  • CB1 mutant mice have impaired habit formation and enhanced exploration.
    • suggest that endocannabinoid signaling is critical for both habit formation and increased exploration in interval schedules.
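
A minimal sketch (mine, not from the paper) of one commonly cited reason interval schedules favor habits: on a random ratio schedule reward rate tracks press rate, while on a random interval schedule extra presses buy very little. The function name `simulate_schedule` and all parameter values (`reward_prob`, `arm_prob`, the two press rates) are made-up assumptions for illustration.

```python
import random

def simulate_schedule(kind, press_prob, n_steps=100_000, seed=0):
    """Rewards earned per time step on a toy lever-press schedule.

    kind       -- "ratio" (random ratio) or "interval" (random interval)
    press_prob -- probability that the animal presses on any given time step
    All numbers here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    reward_prob = 0.05  # ratio: chance that any single press pays off
    arm_prob = 0.05     # interval: chance per step that a reward becomes available
    armed = False
    rewards = 0
    for _ in range(n_steps):
        # interval schedule: a reward is set up at random times and then waits
        if kind == "interval" and not armed and rng.random() < arm_prob:
            armed = True
        if rng.random() < press_prob:  # the animal presses the lever
            if kind == "ratio":
                if rng.random() < reward_prob:
                    rewards += 1
            elif armed:
                rewards += 1   # the first press after arming collects the reward
                armed = False
    return rewards / n_steps

# how much more reward does a 4x higher press rate buy on each schedule?
ratio_gain = simulate_schedule("ratio", 0.8) / simulate_schedule("ratio", 0.2)
interval_gain = simulate_schedule("interval", 0.8) / simulate_schedule("interval", 0.2)
```

With these made-up parameters, quadrupling the press rate roughly quadruples reward on the ratio schedule but raises it only modestly on the interval schedule, i.e. the action-outcome contingency is weak on interval schedules.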

{653}
ref: Kakade-2002.07 tags: dopamine reward reinforcement learning Kakade Dayan date: 12-09-2008 21:27 gmt revision:1 [0] [head]

PMID-12371511[0] Dopamine: generalization and bonuses

  • suggest that some anomalies of dopamine activity are related to generalization and novelty. In terms of novelty, dopamine may be shaping exploration.
  • review results showing that DA activity signals a global prediction error for summed future reward in conditioning tasks.
    • figure (not shown): A = pre-training; B = post-training; C = catch trial.
    • this type of model is essentially TD(0); it does not involve 'eligibility traces', but still is capable of learning.
    • remind us that these cells have been found, but there are many other different types of responses of dopamine cells.
  • storage of these predictions involves the basolateral nuclei of the amygdala and the orbitofrontal cortex. (but how do these structures learn their expectations ... ?)
  • dopamine release is associated with motor effects that are species specific, like approach behaviors, that can be irrelevant or detrimental to the delivery of reward.
  • bonuses, for the authors = fictitious quantities added to rewards or values to ensure appropriate exploration.
  • resolution of DA activity ~ 50ms.
  • Romo & Schultz have found that there are phasic increases in DA activity to both rewarded and non-rewarded events/stimuli - something that they explain as 'generalization'. But - maybe it is something else? like a startle / get ready to move response?
  • They suggest that it is a matter of intermediate states where the monkey is uncertain as to what to do / what will happen. hum, not sure about this.
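
To make the TD(0) point concrete, here is a minimal sketch under my own assumptions (not the authors' model code): one value estimate per time step of a trial, no eligibility traces, and time steps before the cue are assumed to carry no prediction because cue onset is unpredictable. The name `run_td0`, the trial layout, and the parameter values are all hypothetical.

```python
def run_td0(n_trials=300, n_steps=10, cue_step=2, reward_step=7,
            alpha=0.2, gamma=1.0, catch_last=False):
    """Return the per-step prediction error delta for every trial.

    v[t] is the predicted future reward at step t of a trial. TD(0) updates
    only the immediately preceding step, so there are no eligibility traces.
    If catch_last is True, the reward is omitted on the final trial.
    """
    v = [0.0] * n_steps
    history = []
    for trial in range(n_trials):
        rewarded = not (catch_last and trial == n_trials - 1)
        delta = [0.0] * n_steps
        for t in range(1, n_steps):
            r = 1.0 if (t == reward_step and rewarded) else 0.0
            # TD error: reward now plus discounted next value, minus old prediction
            delta[t] = r + gamma * v[t] - v[t - 1]
            if t - 1 >= cue_step:  # assumption: pre-cue steps cannot learn a prediction
                v[t - 1] += alpha * delta[t]
        history.append(delta)
    return history

trained = run_td0()
catch = run_td0(catch_last=True)
```

On the first trial the error sits at the reward step (`trained[0][7]`); after training it has moved to the cue step (`trained[-1][2]`) and vanished at the reward, and omitting the reward on a catch trial gives a negative error at the expected reward time (`catch[-1][7]`).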

____References____

{632}
ref: Pleger-2008.08 tags: S1 reward fMRI human date: 10-07-2008 23:06 gmt revision:1 [0] [head]

PMID-18701678[0] Reward facilitates tactile judgments and modulates hemodynamic responses in human primary somatosensory cortex.

  • "Remarkably, primary somatosensory cortex contralateral to the judged hand was reactivated at the point of reward delivery, despite the absence of concurrent somatosensory input at that time point."
    • hence, it is probably rostral to the central sulcus too.
  • the same as http://m8ta.com/index.pl?pid=630
  • rewarded humans with $
  • people had to discriminate the frequency of electrical stimulation to their left/right index fingers. I guess a mechanical vibrator would have been hard to use in the magnet of an MRI machine.
  • reward cue was visually instructed.
  • reference Janaina's paper. http://www.jneurosci.org/cgi/content/full/27/39/10608

____References____

{631}
ref: Daw-2006.04 tags: reinforcement learning reward dopamine striatum date: 10-07-2008 22:36 gmt revision:1 [0] [head]

PMID-16563737[0] The computational neurobiology of learning and reward

  • I'm sure I read this, but cannot find it in m8ta anymore.
  • short, concise review article.
  • review evidence for actor-critic architectures in the prefrontal cortex.
  • cool: "Perhaps most impressively, a trial-by-trial regression analysis of dopamine responses in a task with varying reward magnitudes showed that the response dependence on the magnitude history has the same form as that expected from TD learning". trial by trial is where it's at! article: Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal
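
A small sketch of what "the same form as that expected from TD learning" means, using a single-state delta rule as a stand-in (my simplification, not the regression from the cited article). The prediction error on each trial can be rewritten as the current reward minus an exponentially weighted sum of past rewards, which is the weighting a trial-by-trial regression on reward history would be expected to recover. Function names, the learning rate, and the reward sequence are made up.

```python
def prediction_errors(rewards, alpha):
    """Delta-rule errors computed the usual recursive way."""
    v = 0.0
    errors = []
    for r in rewards:
        delta = r - v        # reward prediction error on this trial
        errors.append(delta)
        v += alpha * delta   # move the prediction toward the reward
    return errors

def history_weighted_error(rewards, alpha, t):
    """The same error on trial t, written as explicit weights on reward history.

    Weight +1 on the current reward, and -alpha * (1 - alpha)**(k - 1)
    on the reward delivered k trials ago.
    """
    past = sum(alpha * (1 - alpha) ** (k - 1) * rewards[t - k]
               for k in range(1, t + 1))
    return rewards[t] - past

rewards = [0.1, 0.5, 0.3, 0.9, 0.2, 0.7]  # arbitrary reward magnitudes
alpha = 0.3                                # arbitrary learning rate
recursive = prediction_errors(rewards, alpha)
explicit = [history_weighted_error(rewards, alpha, t) for t in range(len(rewards))]
```

The two lists agree up to floating-point rounding, so the recent-reward weights decay geometrically with a rate set by the learning rate.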

____References____

{629}
ref: Schultz-2000.12 tags: review reward dopamine VTA basal ganglia reinforcement learning date: 10-07-2008 22:35 gmt revision:1 [0] [head]

PMID-11257908[0] Multiple Reward Signals in the Brain

  • deals with regions in the brain in which reward-related activity has been found, and specifically what the activity looks like.
  • despite the 2000 date, the review feels somewhat dated?
  • similar to [1] except much shorter.

____References____

{628}
ref: Schultz-2000.03 tags: review orbitofrontal cortex basal ganglia dopamine reward reinforcement learning striatum date: 10-07-2008 03:53 gmt revision:1 [0] [head]

PMID-10731222[0] Reward processing in primate orbitofrontal cortex and basal ganglia

  • Orbitofrontal neurons showed three principal forms of reward-related activity during the performance of delayed response tasks,
    • responses to reward-predicting instructions,
    • activations during the expectation period immediately preceding reward and
    • responses following reward
    • figure (not shown): reward-predicting stimulus in a dopamine neuron. Left: the animal received a small quantity of apple juice at irregular intervals without performing in any behavioral task. Right: the animal performed in an operant lever-pressing task in which it released a touch-sensitive resting key and touched a small lever in reaction to an auditory trigger signal. The dopamine neuron lost its response to the primary reward and responded to the reward-predicting sound.
  • for the other figures, read the excellent paper!

____References____

{257}
ref: Shidara-1998.04 tags: ventral striatum nucleus accumbens monkey reward progress cue date: 03-27-2007 14:39 gmt revision:0 [head]

PMID-9502820[0] Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials

  • neurons seem to cue/indicate/keep track of the state that a monkey is in during a sequence of reward-motivated behavior, e.g. there are neurons here which respond to the first trial, another group to anything other than 1st, others to first trial of schedules longer than one.
    • figure (not shown): the recording site.

____References____

{108}
ref: bookmark-0 tags: STDP hebbian learning dopamine reward robot model ISO date: 0-0-2007 0:0 revision:0 [head]

http://www.berndporr.me.uk/iso3_sab/

  • idea: have a gating signal for the hebbian learning.
    • pure hebbian learning is unstable; it will lead to endless amplification.
  • method: use a bunch of resonators, near sub-critically damped.
  • application: a simple 2-d robot that learns to seek food. not super interesting, but still good.
  • Uses ISO learning - Isotropic sequence order learning.
  • somewhat related: runbot!
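
A toy sketch of the instability and of what a gate buys you. This is not the ISO learning rule from the linked page; it is a single linear unit with a plain Hebbian update, and the gating function, learning rate, and step counts are assumptions picked for illustration.

```python
def hebbian_weight(n_steps, eta=0.1, x=1.0, w0=0.5, gate=None):
    """Final weight after n_steps of w += eta * g * x * y, with y = w * x.

    gate -- optional function of the step index returning the gating signal g;
            None means ungated (g = 1 on every step).
    """
    w = w0
    for t in range(n_steps):
        y = w * x                       # postsynaptic activity of a linear unit
        g = 1.0 if gate is None else gate(t)
        w += eta * g * x * y            # Hebbian term, scaled by the gate
    return w

# ungated: positive feedback, the weight grows geometrically without bound
ungated = hebbian_weight(100)

# gated: learning is only enabled on the first 5 steps (hypothetical gate)
gated = hebbian_weight(100, gate=lambda t: 1.0 if t < 5 else 0.0)
```

Ungated, each step multiplies the weight by `1 + eta * x**2`, so it runs away; with the gate, the weight only changes while the gate is open and then stays put.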

{117}
ref: Gdowski-2001.02 tags: globus pallidus reward electrophysiology 2001 date: 0-0-2007 0:0 revision:0 [head]

PMID-11160530 Context Dependency in the Globus Pallidus Internal Segment During Targeted Arm Movements

  • most of the movement-responsive neurons had modulations in the cued segment of the task, not in the subsequent relaxed, self-paced phase.
  • this constitutes a reward or context-dependence.
{116}