m8ta
use https for features.
text: sort by
tags: modified
type: chronology
{1500}
hide / / print
ref: -0 tags: reinforcement learning distribution DQN Deepmind dopamine date: 03-30-2020 02:14 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-31942076 A distributional code for value in dopamine based reinforcement learning

  • Synopsis is staggeringly simple: dopamine neurons encode / learn to encode a distribution of reward expectations, not just the mean (aka the expected value) of the reward at a given state-action pair.
  • This is almost obvious neurally -- of course dopamine neurons in the striatum represent different levels of reward expectation; there is population diversity in nearly everything in neuroscience. The new interpretation is that neurons have different slopes for their susceptibility to positive and negative rewards (or rather, reward predictions), which results in different inflection points where the neurons are neutral about a reward.
    • This constitutes more optimistic and pessimistic neurons.
  • There is already substantial evidence that such a distributional representation enhances performance in DQN (Deep q-networks) from circa 2017; the innovation here is that it has been extended to experiments from 2015 where mice learned to anticipate water rewards with varying volume, or varying probability of arrival.
  • The model predicts a diversity of asymmetry below and above the reversal point
  • Also predicts that the distribution of reward responses should be decoded by neural activity ... which it is ... but it is not surprising that a bespoke decoder can find this information in the neural firing rates. (Have not examined in depth the decoding methods)
  • Still, this is a clear and well-written, well-thought out paper; glad to see new parsimonious theories about dopamine out there.

{984}
hide / / print
ref: ODoherty-2011 tags: Odoherty Nicolelis ICMS stimulation randomly patterned gamma distribution date: 01-03-2012 06:55 gmt revision:1 [0] [head]

IEEE-6114258 (pdf) Towards a Brain-Machine-Brain Interface:Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation.

  • Key result: monkeys can discriminate between constant-frequency ICMS and aperiodic pulses, hence can discriminate some fine temporal aspects of ICMS.
  • Also discussed blanking methods for stimulating and recording at the same time (on different electrodes, using the randomized stimulation patterns).

____References____

O'Doherty, J. and Lebedev, M. and Li, Z. and Nicolelis, M. Towards a Brain #x2013;Machine #x2013;Brain Interface:Virtual Active Touch Using Randomly Patterned Intracortical Microstimulation Neural Systems and Rehabilitation Engineering, IEEE Transactions on PP 99 1 (2011)