[0] Dayan P, Balleine BW, Reward, motivation, and reinforcement learning. Neuron 36:2, 285-98 (2002 Oct 10)

  • criticism of the actor-critic model in the context of extensive behavioral research.
    • the critic evaluates the average future reward of given states (over the whole task), hence solving the temporal credit assignment problem.
  • discusses the temporal credit assignment problem, which is an issue in sequential learning problems (and in nearly all learning!)
  • heheh: "For example, Hershberger (1986) trained cockerel chicks to expect to find food in a specific food cup. He then arranged the situation such that if they ran toward the food cup, the cup receded at twice their approach speed whereas if they ran away from the food cup, it approached them at twice their retreat speed. As such, the chicks had to learn to run away from the distinctive food cup in order to get food. Hershberger found that the chicks were unable to learn this response in order to get the food and persisted in chasing the food away. They could, however, learn perfectly well to get the food when the cup moved away from them at only half of their approach speed." (Hershberger WA, An approach through the looking glass. Anim. Learn. Behav. 14, 443-451 (1986))
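
To make the critic bullet above concrete: a minimal sketch of a TD(0) critic, not code from the paper. The chain task, the function name `td0_critic`, the discounted (rather than average) reward formulation, and all parameter values are assumptions chosen for illustration. Reward arrives only at the end of the chain, yet the prediction error propagates value back to earlier states, which is the sense in which the critic handles temporal credit assignment.

```python
def td0_critic(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Learn state values on a linear chain with reward only at the end.

    Hypothetical toy task: states 0..n_states-1 are visited in order,
    and the single reward of 1.0 is delivered on leaving the last state.
    """
    # One extra slot for the terminal state, whose value stays 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            # TD error: how much better or worse things went than predicted.
            delta = reward + gamma * values[s_next] - values[s]
            # Move the prediction for the current state toward the target.
            values[s] += alpha * delta
            s = s_next
    return values[:n_states]


if __name__ == "__main__":
    for state, value in enumerate(td0_critic()):
        print(f"V({state}) = {value:.3f}")
```

With these assumed settings the learned values rise toward the rewarded end of the chain, each one approaching gamma raised to the number of steps remaining before the final transition. In a full actor-critic model the same `delta` would also be used to adjust the actor's action preferences; that part is omitted here.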