m8ta
{1576}
ref: -0
tags: GFlowNet Bengio probability modelling reinforcement learning
date: 10-29-2023 19:17 gmt
revision:3
{1522}
Schema networks: zero-shot transfer with a generative causal model of intuitive physics
{1500}
PMID-31942076 A distributional code for value in dopamine-based reinforcement learning
{1333}
{1169}
ref: -0
tags: artificial intelligence projection episodic memory reinforcement learning
date: 08-15-2012 19:16 gmt
revision:0
Projective simulation for artificial intelligence
{58}
PMID-16271465 The basal ganglia: learning new tricks and loving it
{1144}
PMID-15242667 Anatomical funneling, sparse connectivity and redundancy reduction in the neural networks of the basal ganglia
PMID-15233923 Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons.
{1076}
PMID-17017503[0] Synchronizing activity of basal ganglia and pathophysiology of Parkinson's disease.
{843}
PMID-19286561[0] Human Substantia Nigra Neurons Encode Unexpected Financial Rewards
{1085}
PMID-21603228[0] Dopaminergic Balance between Reward Maximization and Policy Complexity.
{255}
ref: BarGad-2003.12
tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus
date: 01-16-2012 19:18 gmt
revision:3
PMID-15013228[] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)
{905}
PMID-114271[0] Operant control of precentral neurons: the role of reinforcement schedules.
{788}
ref: -0
tags: reinforcement learning basis function policy specialization
date: 01-03-2012 02:37 gmt
revision:1
To read:
{630}
PMID-16543459[0] Reward Timing in the Primary Visual Cortex
{623}
Reinforcement learning in the cortex (a web scour/crawl):
{5}
ref: bookmark-0
tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning
date: 12-31-2011 19:30 gmt
revision:3
hunch.net interesting posts:
{612}
PMID-17187065[0] Separate neural substrates for skill learning and performance in the ventral and dorsal striatum.
{964}
ref: OLDS-1954.12
tags: Olds Milner operant conditioning electrical reinforcement wireheading BMI
date: 12-29-2011 05:09 gmt
revision:5
PMID-13233369[0] Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain.
{194}
PMID-9658025[0] Predictive reward signal of dopamine neurons.
{323}
ref: Loewenstein-2006.1
tags: reinforcement learning operant conditioning neural networks theory
date: 12-07-2011 03:36 gmt
revision:4
PMID-17008410[0] Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity
{795}
ref: work-0
tags: machine learning reinforcement genetic algorithms
date: 10-26-2009 04:49 gmt
revision:1
I just had dinner with Jesse, and we had a good, productive brainstorm about algorithms, learning, and neurobiology. Two things worth repeating, one simpler than the other:

1. Gradient descent / Newton-Raphson-like techniques should be tried with genetic algorithms. As I currently understand them, genetic algorithms perform a semi-directed search, randomly exploring the space of solutions with natural selection exerting a pressure to improve. What if you took the partial derivative of fitness with respect to each of the organism's genes, and used that to direct mutation, rather than selecting the mutated element at random? What if you looked before mating and crossover? It seems like this would speed up the algorithm greatly (though it might also get it stuck in local minima). Not sure if this has been done before - if it has, edit this to indicate where! (A sketch of the idea appears at the end of this entry.)

2. Most supervised machine-learning algorithms seem to rely on a single, externally applied objective function, which they then attempt to optimize. (This is what convex programming is. Unsupervised learning of course exists - PCA, ICA, and other means of learning correlative structure.) There are a great many ways to do optimization, but all are exactly that: optimization, a search through a space for some set of weights / set of rules / decision tree that maximizes or minimizes an objective function. What Jesse and I arrived at is that there is no real utility function in the world (corollary #1: life is not an optimization problem (**)) -- we generate these utility functions, just as we generate our own behavior. What would happen if an algorithm iteratively estimated, checked, and cross-validated its utility function based on the small rewards actually found in the world / its synthetic environment? Would we get generative behavior greater than the complexity of the inputs? (Jesse and I also had an in-depth talk about information generation / destruction in non-linear systems.)

Put another way, perhaps part of learning is to structure internal valuation / utility functions so as to set up reinforcement learning problems in which the reinforcement signal comes from the satisfaction of sub-goals (= local utility functions); or the gradient signal comes from evaluating partial derivatives of actions with respect to those local utility functions. Creating these goals is natural but not always easy, which is one reason (of very many!) why sports are so great - the utility function is clean, external, and immutable. The recursive, introspective creation of valuation / utility functions is what drives a lot of my internal monologues, mixed with a hefty dose of taking partial derivatives (see {780}) based on models of the world. (Stated this way, the two seem so similar that perhaps they are the same thing?)

To my limited knowledge, there has been some recent work on the creation of sub-goals in reinforcement learning. One paper I read used a system that looked for states with a high ratio of ultimately rewarded paths to unrewarded paths, and selected these as subgoals (i.e. the agent was rewarded when such a state was reached; a toy version is sketched below). I'm not talking about those sorts of sub-goals: in those systems there is an ultimate goal that the researcher wants the agent to achieve, and it is the algorithm's task to make a policy for generating/selecting behavior. Rather, I'm interested in even more unstructured tasks - make a utility function, and a behavioral policy, based on small, continuous (possibly irrelevant?) rewards in the environment.
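For concreteness, here is a minimal sketch of that subgoal-selection scheme - my own toy reconstruction, not code from the paper; all names are illustrative:

```python
from collections import defaultdict

def find_subgoals(trajectories, min_visits=10, min_ratio=3.0):
    """trajectories: (states, success) pairs -- the sequence of states
    visited in one episode, and whether that episode was ultimately
    rewarded. States appearing on many rewarded paths but few unrewarded
    ones are selected as subgoals."""
    rewarded, unrewarded = defaultdict(int), defaultdict(int)
    for states, success in trajectories:
        for s in set(states):        # count each state once per episode
            (rewarded if success else unrewarded)[s] += 1
    subgoals = set()
    for s in set(rewarded) | set(unrewarded):
        r, u = rewarded[s], unrewarded[s]
        if r + u >= min_visits and r / (u + 1) >= min_ratio:
            subgoals.add(s)
    return subgoals

def shaped_reward(state, env_reward, subgoals, bonus=0.1):
    # the agent gets a small extra reward for reaching a subgoal state
    return env_reward + (bonus if state in subgoals else 0.0)
```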
Why would I want to do this? The pet project I have in mind is a 'cognitive' PCB part placement / layout / routing algorithm to add to my other pet project, kicadocaml, to finally get some people to use it (the attention economy :-). In the course of thinking about how to do this, I've realized that a substantial problem is simply determining which board layouts are good and which are not. I have a rough aesthetic idea + some heuristics I learned from my dad + some heuristics I've learned through practice of what is good layout and what is not - but how to code these up? And what if these aren't the best rules anyway? If I just code up the rules I've internalized as utility functions, then the board layout will be pretty much as I do it - boring!

Well, I've stated my sub-goal in the form of a problem statement and some criteria to meet. Now, to go and search for a decent solution to it. (Have to keep this blog m8ta!) (Or, realistically, to go back and see if the problem statement is sensible.)

(**) Corollary #2 - There is no god. nod, Dawkins.
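Returning to point 1 above: a minimal sketch of gradient-directed mutation, assuming a real-valued genome and estimating the partials of a black-box fitness function by central differences (the fitness function here is a toy stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):
    # toy stand-in; any scalar fitness over a real-valued genome would do
    return -np.sum((genome - 1.7) ** 2)

def fitness_gradient(genome, eps=1e-4):
    # partial derivative of fitness wrt each gene, by central differences
    grad = np.zeros_like(genome)
    for i in range(genome.size):
        d = np.zeros_like(genome)
        d[i] = eps
        grad[i] = (fitness(genome + d) - fitness(genome - d)) / (2 * eps)
    return grad

def directed_mutation(genome, lr=0.05, noise=0.02):
    # mutate along the local fitness gradient rather than perturbing a
    # randomly chosen gene; keep a little noise so the search does not
    # collapse into a purely local (possibly stuck) hill climb
    return (genome + lr * fitness_gradient(genome)
            + noise * rng.standard_normal(genome.size))

pop = [rng.standard_normal(8) for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)                 # selection pressure
    pop = pop[:10] + [directed_mutation(p) for p in pop[:10]]
print("best fitness:", fitness(max(pop, key=fitness)))
```

Selection is kept from the standard GA; crossover is omitted for brevity and could be layered back in unchanged.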
{715}
ref: Legenstein-2008.1
tags: Maass STDP reinforcement learning biofeedback Fetz synapse
date: 04-09-2009 17:13 gmt
revision:5
PMID-18846203[0] A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
{651}
PMID-18482830[0] Reinforcement learning of motor skills with policy gradients
{674}
ref: notes-0
tags: Barto Hierarchical Reinforcement Learning
date: 02-17-2009 05:38 gmt
revision:1
Recent Advances in Hierarchical Reinforcement Learning
{653}
PMID-12371511[0] Dopamine: generalization and bonuses
{652}
ref: notes-0
tags: policy gradient reinforcement learning aibo walk optimization
date: 12-09-2008 17:46 gmt
revision:0
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
{631}
PMID-16563737[0] The computational neurobiology of learning and reward
{629}
PMID-11257908[0] Multiple Reward Signals in the Brain
{628}
PMID-10731222[0] Reward processing in primate orbitofrontal cortex and basal ganglia
{67}
PMID-16271465[] The basal ganglia: Learning new tricks and loving it
{197}
PMID-15151178[0] Sequential Rearrangements of the Ensemble Activity of Putamen Neurons in the Monkey Brain as a Correlate of Continuous Behavior
{72}
ref: abstract-0
tags: tlh24 error signals in the cortex and basal ganglia reinforcement_learning gradient_descent motor_learning
date: 0-0-2006 0:0
revision:0
Title: Error signals in the cortex and basal ganglia.

Abstract: Numerous studies have found correlations between measures of neural activity, from single-unit recordings to aggregate measures such as EEG, and motor behavior. Two general themes have emerged from this research: neurons are generally broadly tuned, and they are often arrayed in spatial maps. It is hypothesized that these are two features of a larger hierarchical structure of spatial and temporal transforms that allows mappings to produce complex behaviors from abstract goals or, similarly, simple percepts from complex sensory information. Much theoretical work has demonstrated the suitability of this organization both for generating behavior and for extracting relevant information from the world. It is generally agreed that most transforms enacted by the cortex and basal ganglia are learned rather than genetically encoded. Therefore, it is the characterization of the learning process that describes the computational nature of the brain; descriptions of the basis functions themselves are more descriptive of the brain's environment.

Here we hypothesize that learning in the mammalian brain is a stochastic maximization of reward and transform predictability, and a minimization of transform complexity and latency. It is probable that the optimizations employed in learning include components of both gradient descent and competitive elimination, two large classes of algorithms explored extensively in the field of machine learning. The former method requires the existence of a vectoral error signal, while the latter is less restrictive and requires at least a scalar evaluator.

We will look for the existence of candidate error or evaluator signals in the cortex and basal ganglia during force-field learning, where the motor error is task-relevant and explicitly provided to the subject. By simultaneously recording large populations of neurons from multiple brain areas, we can probe for error or evaluator signals by measuring the stochastic relationship and predictive ability of neural activity relative to the provided error signal. From these data we will also be able to track the dependence of neural tuning trajectories on trial-by-trial success; if the cortex operates under minimization principles, then tuning changes should have a temporal relationship to reward.

The overarching goal of this research is to look for one aspect of motor learning - the error signal - with the hope of using these data to better understand the normal function of the cortex and basal ganglia, and how this normal function relates to the symptoms caused by disease and lesions of the brain.
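To make the two classes concrete, here is a minimal sketch (my own illustration, not part of the proposed experiments) contrasting them on the same linear model: gradient descent consumes the full vectoral error, while a perturbation rule - standing in here for the scalar-evaluator class to which competitive elimination also belongs - consumes only a single scalar evaluation per step:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)              # fixed input to a linear model
y_target = np.array([0.5, -0.3])        # desired output

def loss(W):
    return 0.5 * np.sum((W @ x - y_target) ** 2)

def vector_error_step(W, lr=0.1):
    # gradient descent: uses the full vector error, per output channel
    err = W @ x - y_target
    return W - lr * np.outer(err, x)    # exact gradient of the loss

def scalar_evaluator_step(W, lr=0.05, sigma=0.01):
    # perturbation learning: try a random weight change and keep it in
    # proportion to how much a single scalar evaluation improved
    noise = sigma * rng.standard_normal(W.shape)
    delta = loss(W + noise) - loss(W)   # scalar: did the perturbation help?
    return W - lr * (delta / sigma ** 2) * noise

for step in (vector_error_step, scalar_evaluator_step):
    W = 0.1 * rng.standard_normal((2, 3))
    for _ in range(500):
        W = step(W)
    print(step.__name__, "final loss:", loss(W))
```

Both rules shrink the loss; the scalar-evaluator rule does so far more noisily, which is exactly why it is the less restrictive hypothesis to test for in neural data.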