
{135}
ref: Vijayakumar-2005.12 tags: schaal motor learning LWPR PLS partial least squares date: 12-07-2011 04:09 gmt revision:1 [0] [head]

PMID-16212764[0] Incremental online learning in high dimensions

ideas:

  • use locally linear models.
  • use a small number of univariate regressions along selected directions of input space, in the spirit of partial least squares regression; hence the method can operate in very high dimensions (a batch sketch of one such local model follows this list).
  • function to be approximated has locally low-dimensional structure, which holds for most real-world data.
  • use: the learning of value functions, policies, and models for learning control in high-dimensional systems (like complex robots or humans).
  • an important distinction between two families of function-approximation methods:
    • methods that fit nonlinear functions globally, possibly using input space expansions.
      • gaussian process regression
      • support vector machine regression
        • problem: requires the right kernel choice & basis vector choice.
      • variational bayes for mixture models
        • represents the conditional joint expectation, which is expensive to update. (though this is factored).
      • each of the above was designed for batch data analysis, not incremental data (biology is incremental).
    • methods that fit simple models locally and segment the input space automatically.
      • problem: the curse of dimensionality: they require an exponential number of models for accurate approximation.
        • this is not such a problem if the function is locally low-dim, as mentioned above.
  • projection regression (PR) works via decomposing multivariate regressions into a superposition of single-variate regressions along a few axes of input space.
    • projection pursuit regression is a well-known and useful example.
    • sigmoidal neural networks can be viewed as a method of projection regression.
  • they want to use factor analysis, which assumes that the observed data are generated from a low-dimensional distribution: a limited number of latent variables related to the output via a transformation matrix plus noise (cf. PCA / Wiener filter).
    • problem: factor analysis must represent all high-variance dimensions in the data, even if they are irrelevant to the output.
    • solution: use joint input and output space projection to avoid elimination of regression-important dimensions.
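To make this concrete, here is a hedged, batch-mode sketch of one locally weighted, PLS-style linear model (a single "receptive field"): a Gaussian kernel with distance metric D sets the locality, and a few weighted projections fit the local linear model. The function names, the fixed center and metric, and the toy data are my own illustration; the actual LWPR algorithm of the paper does all of this incrementally and adapts D by constrained gradient descent with cross-validation.

```
# One LWPR-style receptive field, batch version (illustration only).
import numpy as np

def gaussian_weights(X, c, D):
    """w_i = exp(-0.5 * (x_i - c)^T D (x_i - c)) for each row x_i of X."""
    d = X - c
    return np.exp(-0.5 * np.einsum('ij,jk,ik->i', d, D, d))

def fit_local_pls(X, y, c, D, n_proj=2):
    """Local linear fit via a few univariate regressions along data-driven
    projection directions, weighted by the Gaussian kernel around c."""
    w = gaussian_weights(X, c, D)
    x0, y0 = (w @ X) / w.sum(), (w @ y) / w.sum()    # weighted means
    Xr, yr = X - x0, y - y0                          # residuals to explain
    dirs, betas = [], []
    for _ in range(n_proj):
        u = Xr.T @ (w * yr)                          # direction correlated with output
        u /= np.linalg.norm(u) + 1e-12
        s = Xr @ u                                   # project inputs to 1-D
        beta = (w * s) @ yr / ((w * s) @ s + 1e-12)  # univariate weighted regression
        yr -= beta * s                               # deflate output residual
        Xr -= np.outer(s, u)                         # deflate input residual
        dirs.append(u); betas.append(beta)

    def predict(x):
        p, r = y0, x - x0
        for u, beta in zip(dirs, betas):
            s = r @ u
            p, r = p + beta * s, r - s * u
        return p
    return predict

# toy usage: 10-D input, locally only ~2 relevant dimensions
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + 0.01 * rng.standard_normal(200)
model = fit_local_pls(X, y, c=np.zeros(10), D=0.5 * np.eye(10))
print(model(np.zeros(10)))    # prediction near the kernel center
```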
----
  • practical details: they use the LWPR algorithm to model the inverse dynamics of their 7-DOF hydraulically-actuated gripper arm. That is, they applied random torques while recording the resulting accelerations, velocities, and angles, then fit a function to predict torques from these variables. The robot is compliant and not well modeled by a rigid-body model, though they tried this. The resulting LWPR model maps the 21-dimensional input (angles, velocities, and accelerations of the 7 joints) to the 7 predicted torques. The control system uses this function approximation to compute torques from desired trajectories, I think. The desired trajectories are generated using spline smoothing (?), and the control system is adaptive in addition to the LWPR approximation itself being adaptive.
  • The core of LWPR is partial least squares regression / projection pursuit, coupled with Gaussian kernels and a distance metric (just a matrix) learned via constrained gradient descent with cross-validation. Partial least squares (PLS) appears to be very popular in many fields, and there are a number of ways of computing it. The distance metrics can expand without limit, and the receptive fields overlap freely. Local models are added based on MSE, I think, and model adding stops when the space is well covered. (A rough controller sketch using such a model follows after this list.)
  • I think this technique is very powerful - you separate the function evaluation from the error minimization, to avoid the problem of ambiguous causes. Instead, when applying LWPR to the robot, the torques cause the angles and accelerations -> but you invert this relationship: you want the torques given the desired trajectory. Of course, the whole function approximation is stationary in time - position/velocity/acceleration is sufficient to describe the state and the required torques. Does the brain work in the same way? Do random things, observe the consequences, work in consequence space and invert? E.g. I contracted my bicep and it caused my hand to move to my face; now I want my hand to move to my face again - what caused that? Need reverse memory... or something. Hmm. Let's go back to conditioning: if an animal does an action and is subsequently rewarded, it will do that action again. If this is conditional on a need, then the action will be performed only when needed; when habitual, the action will be performed no matter what. This is the nature of all animals, I think, and corresponds to reinforcement learning? But how? I suppose it's all about memory, and assigning credit where credit is due - the same problem dealt with by reinforcement learning. And yet things like motor learning seem so far out of this paradigm - they are goal-directed and minimize some sort of error. Eh, not really. Clementine is operating on the conditioned response now - has little in the way of error. But gradually this will be built; with humans, it is built very quickly by reuse of existing modes, or consciousness.
  • back to the beginning: you don't have to regress into output space - you can regress into sensory space, and do as much as possible in that sensory space for control. This is very powerful, and the ISO learning people (Porr et al) have effectively discovered this: you minimize in sensory space (a toy babbling example also follows below).
    • does this obviate the need for backprop? We are continually causality-inverting machines; we are predictive.
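As a rough guess at how the learned inverse-dynamics model is used for control (a sketch of the scheme described two bullets up, not the paper's actual controller): the approximator supplies a feedforward torque along the desired trajectory, and a PD feedback term corrects the remaining model error. `inverse_model`, `Kp`, and `Kd` are placeholders I introduced.

```
def control_step(inverse_model, q, qdot, q_des, qdot_des, qddot_des,
                 Kp=50.0, Kd=5.0):
    """One control cycle: learned feedforward torque plus PD feedback.
    inverse_model is any learned map (q, qdot, qddot) -> torque, e.g. LWPR;
    the gains are illustrative, not taken from the paper."""
    tau_ff = inverse_model(q_des, qdot_des, qddot_des)   # feedforward from the model
    tau_fb = Kp * (q_des - q) + Kd * (qdot_des - qdot)   # feedback correction
    return tau_ff + tau_fb

# usage with a dummy stand-in model (single joint, zero feedforward):
tau = control_step(lambda q, qd, qdd: 0.0, q=0.1, qdot=0.0,
                   q_des=0.2, qdot_des=0.0, qddot_des=0.0)
```

And a toy version of the 'do random things, observe the consequences, then invert in sensory space' idea from the last two bullets: babble with random motor commands, record the sensory outcomes, and regress the command on the outcome. The 1-D plant and the polynomial fit (standing in for a locally weighted learner) are invented purely for illustration.

```
import numpy as np

rng = np.random.default_rng(0)

def plant(u):   # unknown forward model: motor command -> sensory consequence
    return np.tanh(1.7 * u) + 0.05 * rng.standard_normal()

U = rng.uniform(-2, 2, size=500)            # random motor babbling
Y = np.array([plant(u) for u in U])         # observed consequences

# invert: regress command on consequence (polynomial fit as a simple stand-in)
inverse = np.poly1d(np.polyfit(Y, U, deg=7))

y_goal = 0.5                                # desired sensory state
u_cmd = inverse(y_goal)                     # command predicted to achieve it
print(u_cmd, plant(u_cmd))                  # should land near y_goal
```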

____References____

[0] Vijayakumar S, D'Souza A, Schaal S, Incremental online learning in high dimensions. Neural Comput 17:12, 2602-34 (2005 Dec)

{651}
ref: Peters-2008.05 tags: Schaal reinforcement learning policy gradient motor primitives date: 02-17-2009 18:49 gmt revision:4 [3] [2] [1] [0] [head]

PMID-18482830[0] Reinforcement learning of motor skills with policy gradients

  • they say that the only way to deal with reinforcement (or general) learning in the high-dimensional policy spaces defined by parameterized motor primitives is policy gradient methods (a minimal sketch follows this list).
  • the article is rather difficult to follow; they do not always provide enough detail (for me) to understand exactly what their equations mean. Perhaps this is related to their criticism that others' papers are 'ad hoc' and not 'statistically motivated'.
  • nonetheless, it seems interesting..
  • their previous paper - Reinforcement Learning for Humanoid Robotics - may be slightly easier to understand.
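Since the entry is about policy gradients generally, here is a minimal, hedged REINFORCE-with-baseline sketch for a one-parameter Gaussian policy on a made-up reward. It illustrates only the 'vanilla' policy gradient; the paper itself develops natural-gradient and actor-critic estimators for parameterized motor primitives, which this does not reproduce.

```
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, alpha = 0.0, 0.5, 0.05   # policy mean, exploration noise, learning rate

def reward(a):                         # toy objective, optimum at a = 2
    return -(a - 2.0) ** 2

for _ in range(2000):
    actions = theta + sigma * rng.standard_normal(20)   # sample a small batch
    r = np.array([reward(a) for a in actions])
    r = r - r.mean()                                     # baseline reduces variance
    # grad log pi(a | theta) for a Gaussian policy is (a - theta) / sigma^2
    grad = np.mean(r * (actions - theta) / sigma**2)
    theta += alpha * grad                                # ascend the expected reward

print(theta)   # ends up near 2
```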

____References____

[0] Peters J, Schaal S, Reinforcement learning of motor skills with policy gradients. Neural Netw 21:4, 682-97 (2008 May)
{138}
ref: Schaal-2005.12 tags: schaal motor learning review date: 0-0-2007 0:0 revision:0 [head]

PMID-16271466 Computational motor control in humans and robots

{139}
ref: Schaal-1998.11 tags: schaal local learning PLS partial least squares function approximation date: 0-0-2007 0:0 revision:0 [head]

PMID-9804671 Constructive incremental learning from only local information

{140}
ref: Nakanishi-2005.01 tags: schaal adaptive control function approximation error learning date: 0-0-2007 0:0 revision:0 [head]

PMID-15649663 Composite adaptive control with locally weighted statistical learning.

  • idea: they want tracking-error-driven adaptation plus locally weighted piecewise-linear function approximation (though I didn't read it in much depth; it is complicated). A rough sketch of the composite scheme follows.
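A hedged 1-DOF sketch of what 'composite' adaptive control means here: the torque combines an adaptive feedforward model with feedback, and the parameter estimate is driven by both the tracking error and the torque prediction error. The paper replaces the fixed linear-in-parameters regressor used below with a locally weighted learner; the plant, gains, and trajectory are all invented for illustration.

```
import numpy as np

m_true, c_true = 2.0, 0.5          # unknown plant: m*qdd + c*qd = tau
theta_hat = np.array([0.5, 0.0])   # estimates of [m, c]
Gamma, lam, Kd, dt = np.diag([2.0, 2.0]), 5.0, 10.0, 1e-3

q, qd = 0.0, 0.0
for k in range(20000):
    t = k * dt
    q_d, qd_d, qdd_d = np.sin(t), np.cos(t), -np.sin(t)   # desired trajectory

    e, ed = q - q_d, qd - qd_d
    s = ed + lam * e                                  # combined tracking error
    qdr, qddr = qd_d - lam * e, qdd_d - lam * ed      # reference velocity / acceleration

    Y = np.array([qddr, qdr])                         # regressor for the control law
    tau = Y @ theta_hat - Kd * s                      # adaptive feedforward + feedback

    qdd = (tau - c_true * qd) / m_true                # simulate the true plant
    Yp = np.array([qdd, qd])                          # regressor at measured signals
    e_pred = Yp @ theta_hat - tau                     # torque prediction error

    # composite update: tracking-error term + prediction-error term
    theta_hat -= dt * (Gamma @ (Y * s + Yp * e_pred))

    q, qd = q + qd * dt, qd + qdd * dt

print(theta_hat)   # approaches [2.0, 0.5]
```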