PMID-16212764[0] Incremental online learning in high dimensions
ideas:
- use locally linear models.
- use a small number of univariate regressions along selected directions of input space, in the spirit of partial least-squares regression; hence, it can operate in very high dimensions.
- function to be approximated has locally low-dimensional structure, which holds for most real-world data.
- use: the learning of value functions, policies, and models for learning control in high-dimensional systems (like complex robots or humans).
- important distinction between function-approximation learning:
- methods that fit nonlinear functions globally, possibly using input space expansions.
- gaussian process regression
- support vector machine regression
- problem: requires the right kernel choice & basis vector choice.
- variational bayes for mixture models
- represents the joint conditional expectation, which is expensive to update (though this is factored).
- each of the above was designed for batch data analysis, not incremental data. (biology is incremental.)
- methods that fit simple models locally and segment the input space automatically.
- problem: the curse of dimensionality: they require an exponential number of models for accurate approximation.
- this is not such a problem if the function is locally low-dim, as mentioned above.
- projection regression (PR) works by decomposing multivariate regressions into a superposition of single-variate regressions along a few axes of input space.
- projection pursuit regression is a well-known and useful example.
- sigmoidal neural networks can be viewed as a method of projection regression.
- they want to use factor analysis, which assumes that the observed data are generated from a low-dimensional distribution: a limited number of latent variables related to the output via a transformation matrix plus noise. (cf. PCA / Wiener filter)
- problem: factor analysis must represent all high-variance dimensions in the data, even those irrelevant to the output.
- solution: use a joint input-output space projection to avoid eliminating regression-relevant dimensions.
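The projection-regression idea above can be sketched as a minimal batch PLS1 (NIPALS-style) in NumPy. This is illustrative only, not the paper's incremental variant: each component picks the input direction of maximal covariance with the output (so high-variance but output-irrelevant dimensions are ignored), runs a univariate regression along it, and deflates.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    # Minimal batch PLS1 via NIPALS. Each component: pick the input
    # direction of maximal covariance with the output, regress
    # univariately along it, then deflate inputs and residuals.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, B = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # projection direction
        t = Xr @ w                      # 1-D scores along that direction
        p = Xr.T @ t / (t @ t)          # input loading, used for deflation
        beta = (t @ yr) / (t @ t)       # univariate regression coefficient
        Xr -= np.outer(t, p)            # deflate inputs
        yr -= beta * t                  # deflate output residual
        W.append(w); P.append(p); B.append(beta)
    return x_mean, y_mean, W, P, B

def pls1_predict(X, model):
    x_mean, y_mean, W, P, B = model
    Xr = X - x_mean
    yhat = np.full(len(X), y_mean, dtype=float)
    for w, p, beta in zip(W, P, B):
        t = Xr @ w
        yhat += beta * t
        Xr = Xr - np.outer(t, p)        # same deflation as during fitting
    return yhat
```

With a 20-dimensional input whose output depends on only two coordinates, two components already recover the function: the covariance-driven choice of directions is what lets PLS sidestep the curse of dimensionality for locally low-dimensional data.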
----
- practical details: they use the LWPR algorithm to model the inverse dynamics of their 7-DOF hydraulically actuated gripper arm. That is, they applied random torques while recording the resulting accelerations, velocities, and angles, then fit a function to predict torques from these variables. The robot was compliant and not very well modeled by a rigid-body model, though they tried one. The resulting LWPR model mapped 27 inputs to 7 outputs, the predicted torques. The control system uses this function approximation to compute torques from desired trajectories, I think. The desired trajectories are generated using spline smoothing ?? and the control system is adaptive in addition to the LWPR approximation being adaptive.
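A hypothetical sketch of how a learned inverse-dynamics model could feed such a controller: the learned map supplies a feedforward torque for the desired state, and a low-gain PD term corrects tracking error (a computed-torque-style scheme; the paper's actual controller details are not reproduced here, and all names below are made up).

```python
import numpy as np

def control_step(inv_dyn, q, qd, q_des, qd_des, qdd_des, Kp, Kd):
    # inv_dyn stands in for the trained function approximator:
    # it maps (desired angles, velocities, accelerations) -> torques.
    tau_ff = inv_dyn(np.concatenate([q_des, qd_des, qdd_des]))
    # Low-gain PD feedback on the tracking error cleans up model error.
    tau_fb = Kp * (q_des - q) + Kd * (qd_des - qd)
    return tau_ff + tau_fb
```

The better the learned model, the more of the torque comes from the feedforward term, so feedback gains can stay low and the arm stays compliant.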
- The core of LWPR is partial least-squares regression / projection pursuit, coupled with Gaussian kernels and a distance metric (just a matrix) learned via constrained gradient descent with cross-validation. Partial least squares (PLS) appears to be very popular in many fields, and there are a number of ways of computing it. The distance metrics can expand without limit, and the receptive fields overlap freely. Local models are added based on MSE, I think, and model adding stops when the space is well covered.
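A batch caricature of the receptive-field machinery: local linear models weighted by Gaussian kernels, blended by normalized weights at prediction time. The real LWPR updates each local model incrementally with PLS and adapts the distance metric D by gradient descent; here D is fixed and each fit is plain weighted least squares, purely to illustrate the prediction blend.

```python
import numpy as np

def gauss_w(x, c, D):
    # Receptive-field activation: Gaussian kernel with distance metric D.
    d = x - c
    return float(np.exp(-0.5 * d @ D @ d))

def fit_lwr(X, y, centers, D):
    # One local linear model per center, fit by weighted least squares.
    models = []
    for c in centers:
        w = np.array([gauss_w(x, c, D) for x in X])
        A = np.column_stack([np.ones(len(X)), X - c])   # local linear basis
        WA = A * w[:, None]
        # Weighted normal equations: (A' W A) beta = A' W y
        beta = np.linalg.lstsq(WA.T @ A, WA.T @ y, rcond=None)[0]
        models.append((c, beta))
    return models

def predict_lwr(models, D, x):
    # Prediction = weight-normalized blend of the local linear models.
    num = den = 0.0
    for c, beta in models:
        w = gauss_w(x, c, D)
        num += w * (beta[0] + (x - c) @ beta[1:])
        den += w
    return num / den
```

In full LWPR a new receptive field is allocated whenever no existing kernel is activated above a threshold for an incoming point, which is what lets the covering of input space grow as needed.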
- I think this technique is very powerful - you separate the function evaluation from the error minimization, to avoid the problem of ambiguous causes. Instead, when applying LWPR to the robot, the torques cause the angles and accelerations -> but you invert this relationship: you want to control the torques given a trajectory. Of course, the whole function approximation is stationary in time - the position/velocity/acceleration is sufficient to describe the state and the required torques. Does the brain work in the same way? Do random things, observe consequences, work in consequence space and invert?? e.g. I contracted my bicep and it caused my hand to move to my face; now I want my hand to move to my face again - what caused that? Need reverse memory... or something. Hmm. Let's go back to conditional learning: if an animal does an action, and subsequently it is rewarded, it will do that action again. If this is conditional on a need, then that action will be performed only when needed; when habitual, the action will be performed no matter what. This is the nature of all animals, I think, and corresponds to reinforcement learning? But how? I suppose it's all about memory, and assigning credit where credit is due. The same problem is dealt with in reinforcement learning. And yet things like motor learning seem so far out of this paradigm - they are goal-directed and minimize some sort of error. Eh, not really. Clementine is operating on the conditioned response now - has little in the way of error. But gradually this will be built; with humans, it is built very quickly by reuse of existing modes. Or consciousness.
- back to the beginning: you don't have to regress into output space - you can regress into sensory space, and do as much as possible in that sensory space for control. This is very powerful, and the ISO-learning people (Porr et al) have effectively discovered this: you minimize in sensory space.
- does this abrogate the need for backprop? We are continually causality-inverting machines; we are predictive.
____References____
[0] Vijayakumar S, D'Souza A, Schaal S. Incremental online learning in high dimensions. Neural Comput 17(12): 2602-34 (2005 Dec)