m8ta
{1543}
ref: -2019 tags: backprop neural networks deep learning coordinate descent alternating minimization date: 07-21-2021 03:07 gmt revision:1

Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

  • This paper is sort-of interesting: rather than back-propagating errors, you optimize auxiliary variables, the pre-nonlinearity 'codes', in last-to-first layer order. The optimization minimizes a multinomial logistic loss function; the math is not worked out for other loss functions, but presumably this is not a fundamental limitation. The loss function also includes a quadratic penalty term on the weights.
  • After the 'codes' are set, optimization of the weights can proceed in parallel across layers (see the sketch after this list). This is done with either plain SGD or adaptive Adam.
  • Weight L2 penalty is scheduled over time.
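
A minimal sketch of the alternating scheme, in numpy, for a two-layer network and one sample at a time. This is my own illustration, not the authors' exact algorithm: I use a squared loss instead of the multinomial logistic loss, and the step sizes, penalty weight mu, and variable names are all illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 8, 16, 4
    W1 = rng.normal(0.0, 0.1, (n_hid, n_in))
    W2 = rng.normal(0.0, 0.1, (n_out, n_hid))

    def relu(z):
        return np.maximum(z, 0.0)

    def am_step(x, y, lr=0.05, mu=1.0, n_code_iters=5):
        """One online alternating-minimization step on sample (x, y)."""
        global W1, W2
        c1 = W1 @ x                  # forward pass initializes the code
        # code update (last-to-first): gradient steps on the prediction
        # loss plus a quadratic penalty tying the code to W1 @ x
        for _ in range(n_code_iters):
            err = W2 @ relu(c1) - y  # squared loss here, for simplicity
            grad_c1 = (W2.T @ err) * (c1 > 0) + mu * (c1 - W1 @ x)
            c1 = c1 - lr * grad_c1
        # weight updates: with the codes fixed, each layer solves an
        # independent regression, so these two could run in parallel
        W1 -= lr * mu * np.outer(W1 @ x - c1, x)
        W2 -= lr * np.outer(W2 @ relu(c1) - y, relu(c1))

    for _ in range(200):             # toy usage on random data
        x = rng.normal(size=n_in)
        y = rng.normal(size=n_out)
        am_step(x, y)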

This is interesting in that the weight updates can be done in parallel - perhaps more efficient - but you are still propagating errors backward, albeit via optimizing the 'codes'. Given the vast infrastructure devoted to auto-diff + backprop, I can't see this being adopted broadly.

That said, the idea of alternating minimization (which is used e.g. for EM clustering) is powerful, and this paper does describe (though I didn't read that part) guarantees on the convergence of the alternating minimization. Likewise, the authors show how to improve the performance of the online / minibatch algorithm by keeping around memory variables, in the form of covariance matrices; a rough sketch of that idea is below.
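
A hypothetical sketch of the memory idea for one layer's weight update: accumulate covariance-style sufficient statistics across minibatches and solve a regularized least-squares problem, instead of taking a pure per-batch gradient step. The variable names and the ridge parameter lam are mine, not the paper's.

    import numpy as np

    n_in, n_out, lam = 16, 4, 1e-2
    A = lam * np.eye(n_in)            # running input covariance ('memory')
    B = np.zeros((n_out, n_in))       # running input-output cross-covariance

    def update_weights(acts, codes):
        """acts: (batch, n_in) layer inputs; codes: (batch, n_out) target codes."""
        global A, B
        A += acts.T @ acts            # statistics accumulate across minibatches,
        B += codes.T @ acts           # so no past minibatch is ever 'forgotten'
        return B @ np.linalg.inv(A)   # closed-form ridge solution over all data so far

    rng = np.random.default_rng(1)
    W_true = rng.normal(size=(n_out, n_in))
    for _ in range(10):               # toy usage: recover a linear map online
        acts = rng.normal(size=(32, n_in))
        codes = acts @ W_true.T
        W = update_weights(acts, codes)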

{817}
ref: Friston-2010.02 tags: free energy minimization life learning large theories date: 06-08-2010 13:59 gmt revision:2

My letter to a friend regarding images/817_1.pdf The free-energy principle: a unified brain theory? PMID-20068583 -- like all critics, I feel the world will benefit from my criticism ;-)

Hey, I did read that paper on the plane, and wrote down some comments, but haven't had a chance to actually send them until now. err.. anyway.. might as well send them since I did bother writing stuff down:

I thought the paper was interesting, but rather specious, especially the way the author makes 'surprise' something to be minimized. This is blatantly false! Humans and other mammals (at least) like being surprised (in the normal meaning of the word). He says things like: "This is where free energy comes in: free energy is an upper bound on surprise, which means that if agents minimize free energy, they implicitly minimize surprise" -- a huge logical jump, and not one that I'm willing to accept.

I feel like this author is trying to capitalize on some recent developments, like variational Bayes and ensemble learning, without fully understanding them or having the mathematical chops (like Hayen) to flesh them out. So far as I understand, large theories (as this proposes to be) are useful in that they permit derivation of particular update equations; variational Bayes, for example, takes the Kullback-Leibler divergence & a factorization of the posterior to create EM update equations. So, even if the free-energy idea is valid, the author uses it at such a level as to make no useful, mathy predictions.

One area where I agree with him is that the nervous system creates an internal model of the world, for the purpose of prediction. Yes, maybe this allows 'surprise' to be minimized. But animals minimize surprise not because of free energy, but rather for the much more quotidian reason that surprise can be dangerous.

Finally, I wholly reject the idea that value and surprise can be equated, or are even similar. They seem orthogonal to me! Value is assigned to things that help an animal survive and multiply; surprise is assigned to things its nervous system does not expect. All these things make sense when cast against the theories of evolution and selection. Perhaps, perhaps selection is a consequence of decreasing free energy - this intuitively and somewhat amorphously/mystically makes sense (the aggregate consequence of life on earth is somehow order, harmony, and other 'good stuff' (but this is an anthropocentric view)) - but if so, the author should be able to make more coherent / mathematical predictions of observed phenomena, e.g. why animals locally violate the second law of thermodynamics.

Despite my critique, thanks for sending the article; it made me think. Maybe you don't want to read it now and I saved you some time ;-)
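
(For reference, and as my gloss rather than anything from the paper or the letter: the "upper bound on surprise" claim is the standard variational identity, with surprise taken as the negative log model evidence -ln p(x) and q(z) a recognition density over hidden states z:)

    F(q) = \mathbb{E}_{q(z)}[\ln q(z) - \ln p(x,z)]
         = -\ln p(x) + \mathrm{KL}(q(z) \,\|\, p(z \mid x))
         \ge -\ln p(x), \quad \text{since } \mathrm{KL} \ge 0.

The math of the bound itself is solid; the letter's objection is to the behavioral leap from "F bounds surprise" to "agents act to minimize surprise."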