m8ta

{1569} 
ref: 2022
tags: symbolic regression facebook AI transformer
date: 05-17-2022 20:25 gmt
revision:0


Deep symbolic regression for recurrent sequences

Surprisingly, they do not make any changes to the network structure; it's a Vaswani 2017 transformer w/ 8 heads and 8 layers (sequence-to-sequence, not decoder-only), with a latent dimension of 512. Significant work went into feature / representation engineering (e.g. base-10k representations of integers and fixed-precision representations of floating-point numbers; both of these involve a vocabulary size of ~10k ... amazing that this still works) + the significant training regimen they worked with (16 Turing GPUs, 32 GB each). Note that they do perform a bit of beam search over the symbolic regressions, checking how well each candidate fits the starting sequence, but the models work even without this degree of refinement. (As always, there undoubtedly was significant effort spent simply getting everything to work.)

The paper does both symbolic (estimate the algebraic recurrence relation) and numeric (estimate the rest of the sequence) training / evaluation. Symbolic regression generalizes better, unsurprisingly. But both can be made to work even in the presence of (log-scaled) noise!

Analysis of how the transformers work for these problems is weak; there is only one figure, showing that the embeddings of the integers follow a meandering but continuous path in t-SNE space. Still, the trained transformer usually beats the hand-coded sequence inference engine(s) in Mathematica, and does so without memorizing all of the training data. A very impressive and important result, enough to convince me that this learned representation (and perhaps some undiscovered cleverness) beats human mathematical engineering, which probably took longer and required more effort. It follows, without too much imagination (but vastly more compute), that you can train an 'automatic programmer' in the very same way.
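A minimal sketch of the two number encodings mentioned above. It assumes (these details are not given in this note) a leading sign token, base-10000 digit tokens written most-significant first for integers, and, for floats, a 4-digit rounded mantissa followed by a base-10 exponent token. The token spellings ("+", "E-3", ...) and the function names are hypothetical. Since every digit or mantissa token is an integer in [0, 9999], the vocabulary comes out to roughly 10k tokens plus a handful of sign/exponent symbols.

```python
import math

BASE = 10_000  # base-10k: each digit token is an integer in [0, 9999]


def encode_int(n: int) -> list[str]:
    """Sign token followed by base-10k digit tokens, most significant first."""
    sign = "+" if n >= 0 else "-"
    n = abs(n)
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(str(r))
        if n == 0:
            break
    return [sign] + digits[::-1]


def decode_int(tokens: list[str]) -> int:
    """Inverse of encode_int: fold the base-10k digits back into an integer."""
    sign, *digits = tokens
    value = 0
    for d in digits:
        value = value * BASE + int(d)
    return value if sign == "+" else -value


def encode_float(x: float, precision: int = 4) -> list[str]:
    """Sign token, `precision`-digit rounded mantissa token, base-10 exponent token (lossy)."""
    sign = "+" if x >= 0 else "-"
    x = abs(x)
    if x == 0:
        return [sign, "0", "E0"]
    # Choose the exponent so the mantissa has `precision` significant digits.
    exponent = math.floor(math.log10(x)) - (precision - 1)
    mantissa = round(x / 10.0 ** exponent)
    if mantissa >= 10 ** precision:  # rounding carried over, e.g. 9.99995 -> 10000
        mantissa //= 10
        exponent += 1
    return [sign, str(mantissa), f"E{exponent}"]


def decode_float(tokens: list[str]) -> float:
    """Inverse of encode_float, up to the rounding done at encode time."""
    sign, mantissa, exp = tokens
    value = int(mantissa) * 10.0 ** int(exp[1:])
    return value if sign == "+" else -value
```

For example, 12345 becomes ["+", "1", "2345"] and 3.14159 becomes ["+", "3142", "E-3"]. The integer encoding is exact, while the float encoding is lossy by design (relative rounding error up to about 5e-4 with a 4-digit mantissa), which is part of why a small, fixed vocabulary suffices.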
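The beam-search refinement amounts to re-ranking candidate recurrences by how well they reproduce the observed terms. This is a hedged illustration, not the paper's code: the beam decoding itself is not shown, candidates are written as hypothetical Python lambdas rather than decoded expression trees, and the score (mean relative error of one-step predictions made from the true previous terms) is an assumed metric.

```python
from typing import Callable, Sequence

# Hypothetical candidate format: f(n, u) -> predicted u[n], where u holds terms u[0..n-1].
Recurrence = Callable[[int, Sequence[float]], float]


def fit_error(rec: Recurrence, seq: Sequence[float], order: int) -> float:
    """Mean relative error of one-step predictions over the observed terms (assumed metric)."""
    errors = []
    for n in range(order, len(seq)):
        try:
            pred = rec(n, seq[:n])
        except (ZeroDivisionError, OverflowError, ValueError):
            return float("inf")  # candidates that blow up on the input are ranked last
        errors.append(abs(pred - seq[n]) / max(abs(seq[n]), 1e-12))
    return sum(errors) / len(errors) if errors else float("inf")


def rank_candidates(candidates: dict[str, Recurrence], seq: Sequence[float],
                    order: int) -> list[tuple[str, float]]:
    """Sort candidate recurrences from best to worst fit on the observed terms."""
    scored = [(name, fit_error(rec, seq, order)) for name, rec in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    fib = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    beam = {
        "u[n-1] + u[n-2]": lambda n, u: u[n - 1] + u[n - 2],
        "2*u[n-1]": lambda n, u: 2 * u[n - 1],
        "u[n-1] + n": lambda n, u: u[n - 1] + n,
    }
    for name, err in rank_candidates(beam, fib, order=2):
        print(f"{name:>16}  {err:.3f}")
```

On the Fibonacci prefix the true rule scores exactly zero and sorts first; the near-miss 2*u[n-1] fits the early terms reasonably well but is still ranked below it. Scoring on the input alone is cheap, which fits the observation that this step is a refinement rather than something the models depend on.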
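The symbolic and numeric tasks meet at evaluation time: a recovered recurrence can be rolled forward, feeding its own outputs back in, to predict the rest of the sequence, and those predictions can then be checked against the true continuation (a numeric model would instead emit the future terms directly, which is not shown here). A minimal sketch, assuming a per-term relative tolerance as the pass/fail criterion; the tolerance value and all names are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical candidate format: f(n, u) -> predicted u[n], where u holds terms u[0..n-1].
Recurrence = Callable[[int, Sequence[float]], float]


def extrapolate(rec: Recurrence, prefix: Sequence[float], n_terms: int) -> list[float]:
    """Predict the next n_terms by rolling the recurrence forward on its own outputs."""
    seq = list(prefix)
    for _ in range(n_terms):
        seq.append(rec(len(seq), seq))
    return seq[len(prefix):]


def within_tolerance(pred: Sequence[float], target: Sequence[float], tol: float = 1e-10) -> bool:
    """True if every predicted term matches its target within relative tolerance tol."""
    if len(pred) != len(target):
        return False
    return all(abs(p - t) <= tol * max(abs(t), 1e-12) for p, t in zip(pred, target))


if __name__ == "__main__":
    seq = [1, 3, 7, 15, 31, 63, 127, 255, 511, 1023]  # u[n] = 2*u[n-1] + 1
    prefix, future = seq[:6], seq[6:]
    good = extrapolate(lambda n, u: 2 * u[n - 1] + 1, prefix, len(future))
    bad = extrapolate(lambda n, u: 2 * u[n - 1], prefix, len(future))
    print(good, within_tolerance(good, future))  # [127, 255, 511, 1023] True
    print(bad, within_tolerance(bad, future))    # [126, 252, 504, 1008] False
```

Because an exact recurrence keeps producing correct terms indefinitely, a symbolic answer that is right extrapolates arbitrarily far, whereas an off-by-one rule like 2*u[n-1] drifts immediately; this is one intuition for why symbolic regression generalizes better.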