m8ta

{1574} 
ref: 0
tags: ocaml application functional programming
date: 10-11-2022 21:36 gmt
revision:2
[1] [0] [head]


https://stackoverflow.com/questions/26475765/ocaml-function-with-variable-number-of-arguments From this I learned that in OCaml you can return not just functions (e.g. currying) but applications of yet-to-be-named functions. let sum f = f 0 ;; let arg a b c = c (b + a) ;; let z a = a ;; Then sum (arg 1) ;; is well-typed as (int -> 'a) -> 'a = <fun>, e.g. an application of a function that converts int to 'a. Think of it as the application of Xa to the argument (0 + 1), where Xa is the argument (per the type signature). Zero is supplied by the definition of 'sum'. sum (arg 1) (arg 2) ;; can be parsed as (sum (arg 1)) (arg 2) ;; '(arg 2)' outputs an application of an int & a yet-to-be-determined function to 'a, e.g. it's typed as int -> (int -> 'a) -> 'a = <fun>. So, you can call it Xa, passed to the above. Or, Xa = Xb ((0 + 1) + 2) where, again, Xb is a yet-to-be-defined function that is supplied as an argument. Therefore, you can collapse the whole chain with the identity function z. But, of course, it could be anything else - square root perhaps, for MSE? All very clever.  
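The same trick translates directly into any language with closures. Here is a minimal Python sketch mirroring the OCaml definitions above (names match the note; this is an analog, not OCaml semantics):

```python
# Continuation-passing 'variadic sum': each arg(a) adds its value to the
# running accumulator b, then hands the total to whatever function comes
# next; the identity z terminates the chain.

def sum_(f):
    return f(0)                      # seed the accumulator with 0

def arg(a):
    # awaits the accumulator b, then a continuation c, and applies c to (b + a)
    return lambda b: lambda c: c(b + a)

def z(a):
    return a                         # identity: collapse the chain

print(sum_(arg(1))(z))               # (0 + 1) -> 1
print(sum_(arg(1))(arg(2))(z))       # ((0 + 1) + 2) -> 3
```

As in the OCaml version, the chain can be ended with any `int -> 'a` function in place of `z`.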
{1573}  
PMID: 36070680 Extracellular vesicles mediate the communication of adipose tissue with brain and promote cognitive impairment associated with insulin resistance
 
{1571}  
One model for the learning of language
A more interesting result is Deep symbolic regression for recurrent sequences, where the authors (Facebook/Meta) use a Transformer - in this case taken directly from Vaswani 2017 (8-head, 8-layer QKV with a latent dimension of 512) - to do both symbolic (estimate the algebraic recurrence relation) and numeric (estimate the rest of the sequence) training / evaluation. Symbolic regression generalizes better, unsurprisingly. But both can be made to work even in the presence of (log-scaled) noise! While the language-learning paper shows that small generative programs can be inferred from a few samples, the Meta symbolic regression shows that Transformers can evince either amortized memory (less likely) or algorithms for perception - both new and interesting. It suggests that 'even' abstract symbolic learning tasks are sufficiently decomposable that the sorts of algorithms available to an 8-layer transformer can give a useful search heuristic. (N.B. the transformer doesn't spit out perfect symbolic or numerical results directly - it also needs post-processing search. Also, the transformer algorithm has search (in the form of softmax) baked into its architecture.) This is not a light architecture: they trained the transformer for 250 epochs, where each epoch was 5M equations in batches of 512. Each epoch took 1 hour on 16 Volta GPUs with 32 GB of memory each. So, 4k GPU-hours x ~10 TFlops = 1.4e20 flops. Compare this with the grammar learning above: 7 days on 32 cores operating at ~3 Gops/sec is ~6e16 ops. Much, much smaller compute. All of this is to suggest a central theme of computer science: a continuum between search and memorization.
Most interesting for a visual neuroscientist (not that I'm one per se, but bear with me) is where visual perception sits on these axes (search, heuristic, memory). Clearly there is a high degree of recurrence, and a high degree of plasticity / learning. But is there search or local optimization? Is this coupled to the recurrence via some form of energy-minimizing system? Is recurrence approximating EM?  
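A quick arithmetic check of the two compute estimates above (the inputs are the note's own numbers; sustained-throughput figures are assumptions):

```python
# Transformer side: 4k GPU-hours at an assumed ~10 TFlops sustained per GPU.
gpu_hours = 4000
flops_per_gpu = 10e12
transformer_flops = gpu_hours * 3600 * flops_per_gpu
print(f"transformer: {transformer_flops:.1e} flops")    # ~1.4e20

# Grammar-learning side: 7 days on 32 cores at ~3 Gops/sec per core.
days, cores, ops_per_core = 7, 32, 3e9
grammar_ops = days * 24 * 3600 * cores * ops_per_core
print(f"grammar learner: {grammar_ops:.1e} ops")

# Either way, the transformer used several orders of magnitude more compute.
print(f"ratio: {transformer_flops / grammar_ops:.0e}x")
```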
{1572} 
ref: 2019
tags: Piantadosi cognition combinators function logic
date: 09-05-2022 01:57 gmt
revision:0
[head]


 
{1570}  
Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks Bit of a meh - the idea is, rather than propagating error signals backwards through a hierarchy, you propagate only one layer + use a signed global reward signal. This works by keeping the network 'coherent': positive neurons have positive input weights, and negative neurons have negative weights, such that the overall effect of a weight change does not change sign when propagated forward through the network. This is kind of a lame shortcut, imho, as it limits the types of functions that the network can model & the computational structure of the network. This is already quite limited by the common dot-product-rectifier structure (as is used here). Much more interesting, and possibly necessary (given much deeper architectures now), is to allow units to change sign. (Open question as to whether they actually frequently do!) As such, the model is in the vein of "how do we make backprop biologically plausible by removing features / communication" rather than "what sorts of signals and changes does the brain use to perceive and generate behavior". This is also related to the literature on what ResNets do: what are the skip connections for? Anthropic has some interesting analyses for Transformer architectures, but checking the literature on other ResNets is for another time.  
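A toy numpy illustration of the coherence property (my construction, not the paper's algorithm): with non-negative weights and ReLU units, the forward-propagated effect of any weight increase keeps its sign, which is what lets a single global reward sign orient every local update.

```python
import numpy as np

# Two-layer ReLU network with all-positive (hypothetical) weights.
rng = np.random.default_rng(0)
W1 = rng.uniform(0, 1, (4, 3))
W2 = rng.uniform(0, 1, (1, 4))
relu = lambda v: np.maximum(v, 0)

def f(x, W1, W2):
    return float(W2 @ relu(W1 @ x))

x = rng.uniform(0, 1, 3)
base = f(x, W1, W2)

# Nudging any first-layer weight upward can only increase the output:
# the sign of a change never flips on its way forward, so a global
# scalar reward suffices to tell each synapse which way to move.
for i in range(4):
    for j in range(3):
        Wp = W1.copy()
        Wp[i, j] += 0.01
        assert f(x, Wp, W2) >= base
print("all perturbations moved the output in the same direction")
```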
{1569} 
ref: 2022
tags: symbolic regression facebook AI transformer
date: 05-17-2022 20:25 gmt
revision:0
[head]


Deep symbolic regression for recurrent sequences Surprisingly, they do not make any network-structure changes; it's Vaswani 2017 w/ an 8-head, 8-layer transformer (sequence-to-sequence, not decoder-only) with a latent dimension of 512. Significant work was in feature / representation engineering (e.g. base-10k representations of integers and fixed-precision representations of floating-point numbers; both of these involve a vocabulary size of ~10k ... amazing still that this works) + the significant training regimen they worked with (16 Turing GPUs, 32 GB each). Note that they do perform a bit of beam search over the symbolic regressions by checking how well each node fits the starting sequence, but the models work even without this degree of refinement. (As always, there undoubtedly was significant effort spent in simply getting everything to work.) The paper does both symbolic (estimate the algebraic recurrence relation) and numeric (estimate the rest of the sequence) training / evaluation. Symbolic regression generalizes better, unsurprisingly. But both can be made to work even in the presence of (log-scaled) noise! Analysis of how the transformers work on these problems is weak; only one figure shows that the embeddings of the integers follow some meandering but continuous path in t-SNE space. Still, the trained transformer is usually able to best the hand-coded sequence-inference engine(s) in Mathematica, and does so without memorizing all of the training data. Very impressive and important result, enough to convince me that this learned representation (and undiscovered cleverness, perhaps) beats human mathematical engineering, which probably took longer and more effort. It follows, without too much imagination (but vastly more compute), that you can train an 'automatic programmer' in the very same way.  
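The base-10k integer representation might look something like this sketch (my reading of the scheme; the token names are made up, not the paper's vocabulary):

```python
# Encode an integer as a sign token plus 'digits' in base 10000, so the
# vocabulary stays around ~10k symbols regardless of magnitude.
BASE = 10_000

def encode_int(n):
    sign = '+' if n >= 0 else '-'
    n = abs(n)
    digits = []
    while True:
        digits.append(n % BASE)
        n //= BASE
        if n == 0:
            break
    return [sign] + [f"d{d}" for d in reversed(digits)]

def decode_int(tokens):
    sign = -1 if tokens[0] == '-' else 1
    n = 0
    for t in tokens[1:]:
        n = n * BASE + int(t[1:])
    return sign * n

print(encode_int(123456789))   # ['+', 'd1', 'd2345', 'd6789']
assert decode_int(encode_int(-987654321)) == -987654321
```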
{1568}  
Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits
 
{1567} 
ref: 0
tags: evolution simplicity symmetry kolmogorov complexity polyominoes protein interactions
date: 04-21-2022 18:22 gmt
revision:5
[4] [3] [2] [1] [0] [head]


Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution
The paper features an excellent set of references.
Letter to a friend following her article Machine learning in evolutionary studies comes of age Read your PNAS article last night, super interesting that you can get statistical purchase on long-lost evolutionary 'sweeps' via GANs and other neural network models. I feel like there is some sort of statistical power issue there? DNNs are almost always overparameterized... slightly suspicious. This morning I was sleepily mulling things over & thought about a walking conversation that we had a long time ago in the woods of NC: Why is evolution so effective? Why does it seem to evolve to evolve? Thinking more - and having years more perspective - it seems almost obvious in retrospect: it's a consequence of Bayes' rule. Evolution finds solutions in spaces that have overwhelming prevalence of working solutions. The prior has an extremely strong effect. These representational / structural spaces by definition have many nearby & associated solutions, hence appear post-hoc 'evolvable'. (You probably already know this.) I think proteins very much fall into this category: AAs were added to the translation machinery based on ones that happened to solve a particular problem... but because of the 'generalization prior' (to use NN parlance), they were useful for many other things. This does not explain the human-engineering-like modularity of mature evolved systems, but maybe that is due to the strong simplicity prior [1] Very very interesting to me is how the science of evolution and neural networks are drawing together, vis a vis the lottery ticket hypothesis. Both evince a continuum of representational spaces, too, from high-dimensional vectorial (how all modern deep learning systems work) to low-dimensional, modular, specific, and general (phenomenological human cognition). I suspect that evolution uses a form of this continuum, as seen in the human high-dimensional long-range gene regulatory / enhancer network (= a structure designed to evolve). 
Not sure how selection works here, though; it's hard to search a high-dimensional space. The brain has an almost identical problem: it's hard to do 'credit assignment' in a billions-large, deep, and recurrent network. Finding which set of synapses caused a good / bad behavior takes a lot of bits.  
{1566}  
Interactions between learning and evolution
Altogether (historically) interesting, but some of these ideas might well have been anticipated by some simple hand calculations.  
{1565}  
Compiling a list of saturated matrix-matrix gflops for various Nvidia GPUs.
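For the CPU side, a minimal numpy timing loop gives the same kind of saturated-matmul number (a GPU measurement follows the same 2*n^3 flop accounting, just with a device library):

```python
import time
import numpy as np

def matmul_gflops(n=1024, reps=5):
    # One n x n matmul costs ~2*n^3 floating-point operations.
    A = np.random.rand(n, n).astype(np.float32)
    B = np.random.rand(n, n).astype(np.float32)
    A @ B                                   # warm up / fault in pages
    t0 = time.perf_counter()
    for _ in range(reps):
        A @ B
    dt = time.perf_counter() - t0
    return 2 * n**3 * reps / dt / 1e9       # sustained gflops

print(f"{matmul_gflops():.1f} sustained gflops")
```

The measured figure depends on the BLAS numpy is linked against, so treat any single number as a lower bound on the hardware.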
 
{1564}  
“Visualizing data using t-SNE”
 
{1563}  
The Sony Xperia XZ1 Compact is a better phone than an Apple iPhone 12 mini I don't normally write any personal opinions here - just half-finished paper notes riddled with typos (haha) - but this one has been bothering me for a while. In November 2020 I purchased an iPhone 12 mini to replace my aging Sony Xperia XZ1 Compact. (Thinking of staying with Android, I tried out a Samsung S10e as well, but didn't like it.) Having owned and used the iPhone for a year and change, I still prefer the Sony. Here is why:
Summary: I'll try to get my money's worth out of the iPhone; when it dies, I will buy the smallest waterproof Android phone that supports my carrier's bands.  
{1561}  
Cortical response selectivity derives from strength in numbers of synapses
 
{842}  
Distilling free-form natural laws from experimental data
Since his PhD, Michael Schmidt has gone on to found Nutonian, which produced the Eureqa software, apparently without dramatic new features other than being able to use the cloud for equation search. (He probably improved many other detailed facets of the software.) Nutonian received $4M in seed funding, according to Crunchbase. In 2017, Nutonian was acquired by DataRobot (for an undisclosed amount), where Michael has worked since, rising to the title of CTO. Always interesting to follow up on the authors of these classic papers!  
{1562}  
Modern SAT solvers: fast, neat and underused (part 1 of N) A set of posts that are worth rereading.  
{1559}  
Some investigations into denoising models & their intellectual lineage: Deep Unsupervised Learning using Nonequilibrium Thermodynamics 2015
Generative Modeling by Estimating Gradients of the Data Distribution July 2019
Denoising Diffusion Probabilistic Models June 2020
Improved Denoising Diffusion Probabilistic Models Feb 2021
Diffusion Models Beat GANs on Image Synthesis May 2021
In all of the above, it seems that the inverse-diffusion function approximator is a minor player in the paper - but of course, it's vitally important to making the system work. In some sense, this 'diffusion model' is as much a means of training the neural network as it is a (rather inefficient, compared to GANs) way of sampling from the data distribution. In Nichol & Dhariwal Feb 2021, they use a U-net convolutional network (e.g. start with few channels, downsample and double the channels until there are 128-256 channels, then upsample 2x and halve the channels), including multi-headed attention. Ho 2020 used single-headed attention only at the 16x16 level. Ho 2020 in turn was based on PixelCNN++,
which is an improvement on (e.g. adds self-attention layers) Conditional Image Generation with PixelCNN Decoders.
Most recently, GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
added text-conditional generation + many more parameters + much more compute to yield very impressive image results + inpainting. This last effect is enabled by the fact that it's a full generative denoising probabilistic model - you can condition on other parts of the image!  
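For reference, the closed-form forward (noising) process all of these papers share, as a numpy sketch. The schedule constants are the commonly used linear schedule; treat the exact values as illustrative assumptions:

```python
import numpy as np

# q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I),
# where abar_t is the cumulative product of (1 - beta_t).
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
abar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    # Jump straight to step t in closed form: scale the signal down,
    # mix in Gaussian noise of the complementary variance.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)            # stand-in for an image
xt, eps = q_sample(x0, T - 1, rng)
print(f"abar[T-1] = {abar[-1]:.2e}")    # tiny: signal essentially gone by t=T
```

The denoising network is then trained to predict eps from (x_t, t); sampling runs the learned reverse process starting from pure noise.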
{1558} 
ref: 2021
tags: hippocampal behavior scale plasticity Magee Romani Bittner
date: 12-20-2021 22:39 gmt
revision:0
[head]


Bidirectional synaptic plasticity rapidly modifies hippocampal representations
I'm still not 100% sure that this excludes any influence of presynaptic activity ... they didn't control for that. But certainly LTD in their model does not require postsynaptic activity; indeed, it may only require net synaptic homeostasis.  
{1557}  
The fact that SVD works at all, and pulls out some structure, is interesting! Not nearly as good as word2vec. 
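A minimal sketch of the SVD-embedding idea on a toy corpus (the corpus and window size are invented for illustration):

```python
import numpy as np

# Build a word co-occurrence matrix, factor it with SVD, and keep the
# top-k scaled left singular vectors as word embeddings (classic LSA).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):               # symmetric window of 1
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            C[idx[w], idx[corpus[j]]] += 1

U, S, Vt = np.linalg.svd(C)
k = 2
vectors = U[:, :k] * S[:k]                   # rank-k embeddings

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# 'cat' and 'dog' occur in identical contexts, so they land together:
print(cos(vectors[idx['cat']], vectors[idx['dog']]))
```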