m8ta
{1572}
ref: -2019 tags: Piantadosi cognition combinators function logic date: 09-05-2022 01:57 gmt revision:0 [head]

  • The Computational Origin of Representation (2019)
  • from Piantadosi; talks a big game... reviews some seminal literature...
    • But the argument reduces to the established idea that you can represent boolean logic and arbitrary algorithms with Church encoding through S and K (and some tortuous symbol manipulation..) -- a toy sketch of the boolean encoding follows this list.
    • It seems that Piantadosi was perhaps excited by discovering and understanding combinators?
      • It is indeed super neat (though I didn't wade in deep enough to really understand it), but the backtracking search procedure embodied in pyChuriso is scarcely close to anything happening in our brains (and such backtracking search is common in CS..)
      • It is overwhelmingly more likely that we approximate other Turing-complete computations, by (evolutionary) luck and education.
      • The last parts of the paper, describing a continuum between combinators, logic, calculus, tensor approximations, and neuroscience, are very hand-wavy, with no implementation.
        • If you allow me to be hypercritical: this paper is an excellent literature review, but of limited impact for ML practitioners.
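A toy Python sketch (mine, not from the paper) of the fact the argument above leans on: boolean logic can be encoded with just the S and K combinators. All names (TRUE, FALSE, NOT, AND, OR, to_py) are illustrative; FALSE is taken as K(I) rather than the pure S K form, which breaks under Python's strict evaluation.

    # Combinators as curried one-argument functions.
    S = lambda x: lambda y: lambda z: x(z)(y(z))   # S x y z = x z (y z)
    K = lambda x: lambda y: x                      # K x y = x
    I = S(K)(K)                                    # identity: I x = x

    TRUE  = K        # TRUE a b  = a
    FALSE = K(I)     # FALSE a b = b

    # boolean operators built from the encodings above
    NOT = lambda p: p(FALSE)(TRUE)
    AND = lambda p: lambda q: p(q)(FALSE)
    OR  = lambda p: lambda q: p(TRUE)(q)

    def to_py(b):
        # decode a combinator boolean by applying it to the Python values True/False
        return b(True)(False)

    assert to_py(AND(TRUE)(FALSE)) is False
    assert to_py(OR(FALSE)(TRUE)) is True
    assert to_py(NOT(FALSE)) is True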

{1570}
ref: -0 tags: Balduzzi backprop biologically plausible red-tape date: 05-31-2022 20:48 gmt revision:1 [0] [head]

Kickback cuts Backprop's red-tape: Biologically plausible credit assignment in neural networks

Bit of a meh -- idea is, rather than propagating error signals backwards through a hierarchy, you propagate only one layer + use a signed global reward signal. This works by keeping the network ‘coherent’ -- positive neurons have positive input weights, and negative neurons have negative weights, such that the overall effect of a weight change does not change sign when propagated forward through the network.
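A rough numpy sketch of that flavor of update -- my own paraphrase of the one-layer-feedback plus signed-global-error idea, not the paper's exact Kickback rule; the toy network and all names are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 8, 16

    # 'coherent' setup: hidden-to-output weights kept nonnegative, so a hidden unit's
    # influence on the output does not change sign as activity moves forward
    W1 = rng.normal(0, 0.1, (n_hid, n_in))
    w2 = np.abs(rng.normal(0, 0.1, n_hid))

    def kickback_like_step(x, y_target, lr=1e-2):
        global W1, w2
        h_pre = W1 @ x
        h = np.maximum(h_pre, 0.0)        # ReLU hidden layer
        y = w2 @ h                        # single linear output unit
        tau = y_target - y                # signed *global* error: one scalar for the whole net

        w2 += lr * tau * h                # output weights: ordinary local delta rule
        # hidden weights: global error gated by one-layer-local 'influence' (the outgoing
        # weight) and the local ReLU derivative -- no deeper backward propagation
        W1 += lr * tau * np.outer(w2 * (h_pre > 0), x)
        w2[:] = np.maximum(w2, 0.0)       # keep the output weights nonnegative ('coherent')
        return 0.5 * tau ** 2

    # toy usage: fit a single input/target pair
    x, y_t = rng.normal(size=n_in), 1.0
    for _ in range(200):
        loss = kickback_like_step(x, y_t)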

This is kind of a lame shortcut, imho, as it limits the types of functions that the network can model & the computational structure of the network. This is already quite limited by the common dot-product-rectifier structure (as is used here). Much more interesting and possibly necessary (given much deeper architectures now) is to allow units to change sign. (Open question as to whether they actually frequently do!) As such, the model is in the vein of "how do we make backprop biologically plausible by removing features / communication" rather than "what sorts of signals and changes does the brain use to perceive and generate behavior".

This is also related to the literature on what ResNets do: what are the skip connections for? Anthropic has some interesting analyses for Transformer architectures, but checking the literature on other ResNets is for another time.

{1527}
ref: -0 tags: inductive logic programming deepmind formal propositions prolog date: 11-21-2020 04:07 gmt revision:0 [head]

Learning Explanatory Rules from Noisy Data

  • From a dense background of inductive logic programming (ILP): given a set of statements, and rules for transformation and substitution, generate clauses that satisfy a set of 'background knowledge'.
  • Programs like Metagol can do this using the search and simplification logic built into Prolog.
    • Actually kinda surprising how very dense this program is -- only 330 lines!
  • This task can be transformed into a SAT problem via rules of logic, for which there are many fast solvers.
  • The trick here (instead) is that a neural network is used to turn 'on' or 'off' clauses that fit the background knowledge (a toy sketch of this differentiable clause weighting follows this list).
    • BK is typically very small, a few examples, consistent with the small size of the learned networks.
  • These weight matrices are represented as the outer product of composed or combined clauses, which makes the weight matrix very large!
  • They then do gradient descent, while passing the cross-entropy errors through nonlinearities (including clauses themselves? I think this is how recursion is handled.) to update the weights.
    • Hence, SGD is used as a means of heuristic search.
  • Compare this to Metagol, which is brittle to any noise in the input; unsurprisingly, due to SGD, this is much more robust.
  • Way too many words and symbols in this paper for what it seems to be doing. Just seems to be obfuscating the work (which is perfectly good). Again: Metagol is only 330 lines!
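A toy numpy sketch of the differentiable clause-selection idea referenced above -- a drastic simplification of the paper's dILP machinery (one target predicate, a fixed deduction table, finite-difference gradients); the clause/label values are made up:

    import numpy as np

    # rows: candidate clauses; cols: example atoms; entry = 1 if that clause derives that atom
    derives = np.array([
        [1, 1, 0, 0],    # clause 0 -- fits the labels exactly
        [1, 0, 1, 0],    # clause 1
        [1, 1, 1, 1],    # clause 2 -- over-general
    ], dtype=float)
    labels = np.array([1, 1, 0, 0], dtype=float)   # which atoms are actually true (possibly noisy)

    w = np.zeros(len(derives))                     # one learnable weight per candidate clause

    def loss(w):
        probs = np.exp(w) / np.exp(w).sum()            # softmax: soft 'on/off' over clauses
        p = np.clip(probs @ derives, 1e-6, 1 - 1e-6)   # P(atom is true) under the clause mixture
        return -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).sum()   # cross-entropy

    # SGD as heuristic search over clause choices (finite-difference gradient for brevity)
    for _ in range(500):
        g = np.array([(loss(w + 1e-4 * e) - loss(w - 1e-4 * e)) / 2e-4 for e in np.eye(len(w))])
        w -= 0.5 * g

    print(np.exp(w) / np.exp(w).sum())   # the weight concentrates on clause 0, which fits the data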

{1441}
ref: -2018 tags: biologically inspired deep learning feedback alignment direct difference target propagation date: 03-15-2019 05:51 gmt revision:5 [4] [3] [2] [1] [0] [head]

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

  • Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap
  • As is known, many algorithms work well on MNIST, but fail on more complicated tasks, like CIFAR and ImageNet.
  • In their experiments, backprop still fares better than any of the biologically inspired / biologically plausible learning rules. This includes:
    • Feedback alignment {1432} {1423}
    • Vanilla target propagation
      • Problem: with convergent networks, layer inverses (top-down) will map all items of the same class to one target vector in each layer, which is very limiting.
      • Hence this algorithm was not directly investigated.
    • Difference target propagation (2015)
      • Uses the per-layer target as $\hat{h}_l = g(\hat{h}_{l+1}; \lambda_{l+1}) + [h_l - g(h_{l+1};\lambda_{l+1})]$
      • Or: $\hat{h}_l = h_l + g(\hat{h}_{l+1}; \lambda_{l+1}) - g(h_{l+1};\lambda_{l+1})$, where $\lambda_{l}$ are the parameters for the inverse model; $g()$ is the sum and nonlinearity.
      • That is, the target is modified, a la the delta rule, by the difference between the inverse-propagated higher-layer target and the inverse-propagated higher-layer activity (see the sketch after this list).
        • Why? $h_{l}$ should approach $\hat{h}_{l}$ as $h_{l+1}$ approaches $\hat{h}_{l+1}$.
        • Otherwise, the parameters in lower layers continue to be updated even when low loss is reached in the upper layers. (from original paper).
      • The weights from the penultimate to the last layer are trained via backprop, to prevent the template impoverishment noted above.
    • Simplified difference target propagation
      • They substitute a biologically plausible learning rule for the penultimate layer:
      • $\hat{h}_{L-1} = h_{L-1} + g(\hat{h}_L;\lambda_L) - g(h_L;\lambda_L)$, where there are $L$ layers.
      • It's the same rule as for the other layers.
      • Hence it is subject to the impoverishment problem with low-entropy labels.
    • Auxiliary output simplified difference target propagation
      • Add a vector $z$ to the last layer activation, which carries information about the input vector.
      • $z$ is just a set of random features from the activation $h_{L-1}$.
  • Used both fully-connected and locally-connected (i.e. convolution without weight sharing) MLPs.
  • It's not so great: target propagation seems like a weak learner, worse than feedback alignment; not only is the feedback limited, but it does not take advantage of the statistics of the input.
    • Hence, some of these schemes may work better when combined with unsupervised learning rules.
    • Still, in the original paper they use difference target propagation with autoencoders, and get reasonable stroke features.
  • Their general result that networks and learning rules need to be tested on more difficult tasks rings true, and might well be the main point of this otherwise meh paper.
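A minimal numpy sketch of the difference target propagation target computation quoted above; the layer sizes, tanh nonlinearity, squared-error top-layer target step, and all variable names are my own choices, not the paper's setup:

    import numpy as np

    rng = np.random.default_rng(2)
    sizes = [10, 20, 20, 5]    # input, two hidden layers, output

    # forward weights f_W[l] map layer l -> l+1; inverse weights g_V parameterize g (the lambdas)
    f_W = [rng.normal(0, 0.3, (sizes[l + 1], sizes[l])) for l in range(3)]
    g_V = [rng.normal(0, 0.3, (sizes[l], sizes[l + 1])) for l in range(1, 3)]

    act = np.tanh
    def f(l, h): return act(f_W[l] @ h)        # forward mapping to layer l+1
    def g(l, h): return act(g_V[l - 1] @ h)    # learned inverse: maps layer l+1 back to layer l

    x = rng.normal(size=sizes[0])
    h = [x]
    for l in range(3):
        h.append(f(l, h[l]))                   # forward pass: h[0]..h[3]

    # top-layer target: a small gradient step on a squared-error loss
    y_true = rng.normal(size=sizes[3])
    h_hat = [None] * 4
    h_hat[3] = h[3] - 0.1 * (h[3] - y_true)

    # difference target propagation:  h_hat_l = h_l + g(h_hat_{l+1}) - g(h_{l+1})
    for l in range(2, 0, -1):
        h_hat[l] = h[l] + g(l, h_hat[l + 1]) - g(l, h[l + 1])

    # each layer's forward weights would then be trained to push f(l-1, h[l-1]) toward h_hat[l],
    # and the inverses g trained as (denoising) autoencoders of the forward mapping.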

{1445}
ref: -2018 tags: cortex layer martinotti interneuron somatostatin S1 V1 morphology cell type morphological recovery patch seq date: 03-06-2019 02:51 gmt revision:3 [2] [1] [0] [head]

Neocortical layer 4 in adult mouse differs in major cell types and circuit organization between primary sensory areas

  • Using whole-cell recordings with morphological recovery, we identified one major excitatory and seven inhibitory types of neurons in L4 of adult mouse visual cortex (V1).
  • Nearly all excitatory neurons were pyramidal and almost all Somatostatin-positive (SOM+) neurons were Martinotti cells.
  • In contrast, in somatosensory cortex (S1), excitatory cells were mostly stellate and SOM+ cells were non-Martinotti.
  • These morphologically distinct SOM+ interneurons correspond to different transcriptomic cell types and are differentially integrated into the local circuit with only S1 cells receiving local excitatory input.
  • Our results challenge the classical view of a canonical microcircuit repeated through the neocortex.
  • Instead we propose that cell-type specific circuit motifs, such as the Martinotti/pyramidal pair, are optionally used across the cortex as building blocks to assemble cortical circuits.
  • Note preponderance of axons.
  • Classifications:
    • Pyr: pyramidal cells
    • BC: basket cells
    • MC: Martinotti cells
    • BPC: bipolar cells
    • NFC: neurogliaform cells
    • SC: shrub cells
    • DBC: double bouquet cells
    • HEC: horizontally elongated cells
  • Morphological types were linked to transcriptomic types using Patch-seq (patch-clamp recording combined with single-cell RNA sequencing).