m8ta

{1548}
ref: -2021 tags: gated multi layer perceptrons transformers ML Quoc_Le Google_Brain date: 08-05-2021 06:00 gmt revision:4

Pay attention to MLPs

  • Using bilinear / multiplicative gating + deep / wide networks, you can attain accuracies similar to Transformers on vision and masked language modeling tasks! No attention needed, just an in-network multiplicative term.
  • And the math is quite straightforward. Per layer:
    • Z = \sigma(X U), \quad \hat{Z} = s(Z), \quad Y = \hat{Z} V
      • Where X is the layer input, \sigma is the nonlinearity (GeLU), U is a weight matrix, \hat{Z} is the spatially-gated Z, and V is another weight matrix.
    • s(Z) = Z_1 \odot (W Z_2 + b)
      • Where Z is split along the channel dimension into two halves, Z_1 and Z_2; \odot is element-wise multiplication, W is a weight matrix, and b a bias.
  • You of course need a lot of compute; this paper has nice figures of model accuracy scaling vs. depth / number of parameters / size. I guess you can do this if you're Google.
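A minimal NumPy sketch of the per-layer math above (shapes, sizes, and weight values here are hypothetical, not the paper's):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gmlp_block(X, U, V, W, b):
    """One gMLP layer: Z = gelu(X U); Z_hat = s(Z); Y = Z_hat V.
    X: (n_tokens, d_model), U: (d_model, d_ffn), V: (d_ffn // 2, d_model),
    W: (n_tokens, n_tokens) spatial weights, b: (n_tokens, 1) bias."""
    Z = gelu(X @ U)
    Z1, Z2 = np.split(Z, 2, axis=-1)   # split along the channel dimension
    Z_hat = Z1 * (W @ Z2 + b)          # spatial gating unit: Z1 ⊙ (W Z2 + b)
    return Z_hat @ V
```

Note that the W Z_2 term mixes information across token positions; that spatial mixing is what stands in for attention.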

Pretty remarkable that an industrial lab freely publishes results like this. I guess the ROI is that they get the resultant improved ideas? Or, perhaps, Google is in such a dominant position in terms of data and compute that even if they give away ideas and code, provided some of the resultant innovation returns to them, they win. The return includes trained people as well as ideas. Good for us, I guess!

{1440}
ref: -2017 tags: attention transformer language model youtube google tech talk date: 02-26-2019 20:28 gmt revision:3

Attention is all you need

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • The paper that introduced the Transformer, an attention-based neural network model.
  • Good summary, along with: The Illustrated Transformer (please refer to this!)
  • Łukasz Kaiser mentions a few times how fragile the network is -- how easy it is to make something that doesn't train at all, and how many tricks by Google experts were needed to make things work properly. It might be bravado or bluffing, but this is arguably not the way that biology fails.
  • Encoding:
  • Input is words encoded as 512-length vectors.
  • Vectors are transformed into length 64 vectors: query, key and value via differentiable weight matrices.
  • Attention is computed as the dot-product of the query (current input word) with the keys (of the other words).
    • This value is scaled and passed through a softmax function to result in one attentional signal scaling the value.
  • Multiple heads' output are concatenated together, and this output is passed through a final weight matrix to produce a final value for the next layer.
    • So, attention in this respect looks like a conditional gain field.
  • 'Final value' above is then passed through a single layer feedforward net, with resnet style jump.
  • Decoding:
  • Use the attentional key value from the encoder to determine the first word through the output encoding (?) Not clear.
  • Subsequent causal decodes depend on the already 'spoken' words, plus the key-values from the encoder.
  • Output is a one-hot softmax layer from a feedforward layer; the sum total is differentiable from input to output using cross-entropy loss or KL divergence.
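The encoder attention steps above (query-key dot products, scaling, softmax, weighted sum of values) in a single-head NumPy sketch; the 512-to-64 projection matrices are omitted here:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (n_tokens, d_k). Each row of the output is a
    softmax-weighted mixture of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # scaled query-key dot products
    weights = softmax(scores, axis=-1)   # one attentional signal per query
    return weights @ V
```

In the multi-head case, several such outputs are concatenated and passed through a final weight matrix, as noted above.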

{289}
ref: Li-2001.05 tags: Bizzi motor learning force field MIT M1 plasticity memory direction tuning transform date: 09-24-2008 22:49 gmt revision:5

PMID-11395017[0] Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field

  • this is concerned with memory cells, cells that 'remember' or remain permanently changed after learning the force-field.
  • In the above figure, the blue lines (or rather vertices of the blue lines) indicate the firing rate during the movement period (and 200ms before); angular position indicates the target of the movement. The force-field in this case was a curl field where force was proportional to velocity.
  • Preferred direction of the motor cortical units changed when the preferred direction of the EMGs changed.
  • evidence of encoding of an internal model in the changes in tuning properties of the cells.
    • this can support both online performance and motor learning.
    • but what mechanisms allow the motor cortex to change in this way???
  • also see [1]
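The curl field mentioned above (force proportional to velocity, rotated 90°) can be written as F = B v with an antisymmetric B; a minimal sketch, with a hypothetical gain value:

```python
import numpy as np

def curl_field_force(v, b=15.0):
    """Velocity-dependent curl field: force magnitude scales with hand
    speed and always points perpendicular to the velocity.
    v: (2,) hand velocity; b: field gain (hypothetical value)."""
    B = np.array([[0.0,   b],
                  [-b,  0.0]])
    return B @ v
```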

____References____

[0] Li CS, Padoa-Schioppa C, Bizzi E. Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field. Neuron 30:2, 593-607 (2001 May). PMID-11395017.
[1] Caminiti R, Johnson PB, Urbano A. Making arm movements within different parts of space: dynamic aspects in the primate motor cortex. J Neurosci 10:7, 2039-58 (1990 Jul). PMID-2376768.

{565}
ref: Walker-2005.12 tags: algae transfection transformation protein synthesis bioreactor date: 03-21-2008 17:22 gmt revision:1

Microalgae as bioreactors PMID-16136314

{520}
ref: bookmark-0 tags: DSP Benford's law Fourier transform book date: 12-07-2007 06:14 gmt revision:1

http://www.dspguide.com/ch34.htm -- awesome!!

{344}
ref: Caminiti-1991.05 tags: transform motor control M1 3D population_vector premotor Caminiti date: 04-09-2007 20:10 gmt revision:2

PMID-2027042[0] Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets.

  • trained monkeys to make similar movements in different parts of external/extrinsic 3D space.
  • change of preferred direction was graded in an orderly manner across extrinsic space.
  • virtually no correlations found to endpoint static position: "virtually all cells were related to the direction and not to the end point of movement" - compare to Graziano!
  • yet the population vector remained an accurate predictor of movement: "Unlike the individual cell preferred directions upon which they are based, movement population vectors did not change their spatial orientation across the work space, suggesting that they remain good predictors of movement direction regardless of the region of space in which movements are made"
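A population-vector read-out as described above can be sketched as follows; the cosine-tuning parameters below are made up for illustration:

```python
import numpy as np

def population_vector(preferred_dirs, rates, baseline):
    """Sum each cell's preferred-direction unit vector, weighted by its
    firing-rate modulation above baseline; return the unit direction."""
    weights = rates - baseline                          # (n_cells,)
    pv = (weights[:, None] * preferred_dirs).sum(axis=0)
    return pv / np.linalg.norm(pv)

# usage: synthetic cosine-tuned cells recover the movement direction
rng = np.random.default_rng(0)
pds = rng.normal(size=(1000, 3))
pds /= np.linalg.norm(pds, axis=1, keepdims=True)       # unit preferred directions
movement = np.array([0.0, 0.0, 1.0])
rates = 10.0 + 5.0 * (pds @ movement)                   # cosine tuning, made-up gains
pv = population_vector(pds, rates, np.full(1000, 10.0))
```

With enough cells, pv points close to the true movement direction even though each individual cell is broadly tuned.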

____References____

[0] Caminiti R, Johnson PB, Galli C, Ferraina S, Burnod Y. Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets. J Neurosci 11:5, 1182-97 (1991 May). PMID-2027042.

{294}
ref: Caminiti-1990.07 tags: transform motor control M1 3D population_vector premotor Caminiti date: 04-09-2007 20:07 gmt revision:4

PMID-2376768[0] Making arm movements within different parts of space: dynamic aspects in the primate motor cortex

  • monkeys made similar movements in different parts of external/extrinsic 3D space.
  • change of preferred direction was graded in an orderly manner across extrinsic space.
    • this change closely followed the changes in muscle activation required to effect the observed movements.
  • motor cortical cells can code direction of movement in a way which is dependent on the position of the arm in space
  • implies existence of mechanisms which facilitate the transformation between extrinsic (visual targets) and intrinsic coordinates
  • also see [1]

____References____

[0] Caminiti R, Johnson PB, Urbano A. Making arm movements within different parts of space: dynamic aspects in the primate motor cortex. J Neurosci 10:7, 2039-58 (1990 Jul). PMID-2376768.
[1] Caminiti R, Johnson PB, Galli C, Ferraina S, Burnod Y. Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets. J Neurosci 11:5, 1182-97 (1991 May). PMID-2027042.