ref: -2017 tags: attention transformer language model youtube google tech talk date: 02-26-2019 20:28 gmt revision:3

Attention is all you need

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • Video: Attention is all you need neural network models (talk by Łukasz Kaiser)
  • Good summary, along with: The Illustrated Transformer (please refer to this!)
  • Łukasz Kaiser mentions several times how fragile the network is: how easy it is to build something that doesn't train at all, and how many tricks from Google experts were needed to make it work properly. It might be bravado or bluffing, but this is arguably not the way biology fails.
  • Encoding:
  • The input is words embedded as 512-dimensional vectors.
  • Each vector is projected into three 64-dimensional vectors (query, key, and value) via learned weight matrices, one set per attention head.
  • Attention is computed as the dot product of the query (current input word) with the keys of all the words in the sequence (including itself).
    • These dot products are scaled by 1/sqrt(d_k) (d_k = 64) and passed through a softmax, yielding one attentional weight per word; the output is the sum of the value vectors scaled by these weights.
  • The outputs of the multiple heads (8 in the paper) are concatenated, and the result is passed through a final weight matrix to produce the output for the next layer.
    • So, attention in this respect looks like a conditional gain field.
  • The output above is then passed through a position-wise feedforward net (two linear layers with a ReLU in between), with a ResNet-style residual connection followed by layer normalization.
  • Decoding:
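  • The decoder attends to the encoder output: queries come from the decoder, keys and values from the encoder. The decoder input is the output sequence shifted right by one position, so the first word is predicted from a start token plus the encoder keys and values.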
  • Subsequent words are decoded causally: each depends on the already 'spoken' words (future positions are masked out of the decoder self-attention) plus the keys and values from the encoder.
  • The output is a softmax over the vocabulary, computed from a final linear layer and trained against one-hot targets. The whole network is differentiable from input to output, so it is trained end to end with cross-entropy loss (or KL divergence).
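The encoding steps above (Q/K/V projections, scaled dot-product attention, multi-head concatenation) can be sketched in NumPy as follows. This is a minimal, untrained illustration. It assumes the paper's sizes (d_model = 512, 8 heads of size 64) and uses random weights. The function names are hypothetical and do not come from any library or from the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """q, k: (n, d_k); v: (n, d_v). Returns the (n, d_v) output and (n, n) weights."""
    d_k = q.shape[-1]
    # Dot product of each query with every key, scaled by 1/sqrt(d_k).
    scores = q @ k.T / np.sqrt(d_k)
    # Softmax over the keys: each row is a set of weights summing to 1.
    weights = softmax(scores, axis=-1)
    # Each output row is the attention-weighted sum of the value vectors.
    return weights @ v, weights

def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """x: (n, d_model); w_q, w_k, w_v: (h, d_model, d_k); w_o: (h * d_k, d_model)."""
    heads = []
    for i in range(w_q.shape[0]):
        # Per-head projections of every word into query, key and value vectors.
        q, k, v = x @ w_q[i], x @ w_k[i], x @ w_v[i]
        out, _ = scaled_dot_product_attention(q, k, v)
        heads.append(out)
    # Concatenate the heads along the feature axis, then mix them with w_o.
    return np.concatenate(heads, axis=-1) @ w_o

rng = np.random.default_rng(0)
n, d_model, h, d_k = 5, 512, 8, 64  # paper sizes; weights below are random, not trained
x = rng.standard_normal((n, d_model))
w_q, w_k, w_v = (rng.standard_normal((h, d_model, d_k)) / np.sqrt(d_model) for _ in range(3))
w_o = rng.standard_normal((h * d_k, d_model)) / np.sqrt(h * d_k)
y = multi_head_self_attention(x, w_q, w_k, w_v, w_o)
print(y.shape)  # (5, 512)
```

The per-head loop is kept explicit for readability. Real implementations batch all heads into a single tensor operation.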
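The feedforward step with its ResNet-style residual connection can be sketched as below. The sketch assumes the paper's position-wise form: two linear layers with a ReLU between them (d_ff = 2048), then layer normalization of the sum. The learned gain and bias of layer normalization are left out. The names and the random weights are illustrative only.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector to zero mean and unit variance
    # (learned gain and bias omitted in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def feed_forward(x, w1, b1, w2, b2):
    # Two linear layers with a ReLU in between, applied to each position independently.
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def ffn_sublayer(x, w1, b1, w2, b2):
    # Residual ("resnet style") connection around the feedforward net, then layer norm.
    return layer_norm(x + feed_forward(x, w1, b1, w2, b2))

rng = np.random.default_rng(0)
n, d_model, d_ff = 5, 512, 2048
x = rng.standard_normal((n, d_model))
w1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
b2 = np.zeros(d_model)
y = ffn_sublayer(x, w1, b1, w2, b2)
print(y.shape)  # (5, 512)
```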
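Two pieces of the decoding side are sketched below. The first is the causal mask, which keeps each position from attending to later words. The second is the cross-entropy of the final-layer logits against the correct word indices, which is equivalent to scoring a softmax against one-hot targets. This is a minimal NumPy sketch with hypothetical names and random inputs. It omits the embeddings, the encoder-decoder attention, and the final linear layer.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention_weights(q, k):
    """Attention weights in which position i may only attend to positions 0..i."""
    n, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)
    # Mask out "future" words (strictly above the diagonal) so decoding stays causal.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1)

def cross_entropy(logits, targets):
    """logits: (n, vocab) from the final layer; targets: (n,) correct word indices."""
    # Numerically stable log-softmax over the vocabulary.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Against a one-hot target this is -log p(correct word), averaged over positions.
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
n, d_k, vocab = 4, 64, 10
q = rng.standard_normal((n, d_k))
k = rng.standard_normal((n, d_k))
w = causal_self_attention_weights(q, k)
print(np.round(w[0], 3))  # the first word can only attend to itself: [1. 0. 0. 0.]

# With uniform logits every word has probability 1/vocab, so the loss is log(vocab).
loss = cross_entropy(np.zeros((n, vocab)), np.array([1, 2, 3, 4]))
print(round(float(loss), 4))  # 2.3026
```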

ref: bookmark-0 tags: language learning year french brain hack date: 09-03-2007 04:13 gmt revision:2

http://mirror.mricon.com/french/french.html -- "How I learned French in a year"

  • verbiste: French verb conjugator for Linux (GNOME)
  • When talking about software, it was Fred Brooks in The Mythical Man-Month who said that people will always reinvent the wheel because it is intrinsically easier and more fun to write your own code than it is to read someone else's code.