
{1578}  
God Help us, let's try to understand AI monosemanticity

Commentary: To some degree, superposition seems like a geometric "hack" invented in the process of optimization to squeeze a great many (largely mutually-exclusive) sparse features into a limited number of neurons. GPT-3 has a latent dimension of only 96 * 128 = 12288, and with 96 layers this is only ~1.18 M neurons (*). A fruit fly has 100k neurons (and can't speak). All communication must be through that 12288-dimensional vector, which is passed through LayerNorm many times (**), so naturally the network learns to take advantage of locally linear subspaces.

That said, the primate visual system does seem to use superposition, though not via local subspaces; instead, neurons seem to encode multiple axes somewhat linearly (e.g. global spaces: linearly combined position and class). That was a few years ago, and I suspect that newer results may contest this. The face area seems to do a good job of disentanglement, for example.

Treating everything as high-dimensional vectors is great for analogy making, like the wife - husband + king = queen example. But having fixed-size vectors for representing arbitrary-dimensioned relationships inevitably leads to compression ~= superposition. Provided those subspaces are semantically meaningful, it all works out from a generalization standpoint, but this is then equivalent to allocating an additional axis for said relationship or attribute. Additional axes would also put less decoding burden on the downstream layers, and make optimization easier. Google has demonstrated allocation in transformers. It's also prevalent in the cortex. Trick is getting it to work!

(*) GPT-4 is unlikely to have more than an order of magnitude more 'neurons'; PaLM-540B has only 2.17 M. Given that GPT-4 is something like 3-4x larger, it should have 6-8 M neurons, which is still 3 orders of magnitude fewer than the human neocortex (never mind the cerebellum ;)

(**) I'm of two minds on LayerNorm. PV interneurons might be seen to do something like this, but it's all local; you don't need everything to be vector rotations. (LayerNorm effectively removes a couple of degrees of freedom, the mean and the overall scale, so really it's more like a 12286-dimensional vector.)

Update: After reading https://transformer-circuits.pub/2023/monosemantic-features/index.html, I find the idea of local manifolds / local codes to be quite appealing: why not represent sparse yet conditional features using superposition? This also expands the possibility of pseudo-hierarchical representation, which is great.
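
To make the "squeeze many sparse features into few neurons" intuition concrete, here is a minimal numpy sketch, not from the post; the feature count, width, and sparsity level are made up for illustration. Random near-orthogonal directions let a layer pack far more sparse features than it has dimensions, with only small interference at readout.

# Toy superposition sketch: pack n_features sparse features into d_model dims.
# All numbers are illustrative, not GPT-3's actual configuration.
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model, sparsity = 2000, 128, 0.01   # many more features than dims

# Random unit "feature directions": nearly orthogonal in high dimensions.
W = rng.standard_normal((n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse feature vector: most features off, a few active.
x = (rng.random(n_features) < sparsity) * rng.random(n_features)

h = x @ W              # compress into the d_model-dimensional residual stream
x_hat = h @ W.T        # naive linear readout (decoder = encoder transpose)

active = x > 0
print("active features:", active.sum())
print("readout error on active features:",
      np.abs(x_hat[active] - x[active]).mean())
print("interference (mean |readout| on inactive features):",
      np.abs(x_hat[~active]).mean())

The interference term shrinks as the features get sparser or the layer gets wider, which is the trade-off the superposition picture rests on.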
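The degrees-of-freedom point in (**) is easy to check numerically. A minimal sketch of LayerNorm without the learned gain and bias: every output has zero mean and a fixed norm, so the vector lives on a sphere inside a hyperplane of the full space.

# LayerNorm (ignoring the learned gain/bias) constrains its output:
# zero mean and fixed norm, so the output lies on a sphere within a hyperplane.
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

d = 12288                        # GPT-3's residual-stream width
x = np.random.randn(d) * 3 + 7   # arbitrary activations
y = layer_norm(x)

print("mean:", y.mean())             # ~0       (one constraint)
print("norm:", np.linalg.norm(y))    # ~sqrt(d) (another constraint)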
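On the Update: the transformer-circuits post pulls features out of MLP activations with a sparse autoencoder (overcomplete dictionary, L1 penalty on the codes). A bare-bones sketch of that setup follows; the dimensions, penalty weight, and training loop are placeholders, not the paper's actual hyperparameters.

# Bare-bones sparse autoencoder in the spirit of the monosemantic-features post:
# overcomplete dictionary, ReLU codes, L1 sparsity penalty.
import torch
import torch.nn as nn

d_mlp, d_dict = 512, 4096        # ~8x overcomplete dictionary (illustrative)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_mlp, d_dict)
        self.dec = nn.Linear(d_dict, d_mlp, bias=False)

    def forward(self, x):
        f = torch.relu(self.enc(x))     # sparse feature activations
        return self.dec(f), f

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_coeff = 1e-3

acts = torch.randn(1024, d_mlp)         # stand-in for recorded MLP activations
for step in range(100):
    x_hat, f = model(acts)
    loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

The resulting dictionary rows are the candidate "features"; the local-manifold / local-code idea is about how those sparse, conditionally active features can share directions without interfering much.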