{1545} revision 2 modified: 08-03-2021 06:12 gmt |

Self-organization in a perceptual network - Ralph Linsker, 1988.
- One of the first (verbose, slightly diffuse) investigations of the ability of linear projection neurons (i.e. dot-product, no non-linearity) to express useful tuning functions.
- 'Useful' here means information-preserving, in the face of noise or dimensional bottlenecks (like PCA).
- Starts with Hebbian learning rules, and shows that these + white-noise sensory input + some local topology are enough to get simple- and complex-cell-like visual responses.
- Ralph notes that neurons in primate visual cortex are tuned *in utero* -- prior to any real-world visual experience! Wow. (Who did these studies?)
- This is a very minimalistic starting point; there aren't even structured stimuli (!)
- The single-neuron (and later, multi-neuron) models are purely feed-forward; the author cautions that this lack of feedback is not biologically realistic.
- Also note that this was back in the Motorola 680x0 days ... computers were not that powerful (but certainly could handle more than 1-2 neurons!)
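The 'local topology' ingredient is easy to see in a toy simulation. Below is a minimal numpy sketch (my construction, not Linsker's code; the layer size and receptive-field width are assumed values): white noise fed through a layer of Gaussian-localized linear receptive fields comes out spatially correlated, and that correlation structure is what Hebbian learning in the next layer can pick up on.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64        # units per layer, arranged on a 1-D ring for simplicity
sigma = 3.0   # receptive-field width (assumed value, not from the paper)

# Gaussian receptive fields on a ring (a circulant weight matrix)
idx = np.arange(n)
d = np.abs(idx[:, None] - idx[None, :])
d = np.minimum(d, n - d)                  # ring distance
W = np.exp(-d**2 / (2 * sigma**2))
W /= W.sum(axis=1, keepdims=True)

# White-noise input; output activity of the localized linear layer
X = rng.standard_normal((10000, n))
Y = X @ W.T

C = np.corrcoef(Y.T)
near, far = C[0, 1], C[0, n // 2]
print(near, far)   # nearby units strongly correlated, distant ones nearly independent
```

Even though each input sample is pure white noise, overlapping receptive fields make neighboring outputs redundant, which is exactly the structure the deeper Hebbian layers in the paper exploit.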
- Linear algebra shows that Hebbian synapses cause a linear layer to learn the covariance matrix of its inputs, $Q$ , with no dependence on the actual layer activity.
- When looked at in terms of an energy function, this is equivalent to gradient descent to maximize the layer-output variance.
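This is easy to verify numerically. A minimal sketch (my construction; since plain Hebbian growth is unbounded, I renormalize the weight vector each step, which leaves the direction dynamics intact): the averaged update is $\langle \delta w \rangle \propto Q w$, so $w$ rotates toward the leading eigenvector of $Q$, i.e. the maximum-variance direction.

```python
import numpy as np

rng = np.random.default_rng(1)

# 2-D inputs with anisotropic covariance Q: most variance along [1, 1]/sqrt(2)
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)   # rotation matrix
A = np.diag([2.0, 0.5])                                 # per-axis std devs
X = rng.standard_normal((20000, 2)) @ A @ R.T           # cov(X) = R A^2 R^T

w = rng.standard_normal(2)
eta = 0.01
for x in X:
    y = w @ x                   # linear (dot-product) output
    w += eta * y * x            # plain Hebbian step: on average, dw ∝ Q w
    w /= np.linalg.norm(w)      # renormalize; unconstrained Hebb diverges

top = np.array([1.0, 1.0]) / np.sqrt(2)   # leading eigenvector of Q
print(abs(w @ top))   # ≈ 1: w aligns with the maximum-variance direction
```

Note that the update never consults any error signal or downstream activity; only the input statistics ($Q$) drive where $w$ ends up.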
- He also hits on:
- Hopfield networks,
- PCA,
- Oja's constrained Hebbian rule $\delta w_i \propto \langle L_2 (L_{1i} - L_2 w_i) \rangle$ (that is, a quadratic constraint on the weights that keeps $\sum_i w_i^2 \approx 1$ )
- Optimal linear reconstruction in the presence of noise
- Mutual information between layer input and output (I found this to be a bit hand-wavey)
- Yet he notes critically: "but it is not true that maximum information rate and maximum activity variance coincide when the probability distribution of signals is arbitrary".
- Indeed. The world is characterized by very non-Gaussian structured sensory stimuli.
- Redundancy and diversity in 2-neuron coding model.
- Role of infomax in maximizing the determinant of the weight matrix, sorta.
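Oja's rule from the list above is worth seeing run. A minimal sketch (my construction, using Linsker's $L_1$/$L_2$ naming for input and output): the quadratic decay term $-L_2^2 w_i$ keeps $\sum_i w_i^2$ near 1 without any explicit normalization step, and the weight vector converges to the leading principal component of the input, i.e. single-unit PCA.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated inputs L1; by construction the top principal direction is axis 0
L1 = rng.standard_normal((30000, 3)) * np.array([3.0, 1.0, 0.5])

w = rng.standard_normal(3) * 0.1
eta = 0.005
for x in L1:
    y = w @ x                    # L2: the linear output
    w += eta * y * (x - y * w)   # Oja's rule: Hebb term minus quadratic decay

print(np.linalg.norm(w))   # settles near 1: the built-in norm constraint
print(abs(w[0]))           # near 1: w is the leading eigenvector (PCA)
```

Compare with the plain Hebbian case: the rule is the same up to the $-\eta\, y^2 w$ term, which is precisely what turns runaway growth into a self-normalizing fixed point.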
One may critically challenge the infomax idea: we very much need to (and do) throw away spurious or irrelevant information in our sensory streams; what upper layers 'care about' when making decisions is certainly relevant to the lower layers. This credit-assignment problem is neatly solved by backprop, and there are a number of 'biologically plausible' means of performing it, but both backprop and infomax are maybe avoiding the real question: what might the upper layers actually care about? Likely 'care about' is an emergent property of the interacting local learning rules and network structure.

Can you search directly in these domains, within biological limits and motivated by statistical reality, to find unsupervised-learning networks? You'll still need a way to rank the networks, hence an objective 'care about' function. Sigh. Either way, I don't per se put a lot of weight on the infomax principle. It could be useful, but is only part of the story.

Otherwise Linsker's discussion is accessible, lucid, and prescient. Lol. |