m8ta

{842}
ref: work-0 tags: distilling free-form natural laws from experimental data Schmidt Cornell automatic programming genetic algorithms date: 12-30-2021 05:11 gmt revision:7 [6] [5] [4] [3] [2] [1] [head]

Distilling free-form natural laws from experimental data

  • The critical step was to use the full set of all pairs of partial derivatives ( $\delta x / \delta y$ ) to evaluate the search for invariants (see the code sketch after this list).
  • The selection of which partial derivatives are held to be independent / which variables are dependent is a bit of a trick too -- see the supplemental information.
    • Even so, with a 4D data set, the search for natural laws took ~30 hours.
  • This was via a genetic algorithm, distributed among 'islands' on different CPUs, with mutation and single-point crossover.
  • Not sure what the IL is, but it appears to be floating-point assembly.
  • Timeseries data is smoothed with Loess smoothing, which fits a polynomial to the data, and hence allows for smoother / more analytic derivative calculation.
    • Then again, how long did it take humans to figure out these invariants? (Went about it in a decidedly different way..)
    • Further, how long did it take for biology to discover similar 'design equations'?
      • The same algorithm has been applied to biological data - a metabolic pathway - with some success (published 2011).
      • Of course evolution had to explore a much larger space - proteins and regulatory pathways, not simpler mathematical expressions / linkages.
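
To make the invariance criterion concrete, here is a minimal Python sketch (my own, with hypothetical function names -- not the authors' code): a candidate invariant f(x, y) = const implies a derivative ratio dx/dy by implicit differentiation, which can be scored against the ratio of numerically estimated time-derivatives from the data.

```python
import numpy as np

def implied_dxdy(f, x, y, eps=1e-6):
    """dx/dy implied by a candidate invariant f(x, y) = const,
    via implicit differentiation: dx/dy = -(df/dy) / (df/dx)."""
    dfdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dfdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return -dfdy / (dfdx + 1e-12)

def invariant_fitness(f, x, y, dxdt, dydt):
    """Score a candidate invariant by how closely its implied dx/dy matches
    the ratio of measured time-derivatives (0 is best, more negative is worse)."""
    measured = dxdt / (dydt + 1e-12)
    return -np.mean(np.log(1.0 + np.abs(measured - implied_dxdy(f, x, y))))

# Toy check on a harmonic oscillator, where E = x^2 + v^2 is conserved:
t = np.linspace(0, 10, 2000)
x, v = np.sin(t), np.cos(t)
dxdt, dvdt = np.gradient(x, t), np.gradient(v, t)
print(invariant_fitness(lambda x, v: x**2 + v**2, x, v, dxdt, dvdt))  # near 0
print(invariant_fitness(lambda x, v: x + v, x, v, dxdt, dvdt))        # much worse
```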


Since his PhD, Michael Schmidt has gone on to found Nutonian, which produced the Eureqa software, apparently without dramatic new features other than being able to use the cloud for equation search. (Probably he improved many other detailed facets of the software..). Nutonian received $4M in seed funding, according to Crunchbase.

In 2017, Nutonian was acquired by DataRobot (for an undisclosed amount), where Michael has worked since, rising to the title of CTO.

Always interesting to follow up on the authors of these classic papers!

{1556}
ref: -0 tags: concept net NLP transformers graph representation knowledge date: 11-04-2021 17:48 gmt revision:0 [head]

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

  • From a team at the University of Washington / Allen Institute for Artificial Intelligence.
  • Courtesy of Yannic Kilcher's youtube channel.
  • General idea: use GPT-3 as a completion source given a set of prompts, like:
    • X starts running
      • So, X gets in shape
    • X and Y engage in an argument
      • So, X wants to avoid Y.
  • There are only 7 linkage atoms (edges, so to speak) in these queries, but of course many actions / direct objects.
    • These prompts are generated from the Atomic 20-20 human-authored dataset.
    • The prompts are fed into 175B parameter DaVinci model, resulting in 165k examples in the 7 linkages after cleaning.
    • In turn the 165k are fed into a smaller version of GPT-3, Curie, that generates 6.5M text examples, aka Atomic 10x.
  • Then filter the results via a second critic model, based on fine-tuned RoBERTa & human supervision to determine if a generated sentence is 'good' or not.
  • By throwing away 62% of Atomic 10x, they get a student accuracy of 96.4%, much better than the human-designed knowledge graph.
    • They suggest that one way this works is by removing degenerate outputs from GPT-3. (A sketch of the generate-then-filter loop follows this list.)
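
A minimal sketch of that generate-then-filter loop; the `generate` and `critic_score` callables and the 0.5 threshold are stand-ins, not the paper's actual API or models.

```python
from typing import Callable, Iterable

def distill_triples(events: Iterable[str],
                    relations: Iterable[str],
                    generate: Callable[[str], list],                 # teacher LM completions (e.g. GPT-3)
                    critic_score: Callable[[str, str, str], float],  # fine-tuned plausibility critic
                    threshold: float = 0.5):
    """Generate candidate (event, relation, inference) triples with a large teacher
    model, then keep only those the critic judges plausible enough."""
    kept = []
    for event in events:
        for relation in relations:
            prompt = f"{event}. {relation}:"          # e.g. "X starts running. So,"
            for inference in generate(prompt):
                if critic_score(event, relation, inference) >= threshold:
                    kept.append((event, relation, inference))
    return kept

# Usage with stand-in callables:
fake_generate = lambda prompt: ["X gets in shape", "X buys a hat"]
fake_critic = lambda e, r, i: 0.9 if "shape" in i else 0.1
print(distill_triples(["X starts running"], ["So"], fake_generate, fake_critic))
```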

Human-designed knowledge graphs are described here: ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

And employed for profit here: https://www.luminoso.com/

{1544}
ref: -2019 tags: HSIC information bottleneck deep learning backprop gaussian kernel date: 10-06-2021 17:23 gmt revision:5 [4] [3] [2] [1] [0] [head]

The HSIC Bottleneck: Deep learning without Back-propagation

In this work, the authors use a kernelized estimate of statistical independence as part of an 'information bottleneck' to set per-layer objective functions for learning useful features in a deep network. They use the HSIC, or Hilbert-Schmidt independence criterion, as the independence measure.

The information bottleneck was proposed by Bialek (spikes..) et al in 1999, and aims to increase the mutual information between the layer representation and the labels while minimizing the mutual information between the representation and the input:

$$ \min_{P_{T_i | X}} I(X; T_i) - \beta I(T_i; Y) $$

Where $T_i$ is the hidden representation at layer i (the layer output), $X$ is the layer input, and $Y$ are the labels. By replacing $I()$ with the HSIC, and some derivation (?), they show that

$$ HSIC(D) = (m-1)^{-2} \, tr(K_X H K_Y H) $$

Where $D = \{(x_1,y_1), \dots, (x_m,y_m)\}$ are samples and labels, $K_{X_{ij}} = k(x_i, x_j)$ and $K_{Y_{ij}} = k(y_i, y_j)$ -- that is, the kernel function applied to all pairs of (vectorial) input variables. $H$ is the centering matrix. The kernel is simply a Gaussian kernel, $k(x,y) = \exp(-\frac{1}{2} ||x-y||^2 / \sigma^2)$. So, if all the x and y are on average independent, then the inner product will be mean zero, the kernel will be mean one, and after centering will lead to zero trace. If the inner product is large within the realm of the derivative of the kernel, then the HSIC will be large (and negative, I think). In practice they use three different widths for their kernel, and they also center the kernel matrices.
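
A small numpy sketch of the biased HSIC estimator as written above, with a single Gaussian kernel width (the paper uses three widths); variable names are mine.

```python
import numpy as np

def gaussian_kernel(Z, sigma=1.0):
    """Pairwise Gaussian kernel matrix K_ij = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-0.5 * d2 / sigma**2)

def hsic(X, Y, sigma=1.0):
    """Biased HSIC estimator: (m-1)^{-2} tr(K_X H K_Y H)."""
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m      # centering matrix
    Kx, Ky = gaussian_kernel(X, sigma), gaussian_kernel(Y, sigma)
    return np.trace(Kx @ H @ Ky @ H) / (m - 1) ** 2

# Independent X and Y give HSIC near zero; Y = f(X) gives a larger value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
print(hsic(X, rng.normal(size=(200, 5))))   # ~ 0
print(hsic(X, np.tanh(X)))                   # > 0
```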

But still, the feedback is an aggregate measure (the trace) of the product of two kernelized (a nonlinearity) outer-product spaces of similarities between inputs. It's not unimaginable that feedback networks could be doing something like this...

For example, a neural network could calculate & communicate aspects of joint statistics to reward / penalize weights within a layer of a network, and this is parallelizable / per layer / adaptable to an unsupervised learning regime. Indeed, that was done almost exactly by this paper: Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks albeit in a much less intelligible way.


Robust Learning with the Hilbert-Schmidt Independence Criterion

Is another, later, paper using the HSIC. Their interpretation: "This loss-function encourages learning models where the distribution of the residuals between the label and the model prediction is statistically independent of the distribution of the instances themselves." Hence, given the above nomenclature, $E_X( P_{T_i | X} I(X; T_i) ) = 0$ (I'm not totally sure about the weighting, but might be required given the definition of the HSIC.)

As I understand it, the HSIC loss is a kernellized loss between the input, output, and labels that encourages a degree of invariance to input ('covariate shift'). This is useful, but I'm unconvinced that making the layer output independent of the input is absolutely essential (??)

{1548}
ref: -2021 tags: gated multi layer perceptrons transformers ML Quoc_Le Google_Brain date: 08-05-2021 06:00 gmt revision:4 [3] [2] [1] [0] [head]

Pay attention to MLPs

  • Using bilinear / multiplicative gating + deep / wide networks, you can attain similar accuracies as Transformers on vision and masked language learning tasks! No attention needed, just an in-network multiplicative term.
  • And the math is quite straightforward. Per layer:
    • $Z = \sigma(X U)$, $\hat{Z} = s(Z)$, $Y = \hat{Z} V$
      • Where X is the layer input, $\sigma$ is the nonlinearity (GeLU), U is a weight matrix, $\hat{Z}$ is the spatially-gated Z, and V is another weight matrix.
    • $s(Z) = Z_1 \odot (W Z_2 + b)$
      • Where Z is divided into two parts along the channel dimension, $Z_1$ and $Z_2$; $\odot$ is element-wise multiplication, and W is a weight matrix. (A code sketch follows this list.)
  • You of course need a lot of compute; this paper has nice figures of model accuracy scaling vs. depth / number of parameters / size. I guess you can do this if you're Google.
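
A rough numpy sketch of one such block (shapes, scales, and initialization here are hypothetical; the paper also normalizes $Z_2$ before the gating and wraps the block with normalization and residuals, omitted for brevity):

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def gmlp_block(X, U, V, W, b):
    """Forward pass of the gated block: Z = gelu(X U); split Z into Z1, Z2 along
    channels; gate with (W Z2 + b) applied across the sequence axis; project with V."""
    Z = gelu(X @ U)                      # (seq, d_ffn)
    Z1, Z2 = np.split(Z, 2, axis=-1)     # channel split, each (seq, d_ffn/2)
    gate = W @ Z2 + b                    # W is (seq, seq): spatial (token-mixing) weights
    return (Z1 * gate) @ V               # (seq, d_model)

# Hypothetical sizes; W near zero and b near one make the gate start close to identity.
seq, d_model, d_ffn = 16, 8, 32
rng = np.random.default_rng(0)
X = rng.normal(size=(seq, d_model))
U = rng.normal(size=(d_model, d_ffn)) * 0.1
V = rng.normal(size=(d_ffn // 2, d_model)) * 0.1
W = rng.normal(size=(seq, seq)) * 0.01
b = np.ones((seq, 1))                    # per-token bias (hypothetical shape)
print(gmlp_block(X, U, V, W, b).shape)   # (16, 8)
```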

Pretty remarkable that an industrial lab freely publishes results like this. I guess the ROI is that they get the resultant improved ideas? Or, perhaps, Google is in such a dominant position in terms of data and compute that even if they give away ideas and code, provided some of the resultant innovation returns to them, they win. The return includes trained people as well as ideas. Good for us, I guess!

{1527}
ref: -0 tags: inductive logic programming deepmind formal propositions prolog date: 11-21-2020 04:07 gmt revision:0 [head]

Learning Explanatory Rules from Noisy Data

  • From a dense background of inductive logic programming (ILP): given a set of statements, and rules for transformation and substitution, generate clauses that satisfy a set of 'background knowledge'.
  • Programs like Metagol can do this using search and simplify logic built into Prolog.
    • Actually kinda surprising how very dense this program is -- only 330 lines!
  • This task can be transformed into a SAT problem via rules of logic, for which there are many fast solvers.
  • The trick here (instead) is that a neural network is used to turn 'on' or 'off' clauses that fit the background knowledge.
    • BK is typically very small, a few examples, consistent with the small size of the learned networks.
  • These weight matrices are represented as the outer product of composed or combined clauses, which makes the weight matrix very large!
  • They then do gradient descent, while passing the cross-entropy errors through nonlinearities (including clauses themselves? I think this is how recursion is handled.) to update the weights.
    • Hence, SGD is used as a means of heuristic search.
  • Compare this to Metagol, which is brittle to any noise in the input; unsurprisingly, due to SGD, this is much more robust.
  • Way too many words and symbols in this paper for what it seems to be doing. Just seems to be obfuscating the work (which is perfectly good). Again: Metagol is only 330 lines!

{305}
ref: Schmidt-1978.09 tags: Schmidt BMI original operant conditioning cortex HOT pyramidal information antidromic date: 03-12-2019 23:35 gmt revision:11 [10] [9] [8] [7] [6] [5] [head]

PMID-101388[0] Fine control of operantly conditioned firing patterns of cortical neurons.

  • Hand-arm area of M1, 11 or 12 chronic recording electrodes, 3 monkeys.
    • But, they only used one unit at a time in the conditioning task.
  • Observed conditioning in 77% of single units and 65% of combined units (multiunits?).
  • Trained to move a handle to a position indicated by 8 annular cursor lights.
    • Cursor was updated at 50 Hz -- this was just a series of lights! Talk about simple feedback...
    • Investigated different smoothing: too fast, FR does not stay in target; too slow, cursor acquires target too slowly.
      • My gamma function is very similar to their lowpass filter used for smoothing the firing rates.
    • 4 or 8 target random tracking task
    • Time-out of 8 seconds
    • Run of 40 trials
      • The conditioning reached a significant level of performance after 2.2 runs of 40 trials (in well-trained monkeys); typically, they did 18 runs/day (720 trials)
  • Recordings:
    • Scalar mapping of unit firing rate to cursor position.
    • Filtered 600-6kHz
    • Each accepted spike triggered a generator that produced a pulse of constant amplitude and width -> this was fed into a lowpass filter (1.5 to 2.5 & 3.5Hz cutoff), and a gain stage, then an ADC, then (presumably) the PDP.
      • can determine if these units were in the pyramidal tract by measuring antidromic delay.
    • recorded one neuron for 108 days!!
      • Neuronal activity is still being recorded from one monkey 24 months after chronic implantation of the microelectrodes.
    • Average period in which conditioning was attempted was 3.12 days.
  • Successful conditioning was always associated with specific repeatable limb movements
    • "However, what appears to be conditioned in these experiments is a movement, and the neuron under study is correlated with that movement." YES.
    • The monkeys clearly learned to make (increasingly refined) movement to modulate the firing activity of the recorded units.
    • The monkey learned to turn off certain units with specific limb positions; the monkey used exaggerated movements for these purposes.
      • e.g. finger and shoulder movements, isometric contraction in one case.
  • Trained some monkeys for > 15 months; animals got better at the task over time.
  • PDP-12 computer.
  • Information measure: 0 bits for missed targets, 2 for a 4-target task, 3 for an 8-target task; information rate = total number of bits / time to acquire targets. (A code sketch follows this list.)
    • 3.85 bits/sec peak with 4 targets, 500ms hold time
    • With this, monkeys were able to exert fine control of firing rate.
    • Damn! compare to Paninski! [1]
  • 4.29 bits/sec when the same task was performed with a manipulandum & wrist movement
  • they were able to condition 77% of individual neurons and 65% of combined units.
  • Implanted a pyramidal tract electrode in one monkey; both cells recorded at that time were pyramidal tract neurons, antidromic latencies of 1.2 - 1.3ms.
    • Failures had no relation to overt movements of the monkey.
  • Fetz and Baker [2,3,4,5] found that 65% of precentral neurons could be conditioned for increased or decreased firing rates.
    • and it only took 6.5 minutes, on average, for the units to change firing rates!
  • Summarized in [1].
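
A few lines of Python make the information-rate bullet above concrete; the trial counts and acquisition times below are invented for illustration.

```python
import numpy as np

def information_rate(n_targets, acquired, acquire_times_s):
    """log2(n_targets) bits per acquired target, 0 bits for misses,
    divided by the total time spent acquiring targets."""
    total_bits = np.log2(n_targets) * np.sum(acquired)
    return total_bits / np.sum(acquire_times_s)

# Hypothetical run of 40 trials on the 4-target task:
rng = np.random.default_rng(0)
acquired = rng.random(40) < 0.9             # ~90% of targets acquired
times = rng.uniform(0.4, 1.0, size=40)      # seconds per trial
print(information_rate(4, acquired, times))  # on the order of a few bits/sec
```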

____References____

[0] Schmidt EM, McIntosh JS, Durelli L, Bak MJ, Fine control of operantly conditioned firing patterns of cortical neurons.Exp Neurol 61:2, 349-69 (1978 Sep 1)
[1] Serruya MD, Hatsopoulos NG, Paninski L, Fellows MR, Donoghue JP, Instant neural control of a movement signal.Nature 416:6877, 141-2 (2002 Mar 14)
[2] Fetz EE, Operant conditioning of cortical unit activity.Science 163:870, 955-8 (1969 Feb 28)
[3] Fetz EE, Finocchio DV, Operant conditioning of specific patterns of neural and muscular activity.Science 174:7, 431-5 (1971 Oct 22)
[4] Fetz EE, Finocchio DV, Operant conditioning of isolated activity in specific muscles and precentral cells.Brain Res 40:1, 19-23 (1972 May 12)
[5] Fetz EE, Baker MA, Operantly conditioned patterns on precentral unit activity and correlated responses in adjacent cells and contralateral muscles.J Neurophysiol 36:2, 179-204 (1973 Mar)

{1440}
ref: -2017 tags: attention transformer language model youtube google tech talk date: 02-26-2019 20:28 gmt revision:3 [2] [1] [0] [head]

Attention is all you need

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • Attention is all you need neural network models
  • Good summary, along with: The Illustrated Transformer (please refer to this!)
  • Łukasz Kaiser mentions a few times how fragile the network is -- how easy it is to make something that doesn't train at all, or how many tricks by Google experts were needed to make things work properly. It might be bravado or bluffing, but this is arguably not the way that biology fails.
  • Encoding:
  • Input is words encoded as 512-length vectors.
  • Vectors are transformed into length 64 vectors: query, key and value via differentiable weight matrices.
  • Attention is computed as the dot-product of the query (current input word) with the keys (values of the other words).
    • This value is scaled and passed through a softmax function to result in one attentional signal scaling the value.
  • Multiple heads' outputs are concatenated together, and this output is passed through a final weight matrix to produce a final value for the next layer. (A code sketch of scaled dot-product attention follows this list.)
    • So, attention in this respect looks like a conditional gain field.
  • 'Final value' above is then passed through a single layer feedforward net, with resnet style jump.
  • Decoding:
  • Use the attentional key value from the encoder to determine the first word through the output encoding (?) Not clear.
  • Subsequent causal decodes depend on the already 'spoken' words, plus the key-values from the encoder.
  • Output is a one-hot softmax layer from a feedforward layer; the sum total is differentiable from input to output using cross-entropy loss or KL divergence.
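
A toy numpy sketch of single-head scaled dot-product attention as summarized above; the 512/64 sizes follow the bullets, the random weights are placeholders, and multi-head concatenation, masking, and the feedforward/residual parts are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head attention over a sequence X (seq, d_model): queries/keys/values
    are linear projections; weights = softmax(Q K^T / sqrt(d_k)); output = weights V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

# Hypothetical sizes: 10 words as 512-d vectors, projected to 64-d per head.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 512))
Wq, Wk, Wv = [rng.normal(size=(512, 64)) * 0.05 for _ in range(3)]
print(attention(X, Wq, Wk, Wv).shape)   # (10, 64)
```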

{1207}
ref: -0 tags: Shenoy eye position BMI performance monitoring date: 01-25-2013 00:41 gmt revision:1 [0] [head]

PMID-18303802 Cortical neural prosthesis performance improves when eye position is monitored.

  • This proposal stems from recent discoveries that the direction of gaze influences neural activity in several areas that are commonly targeted for electrode implantation in neural prosthetics.
  • Can estimate eye position directly from neural activity & subtract it when performing BMI predictions.

{1087}
ref: Timmermann-2003.01 tags: DBS double tremor oscillations DICS beamforming parkinsons date: 02-29-2012 00:39 gmt revision:4 [3] [2] [1] [0] [head]

PMID-12477707[0] The cerebral oscillatory network of parkinsonian resting tremor.

  • Patients had idiopathic unilateral tremor-dominant PD.
  • MEG + EMG -> coherence analysis (+ DICS beamforming for deep sources). (A coherence sketch follows this list.)
  • M1 correlated to EMG at tremor and double-tremor frequency following medication withdrawal overnight.
    • M1 leads by 15 - 25 ms, consistent with conduction delay.
  • Unlike other studies, they find that many cortical areas are also coherent / oscillating with M1, including:
    • Cingulate and supplementary motor area (CMA / SMA)
    • Lateral premotor cortex (PM).
    • SII
    • Posterior parietal cortex PPC
    • contralateral cerebellum - strongest at double frequency.
  • In contrast, the cerebellum, SMA/CMA and PM show little evidence for direct coupling with the peripheral EMG but seem to be connected with the periphery via other cerebral areas (e.g. M1)
  • Power spectral analysis of activity in all central areas indicated the strongest frequency coherence at double tremor frequency.
    • Especially cerebro-cerebro coupling.
  • These open-ended observation studies are useful!
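
For reference, a toy version of the coherence computation on synthetic signals (using scipy's spectral coherence estimator; real MEG/EMG preprocessing and DICS source localization are far more involved):

```python
import numpy as np
from scipy.signal import coherence

# Synthetic ~5 Hz tremor plus a 10 Hz (double-tremor) component shared by an
# EMG-like and an M1/MEG-like channel, with a small time lead on the M1 side.
fs = 1000.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
tremor = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)
emg = tremor + rng.normal(size=t.size)
m1 = np.roll(tremor, 20) + rng.normal(size=t.size)   # ~20 ms shift
f, Cxy = coherence(emg, m1, fs=fs, nperseg=4096)
print(f"peak coherence at {f[np.argmax(Cxy)]:.1f} Hz")  # near 5 or 10 Hz
```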

____References____

[0] Timmermann L, Gross J, Dirks M, Volkmann J, Freund HJ, Schnitzler A, The cerebral oscillatory network of parkinsonian resting tremor.Brain 126:Pt 1, 199-212 (2003 Jan)

{1132}
ref: -0 tags: mesh silk conformal coating date: 02-21-2012 20:03 gmt revision:0 [head]

PMID-20400953 Dissolvable films of silk fibroin for ultrathin conformal bio-integrated electronics.

  • Mounting such devices on tissue and then allowing the silk to dissolve and resorb initiates a spontaneous, conformal wrapping process driven by capillary forces at the biotic/abiotic interface.
  • Specialized mesh designs and ultrathin forms for the electronics ensure minimal stresses on the tissue and highly conformal coverage, even for complex curvilinear surfaces, as confirmed by experimental and theoretical studies.
    • Wow! cool!
  • polyimide electrode substrates 2.5 - 7.5 um thick. Electrodes were made of anisotropic conductive film.

{255}
ref: BarGad-2003.12 tags: information dimensionality reduction reinforcement learning basal_ganglia RDDR SNR globus pallidus date: 01-16-2012 19:18 gmt revision:3 [2] [1] [0] [head]

PMID-15013228[0] Information processing, dimensionality reduction, and reinforcement learning in the basal ganglia (2003)

  • long paper! looks like they used latex.
  • they focus on a 'new model' for the basal ganglia: reinforcement driven dimensionality reduction (RDDR)
  • in order to make sense of the system - according to them - any model must ignore huge amounts of information about the studied areas.
  • ventral striatum = nucleus accumbens!
  • striatum is broken into two, rough, parts: ventral and dorsal
    • dorsal striatum: comprises the caudate and putamen.
    • ventral striatum: the nucleus accumbens, medial and ventral portions of the caudate and putamen, and striatal cells of the olfactory tubercle (!) and anterior perforated substance.
  • ~90% of neurons in the striatum are medium spiny neurons
    • dendrites fill 0.5mm^3
    • cells have up and down states.
      • the states are controlled by intrinsic connections
      • project to GPe GPi & SNr (primarily), using GABA.
  • 1-2% of neurons in the striatum are tonically active neurons (TANs)
    • use acetylcholine (among others)
    • fewer spines
    • more sensitive to input
    • TANs encode information relevant to reinforcement or incentive behavior

____References____

[0] Bar-Gad I, Morris G, Bergman H, Information processing, dimensionality reduction and reinforcement learning in the basal ganglia.Prog Neurobiol 71:6, 439-73 (2003 Dec)

{806}
ref: work-0 tags: gaussian random variables mutual information SNR date: 01-16-2012 03:54 gmt revision:26 [25] [24] [23] [22] [21] [20] [head]

I've recently tried to determine the bit-rate conveyed by one gaussian random process about another in terms of the signal-to-noise ratio between the two. Assume $x$ is the known signal to be predicted, and $y$ is the prediction.

Let's define $SNR(y) = \frac{Var(x)}{Var(err)}$ where $err = x - y$. Note this is a ratio of powers; for the conventional SNR, $SNR_{dB} = 10 \log_{10} \frac{Var(x)}{Var(err)}$. $Var(err)$ is also known as the mean-squared-error (mse).

Now, $Var(err) = \sum{(x - y - \bar{err})^2} = Var(x) + Var(y) - 2 Cov(x,y)$; assume x and y have unit variance (or scale them so that they do), then

$$ \frac{2 - SNR(y)^{-1}}{2} = Cov(x,y) $$

We need the covariance because the mutual information between two jointly Gaussian zero-mean variables can be defined in terms of their covariance matrix: (see http://www.springerlink.com/content/v026617150753x6q/ ). Here Q is the covariance matrix,

$$ Q = \left[ \begin{array}{cc} Var(x) & Cov(x,y) \\ Cov(x,y) & Var(y) \end{array} \right] $$

$$ MI = \frac{1}{2} \log \frac{Var(x) Var(y)}{det(Q)} $$

$$ det(Q) = 1 - Cov(x,y)^2 $$

Then $MI = -\frac{1}{2} \log_2 \left[ 1 - Cov(x,y)^2 \right]$

or $MI = -\frac{1}{2} \log_2 \left[ SNR(y)^{-1} - \frac{1}{4} SNR(y)^{-2} \right]$

This agrees with intuition. If we have a SNR of 10db, or 10 (power ratio), then we would expect to be able to break a random variable into about 10 different categories or bins (recall stdev is the sqrt of the variance), with the probability of the variable being in the estimated bin to be 1/2. (This, at least in my mind, is where the 1/2 constant comes from - if there is gaussian noise, you won't be able to determine exactly which bin the random variable is in, hence log_2 is an overestimator.)
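
A quick numeric check of the last formula (my own snippet, not from the original post); it lands close to the table values below.

```python
import numpy as np

def mi_bits(snr_db):
    """MI (bits) between two unit-variance jointly-Gaussian signals at a given SNR (dB)."""
    snr = 10 ** (snr_db / 10.0)          # dB -> power ratio
    cov = (2 - 1.0 / snr) / 2.0          # Cov(x,y) from the derivation above
    return -0.5 * np.log2(1 - cov ** 2)

for db in (10, 20, 30, 40, 90):
    print(db, mi_bits(db))               # compare with the table below
```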

Here is a table with the respective values, including the amplitude (not power) ratio representations of SNR.

SNR (dB) | Amp. ratio | MI (bits)
10       | 3.1        | 1.6
20       | 10         | 3.3
30       | 31         | 5.0
40       | 100        | 6.6
90       | 31e3       | 15
Note that at 90dB, you get about 15 bits of resolution. This makes sense, as 16-bit DACs and ADCs have (typically) 96dB SNR. good.

Now, to get the bitrate, you take the SNR, calculate the mutual information, and multiply it by the bandwidth (not the sampling rate in a discrete time system) of the signals. In our particular application, I think the bandwidth is between 1 and 2 Hz, hence we're getting 1.6-3.2 bits/second/axis, hence 3.2-6.4 bits/second for our normal 2D tasks. If you read this blog regularly, you'll notice that others have achieved 4bits/sec with one neuron and 6.5 bits/sec with dozens {271}.

{5}
ref: bookmark-0 tags: machine_learning research_blog parallel_computing bayes active_learning information_theory reinforcement_learning date: 12-31-2011 19:30 gmt revision:3 [2] [1] [0] [head]

hunch.net interesting posts:

  • debugging your brain - how to discover what you don't understand. a very intelligent viewpoint, worth rereading + the comments. look at the data, stupid
    • quote: how to represent the problem is perhaps even more important in research since human brains are not as adept as computers at shifting and using representations. Significant initial thought on how to represent a research problem is helpful. And when it’s not going well, changing representations can make a problem radically simpler.
  • automated labeling - great way to use a human 'oracle' to bootstrap us into good performance, esp. if the predictor can output a certainty value and hence ask the oracle all the 'tricky questions'.
  • The design of an optimal research environment
    • Quote: Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
  • More recently: http://hunch.net/?p=2016
    • Problem is that online courses only imperfectly emulate the social environment of a college, which IMHO is useful for cultivating diligence.
  • The unrealized potential of the research lab Quote: Muthu Muthukrishnan says “it’s the incentives”. In particular, people who invent something within a research lab have little personal incentive in seeing its potential realized, so they fail to pursue it as vigorously as they might in a startup setting.
    • The motivation (money!) is just not there.

{968}
ref: Bassett-2009.07 tags: Weinberger congnitive efficiency beta band neuroimagaing EEG task performance optimization network size effort date: 12-28-2011 20:39 gmt revision:1 [0] [head]

PMID-19564605[0] Cognitive fitness of cost-efficient brain functional networks.

  • Idea: smaller, tighter networks are correlated with better task performance
    • working memory task in normal subjects and schizophrenics.
  • Larger networks operate with higher beta frequencies (more effort?) and show less efficient task performance.
  • Not sure about the noisy data, but v. interesting theory!

____References____

[0] Bassett DS, Bullmore ET, Meyer-Lindenberg A, Apud JA, Weinberger DR, Coppola R, Cognitive fitness of cost-efficient brain functional networks.Proc Natl Acad Sci U S A 106:28, 11747-52 (2009 Jul 14)

{922}
ref: Guenther-2009.12 tags: Guenther Kennedy 2009 neurotrophic electrode speech synthesize formant BMI date: 12-17-2011 02:12 gmt revision:2 [1] [0] [head]

PMID-20011034[0] A Wireless Brain-Machine Interface for Real-Time Speech Synthesis

  • Neurites grow into the glass electrode over the course of 3-4 months; the signals and neurons are henceforth stable, at least for the period prior to publication (>4 years).
  • Used an FM modulator to send out the broadband neural signal; powered the implanted electronics inductively.
  • Sorted 56 spike clusters (!!)
    • quote: "We chose to err on the side of overestimating the number of clusters in our BMI since our Kalman filter decoding technique is somewhat robust to noisy inputs, whereas a stricter criterion for cluster definition might leave out information-carrying spike clusters."
    • 27 units on one wire and 29 on the other.
  • Quote: "neurons in the implanted region of left ventral premotor cortex represent intended speech sounds in terms of formant frequency trajectories, and projections from these neurons to primary motor cortex transform the intended formant trajectories into motor commands to the speech articulators."
    • Thus speech can be represented as a trajectory through formant space.
    • plus there are many simple low-load formant-based sw synthesizers
  • Used supervised methods (ridge regression), where the user was asked to imagine making vowel sounds mimicking what he heard. (A code sketch follows this list.)
    • only used the first 2 vowel formants; hence 2D task.
    • Supervised from 8 ~1-minute recording sessions.
  • 25 real-time feedback sessions over 5 months -- not much training time, why?
  • Video looks alright.
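
A small sketch of the kind of supervised ridge-regression mapping described above, from unit firing rates to the first two formants; all shapes and data here are hypothetical, and the paper's real-time decoder was a Kalman filter rather than this.

```python
import numpy as np

def ridge_fit(R, F, lam=1.0):
    """Ridge regression mapping firing-rate features R (t, n_units) to formants F (t, 2)."""
    R1 = np.hstack([R, np.ones((R.shape[0], 1))])   # bias column
    return np.linalg.solve(R1.T @ R1 + lam * np.eye(R1.shape[1]), R1.T @ F)

def ridge_predict(W, R):
    return np.hstack([R, np.ones((R.shape[0], 1))]) @ W

# Hypothetical data: 56 sorted units, 2D (first two formants) targets.
rng = np.random.default_rng(0)
R = rng.poisson(5, size=(500, 56)).astype(float)
F = R @ rng.normal(size=(56, 2)) * 0.01 + rng.normal(size=(500, 2)) * 0.1
W = ridge_fit(R, F)
print(np.corrcoef(ridge_predict(W, R)[:, 0], F[:, 0])[0, 1])   # fit quality on F1
```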

____References____

[0] Guenther FH, Brumberg JS, Wright EJ, Nieto-Castanon A, Tourville JA, Panko M, Law R, Siebert SA, Bartels JL, Andreasen DS, Ehirim P, Mao H, Kennedy PR, A wireless brain-machine interface for real-time speech synthesis.PLoS One 4:12, e8218 (2009 Dec 9)

{252}
ref: Won-2004.02 tags: Debbie Won Wolf spike sorting mutual information tuning BMI date: 12-07-2011 02:58 gmt revision:3 [2] [1] [0] [head]

PMID-15022843[0] A simulation study of information transmission by multi-unit microelectrode recordings. Key idea:

  • when the units on a single channel are similarly tuned, you don't lose much information by grouping all spikes as coming from one source. And the opposite effect is true when you have very differently tuned neurons on the same channel - the information becomes more ambiguous.

____References____

[0] Won DS, Wolf PD, A simulation study of information transmission by multi-unit microelectrode recordings.Network 15:1, 29-44 (2004 Feb)

{289}
ref: Li-2001.05 tags: Bizzi motor learning force field MIT M1 plasticity memory direction tuning transform date: 09-24-2008 22:49 gmt revision:5 [4] [3] [2] [1] [0] [head]

PMID-11395017[0] Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field

  • this is concerned with memory cells, cells that 'remember' or remain permanently changed after learning the force-field.
  • In the above figure, the blue lines (or rather vertices of the blue lines) indicate the firing rate during the movement period (and 200ms before); angular position indicates the target of the movement. The force-field in this case was a curl field where force was proportional to velocity.
  • Preferred direction of the motor cortical units changed when the preferred direction of the EMGs changed
  • evidence of encoding of an internal model in the changes in tuning properties of the cells.
    • this can support both online performance and motor learning.
    • but what mechanisms allow the motor cortex to change in this way???
  • also see [1]

____References____

[0] Li CS, Padoa-Schioppa C, Bizzi E, Neuronal correlates of motor performance and motor learning in the primary motor cortex of monkeys adapting to an external force field.Neuron 30:2, 593-607 (2001 May)
[1] Caminiti R, Johnson PB, Urbano A, Making arm movements within different parts of space: dynamic aspects in the primate motor cortex.J Neurosci 10:7, 2039-58 (1990 Jul)

{565}
ref: Walker-2005.12 tags: algae transfection transformation protein synthesis bioreactor date: 03-21-2008 17:22 gmt revision:1 [0] [head]

Microalgae as bioreactors PMID-16136314

{530}
ref: notes-0 tags: neuroscience ion channels information coding John Harris date: 01-07-2008 16:46 gmt revision:4 [3] [2] [1] [0] [head]

  • crazy idea: that neurons have a number of ion channel lines which can be selectively activated. That is, information is transmitted by longitudinal transmission channels which are selectively activated based on the message that is transmitted
  • has any evidence for such a fine structure been found?? I think not, due to binding studies, but who knows..
  • dude uses historical references (Neumann) to back up his ideas. I find these sorts of justifications interesting, but not logically substantiative. Do not talk about the opinions of old philosophers (exclusively, at least), talk about their data.
  • interesting story about holography & the holograph of Dennis Gabor.
    • he does make interesting analogies to neuroscience & the importance of preserving spatial phase.
  • fourier images -- neato.
conclusion: interesting, but a bit kooky.

{520}
ref: bookmark-0 tags: DSP Benford's law Fourier transform book date: 12-07-2007 06:14 gmt revision:1 [0] [head]

http://www.dspguide.com/ch34.htm -- awesome!!

{344}
ref: Caminiti-1991.05 tags: transform motor control M1 3D population_vector premotor Caminiti date: 04-09-2007 20:10 gmt revision:2 [1] [0] [head]

PMID-2027042[0] Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets.

  • trained monkeys to make similar movements in different parts of external/extrinsic 3D space.
  • change of preferred direction was graded in an orderly manner across extrinsic space.
  • virtually no correlations found to endpoint static position: "virtually all cells were related to the direction and not to the end point of movement" - compare to Graziano!
  • yet the population vector remained an accurate predictor of movement: "Unlike the individual cell preferred directions upon which they are based, movement population vectors did not change their spatial orientation across the work space, suggesting that they remain good predictors of movement direction regardless of the region of space in which movements are made"
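
For reference, the population-vector computation mentioned above amounts to a few lines on synthetic cosine-tuned cells (my sketch, not the paper's analysis):

```python
import numpy as np

def population_vector(preferred_dirs, rates, baseline):
    """Classic population vector: sum of each cell's preferred direction (unit
    vectors, n x 3) weighted by its firing-rate modulation above baseline."""
    w = rates - baseline
    pv = (w[:, None] * preferred_dirs).sum(axis=0)
    return pv / np.linalg.norm(pv)

# Hypothetical cosine-tuned cells; the recovered vector should point along +x.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(100, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
movement = np.array([1.0, 0.0, 0.0])
rates = 10 + 5 * dirs @ movement + rng.normal(scale=0.5, size=100)
print(population_vector(dirs, rates, baseline=10))   # ~ [1, 0, 0]
```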

____References____

[0] Caminiti R, Johnson PB, Galli C, Ferraina S, Burnod Y, Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets.J Neurosci 11:5, 1182-97 (1991 May)

{294}
ref: Caminiti-1990.07 tags: transform motor control M1 3D population_vector premotor Caminiti date: 04-09-2007 20:07 gmt revision:4 [3] [2] [1] [0] [head]

PMID-2376768[0] Making arm movements within different parts of space: dynamic aspects in the primate motor cortex

  • monkeys made similar movements in different parts of external/extrinsic 3D space.
  • change of preferred direction was graded in an orderly manner across extrinsic space.
    • this change closely followed the changes in muscle activation required to effect the observed movements.
  • motor cortical cells can code direction of movement in a way which is dependent on the position of the arm in space
  • implies existence of mechanisms which facilitate the transformation between extrinsic (visual targets) and intrinsic coordinates
  • also see [1]

____References____

[0] Caminiti R, Johnson PB, Urbano A, Making arm movements within different parts of space: dynamic aspects in the primate motor cortex.J Neurosci 10:7, 2039-58 (1990 Jul)
[1] Caminiti R, Johnson PB, Galli C, Ferraina S, Burnod Y, Making arm movements within different parts of space: the premotor and motor cortical representation of a coordinate system for reaching to visual targets.J Neurosci 11:5, 1182-97 (1991 May)

{229}
ref: notes-0 tags: SNR MSE error multidimensional mutual information date: 03-08-2007 22:33 gmt revision:2 [1] [0] [head]

http://ieeexplore.ieee.org/iel5/516/3389/00116771.pdf or http://hardm.ath.cx:88/pdf/MultidimensionalSNR.pdf

  • the signal-to-noise ratio between two vectors is the ratio of the determinants of the correlation matrices. Just see equation 14.

{146}
ref: van-2004.11 tags: anterior cingulate cortex error performance monitoring 2004 date: 0-0-2007 0:0 revision:0 [head]

PMID-15518940 Errors without conflict: implications for performance monitoring theories of anterior cingulate cortex.

  • did an event-locked fMRI to study whether the ACC would differentiate between correct and incorrect feedback stimuli in a time estimation task.
  • ACC seems to be not involved in error detection, just conflict.
----
  • according to one theory, ERN is generated as part of a reinforcement learning process. (Holroyd and Coles 2002): behavior is monitored by an 'adaptive critic' in the basal ganglia.
    • in this theory, the ACC is used to select between mental processes competing to access the motor system.
    • ERN corresponds to a decrease in dopamine.
    • ERN occurs when the stimulus indicates that an error has occurred.
  • alternately, the ACC can monitor for the presence of conflict between simultaneously active but incompatible sensory/processing streams.
    • the ACC is active in correct trials in tasks that require conflict resolution. + it makes sense from a modeling strategy: a high-energy state is equivalent to a state of conflict: many neurons are active at the same time.
    • that is, it is a stimulus-conflict resolver: e.g. the Stroop task.
  • some studies localize (and the authors here indicate that the source-analysis that localizes dipole sources is inaccurate) the error potential to the posterior cingulate cortex.
    • fMRI solves this problem.
  • from their figures, it seems that the right putamen + bilateral caudate are involved in their time-estimation task (subjects had to press a button 1 second after a stimulus cue; feedback then guided/misguided them toward/away from 1000ms; subjects, of course, adjusted their behavior)
    • no sign of ACC activation was shown - as hard as they could look - despite identical (more or less) experimental design to the ERN studies.
      • hence, ERN is generated by areas other than the ACC.
  • in contrast, the Stroop task fully engaged the anterior cingulate cortex.
  • cool: perhaps, then, error feedback negativity is better conceived as an (absence of) superimposed "correct feedback positivity" 'cause no area was more active in error than correct feedback.
  • of course, one is measuring brain activation through blood flow, and the other is measuring EEG signals.

{7}
ref: bookmark-0 tags: book information_theory machine_learning bayes probability neural_networks mackay date: 0-0-2007 0:0 revision:0 [head]

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html -- free! (but I liked the book, so I bought it :)

{66}
ref: bookmark-0 tags: machine_learning classification entropy information date: 0-0-2006 0:0 revision:0 [head]

http://iridia.ulb.ac.be/~lazy/ -- Lazy Learning.

{57}
ref: bookmark-0 tags: information entropy bit rate matlab code date: 0-0-2006 0:0 revision:0 [head]

http://www.cs.rug.nl/~rudy/matlab/

  • concise, well documented, useful.
  • number of bins = length of vector ^ (1/3).
  • information = sum(log (bincounts / prior) * bincounts) -- this is just the divergence, same as I do it.
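
A Python transcription of that recipe, under the stated assumptions (bin count = n^(1/3), uniform prior over bins); the linked matlab code may differ in details.

```python
import numpy as np

def histogram_information(x, prior=None):
    """Divergence-style estimate described above: bin the data with
    n_bins = len(x)**(1/3), then sum p * log2(p / prior)."""
    n_bins = max(2, int(round(len(x) ** (1.0 / 3.0))))
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts / counts.sum()
    if prior is None:
        prior = np.full(n_bins, 1.0 / n_bins)   # uniform prior over bins
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / prior[mask]))

print(histogram_information(np.random.default_rng(0).normal(size=1000)))
```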