[0] Gandolfo F, Mussa-Ivaldi FA, Bizzi E, Motor learning by field approximation. Proc Natl Acad Sci U S A 93:9, 3843-6 (1996 Apr 30)
[1] Mussa-Ivaldi FA, Giszter SF, Vector field approximation: a computational paradigm for motor control and learning. Biol Cybern 67:6, 491-500 (1992)

ref: -2019 tags: HSIC information bottleneck deep learning backprop gaussian kernel date: 10-06-2021 17:23 gmt revision:5 [4] [3] [2] [1] [0] [head]

The HSIC Bottleneck: Deep learning without Back-propagation

In this work, the authors use a kernelized estimate of statistical independence as part of an 'information bottleneck' to set per-layer objective functions for learning useful features in a deep network. They use the HSIC, or Hilbert-Schmidt independence criterion, as the independence measure.

The information bottleneck was proposed by Tishby, Pereira, and Bialek (of 'Spikes') in 1999, and aims to increase the mutual information between the layer representation and the labels while minimizing the mutual information between the representation and the input:

$$\min_{P_{T_i | X}} I(X; T_i) - \beta I(T_i; Y)$$

Where $T_i$ is the hidden representation at layer $i$ (the layer output), $X$ is the layer input, and $Y$ are the labels. By replacing $I(\cdot)$ with the HSIC, and some derivation (?), they show that

$$HSIC(D) = (m-1)^{-2} \, tr(K_X H K_Y H)$$

Where $D = \{(x_1,y_1), ... (x_m, y_m)\}$ are samples and labels, $K_{X_{ij}} = k(x_i, x_j)$ and $K_{Y_{ij}} = k(y_i, y_j)$ -- that is, it's the kernel function applied to all pairs of (vector) input variables. $H$ is the centering matrix. The kernel is simply a Gaussian kernel, $k(x,y) = exp(-1/2 ||x-y||^2/\sigma^2)$. So, if the x and y are on average independent, then the inner-product will be mean zero, the kernel will be mean one, and centering will lead to a near-zero trace. If the inner product is large within the realm of the derivative of the kernel, then the HSIC will be large (and positive: the trace of the product of two centered, positive semi-definite kernel matrices cannot be negative). In practice they use three different widths for their kernel, and they also center the kernel matrices.
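
To make the estimator above concrete, here is a minimal pure-Python sketch of $HSIC(D)$ with a single Gaussian kernel width. The function names (`gaussian_kernel_matrix`, `center`, `hsic`) and the toy data are my own, not the paper's, and the paper combines several kernel widths rather than one.

```python
import math

def gaussian_kernel_matrix(points, sigma):
    """K[i][j] = exp(-1/2 * ||p_i - p_j||^2 / sigma^2)."""
    m = len(points)
    K = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-0.5 * d2 / sigma ** 2)
    return K

def center(K):
    """Double centering: returns H K H with H = I - (1/m) 1 1^T."""
    m = len(K)
    row = [sum(r) / m for r in K]
    col = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    tot = sum(row) / m
    return [[K[i][j] - row[i] - col[j] + tot for j in range(m)]
            for i in range(m)]

def hsic(xs, ys, sigma=1.0):
    """(m-1)^-2 tr(K_X H K_Y H), computed as tr((H K_X H)(H K_Y H))."""
    m = len(xs)
    Kx = center(gaussian_kernel_matrix(xs, sigma))
    Ky = center(gaussian_kernel_matrix(ys, sigma))
    trace = sum(Kx[i][j] * Ky[j][i] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2

# Toy data (an assumption for illustration): a 4x4 product grid, so the
# empirical joint distribution of (x, y) factorizes exactly.
grid_x = [(float(a),) for a in range(4) for b in range(4)]
grid_y = [(float(b),) for a in range(4) for b in range(4)]

hsic_independent = hsic(grid_x, grid_y)  # x and y vary independently
hsic_dependent = hsic(grid_x, grid_x)    # y is a copy of x
```

On the product grid the estimate vanishes up to rounding, while the fully dependent case gives a clearly positive value, which is the behavior a per-layer objective would exploit.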

But still, the feedback is an aggregate measure (the trace) of the product of two kernelized (a nonlinearity) outer-product spaces of similarities between inputs. It's not unimaginable that feedback networks could be doing something like this...

For example, a neural network could calculate & communicate aspects of joint statistics to reward / penalize weights within a layer of a network, and this is parallelizable / per layer / adaptable to an unsupervised learning regime. Indeed, that was done almost exactly by this paper: "Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks", albeit in a much less intelligible way.

Robust Learning with the Hilbert-Schmidt Independence Criterion

This is another, later, paper using the HSIC. Their interpretation: "This loss-function encourages learning models where the distribution of the residuals between the label and the model prediction is statistically independent of the distribution of the instances themselves." Hence, given the above nomenclature, $E_X( P_{T_i | X} I(X ; T_i) ) = 0$ (I'm not totally sure about the weighting, but it might be required given the definition of the HSIC.)

As I understand it, the HSIC loss is a kernelized loss between the input, output, and labels that encourages a degree of invariance to input ('covariate shift'). This is useful, but I'm unconvinced that making the layer output independent of the input is absolutely essential (??)

ref: -2020 tags: Principe modular deep learning kernel trick MNIST CIFAR date: 10-06-2021 16:54 gmt revision:2 [1] [0] [head]

Modularizing Deep Learning via Pairwise Learning With Kernels

  • Shiyu Duan, Shujian Yu, Jose Principe
  • The central idea here is to re-interpret deep networks, not with the nonlinearity as the output of a layer, but rather as the input of the layer, with the regression (weights) being performed on this nonlinear projection.
  • In this sense, each re-defined layer is implementing the 'kernel trick': tasks (like classification) which are difficult in linear spaces, become easier when projected into some sort of kernel space.
    • The kernel allows pairwise comparisons of datapoints. E.g., a radial basis kernel measures the radial / Gaussian distance between data points. An SVM is a kernel machine in this sense.
      • A natural extension (one that the authors have considered) is to take non-pointwise or non-one-to-one kernel functions -- those that e.g. multiply multiple layer outputs. This is of course part of standard kernel machines.
  • Because you are comparing projected datapoints, it's natural to take contrastive loss on each layer to tune the weights to maximize the distance / discrimination between different classes.
    • Hence this is semi-supervised contrastive classification, something that is quite popular these days.
    • The last layer is tuned with cross-entropy against labels, but only a few labels are required since the data is already well separated.
  • Demonstrated on small-ish datasets, concordant with their computational resources ...
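
As a rough illustration of a per-layer pairwise objective (not the paper's actual loss), the sketch below scores a layer's features by how closely a Gaussian kernel between every pair of points matches an ideal similarity of 1 for same-class pairs and 0 for different-class pairs. The names `rbf` and `pairwise_loss`, the squared-error form, and the toy features are all assumptions made for this example.

```python
import math

def rbf(u, v, sigma=1.0):
    """Gaussian (radial basis) similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * d2 / sigma ** 2)

def pairwise_loss(features, labels, sigma=1.0):
    """Mean squared gap between k(f_i, f_j) and the ideal similarity:
    1 for same-class pairs, 0 for different-class pairs."""
    total, n_pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            target = 1.0 if labels[i] == labels[j] else 0.0
            total += (rbf(features[i], features[j], sigma) - target) ** 2
            n_pairs += 1
    return total / n_pairs

labels = [0, 0, 1, 1]
# Hypothetical layer outputs: classes interleaved vs. cleanly separated.
mixed = [(0.0,), (1.0,), (0.1,), (1.1,)]
separated = [(0.0,), (0.1,), (5.0,), (5.1,)]

loss_mixed = pairwise_loss(mixed, labels)
loss_separated = pairwise_loss(separated, labels)
```

A layer trained against an objective of this kind needs only its own inputs and the pairwise label agreement, which is what makes the training modular.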

I think in general this is an important result, even if it's not wholly unique / somewhat anticipated (it's a year old at the time of writing). Modular training of neural networks is great for efficiency, parallelization, and biological implementations! Transport of weights between layers is hence non-essential.

Class labels still are essential, but I wonder if temporal continuity can solve some of these problems?

(There is plenty of other effort in this area -- see also {1544})

ref: -0 tags: kernel regression structure discovery fitting gaussian process date: 09-24-2018 22:09 gmt revision:1 [0] [head]

Structure discovery in Nonparametric Regression through Compositional Kernel Search

  • Use Gaussian process kernels (squared exponential, periodic, linear, and rational quadratic)
  • to model a kernel function, $k(x,x')$, which specifies how similar or correlated outputs $y$ and $y'$ are expected to be at two points $x$ and $x'$.
    • By defining the measure of similarity between inputs, the kernel determines the pattern of inductive generalization.
    • This is different than modeling the mapping $y = f(x)$.
    • It's something more like $y' = N(m(x') + k(x,x'))$ -- check the appendix.
    • See also: http://rsta.royalsocietypublishing.org/content/371/1984/20110550
  • Gaussian process models use a kernel to define the covariance between any two function values: $Cov(y,y') = k(x,x')$.
  • This kernel family is closed under addition and multiplication, and provides an interpretable structure.
  • Search for kernel structure greedily & compositionally,
    • then optimize parameters with conjugate gradients with restarts.
    • This seems straightforwardly intuitive...
  • Kernels are scored with the BIC.
  • C.f. {842} -- "Because we learn expressions describing the covariance structure rather than the functions themselves, we are able to capture structure which does not have a simple parametric form."
  • All their figure examples are 1-D time-series, which is kinda boring, but makes sense for creating figures.
    • Tested on multidimensional (d=4) synthetic data too.
    • Not sure how they back out modeling the covariance into actual predictions -- just draw (integrate) from the distribution?
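
The closure under addition and multiplication noted above can be sketched with kernels as plain functions. The parameterizations below are common textbook forms of the squared exponential, periodic, and linear kernels and may differ from the ones used in the paper. The helper names (`se`, `per`, `lin`, `add`, `mul`, `gram`) are made up for this example.

```python
import math

def se(lengthscale=1.0):
    """Squared exponential kernel on scalars."""
    return lambda x, xp: math.exp(-0.5 * (x - xp) ** 2 / lengthscale ** 2)

def per(period=1.0, lengthscale=1.0):
    """Periodic kernel: similarity repeats every `period`."""
    return lambda x, xp: math.exp(
        -2.0 * math.sin(math.pi * (x - xp) / period) ** 2 / lengthscale ** 2)

def lin(offset=0.0):
    """Linear kernel."""
    return lambda x, xp: (x - offset) * (xp - offset)

def add(k1, k2):
    """Sum of two kernels is again a kernel."""
    return lambda x, xp: k1(x, xp) + k2(x, xp)

def mul(k1, k2):
    """Product of two kernels is again a kernel."""
    return lambda x, xp: k1(x, xp) * k2(x, xp)

def gram(k, xs):
    """Covariance matrix Cov(y_i, y_j) = k(x_i, x_j)."""
    return [[k(a, b) for b in xs] for a in xs]

# One candidate structure a greedy search might propose:
# a locally periodic component plus a linear trend, SE * Per + Lin.
k = add(mul(se(2.0), per(1.0)), lin())
xs = [0.0, 0.25, 0.5, 1.0]
K = gram(k, xs)
```

A search procedure would build expressions like `k` one operator at a time, fit each candidate's parameters, and keep the best-scoring structure.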

ref: Gandolfo-1996.04 tags: learning approximation kernel field Bizzi Gandolfo date: 12-07-2011 03:40 gmt revision:1 [0] [head]

Motor learning by field approximation.

  • PMID-8632977[0]
    • studied the generalization properties of force compensation in humans.
    • learning to compensate only occurs in regions of space where the subject actually experienced the force.
    • they posit that the CNS builds an internal model of the external world in order to predict and compensate for it. what a friggn surprise! eh well.
  • PMID-1472573[1] Vector field approximation: a computational paradigm for motor control and learning
    • Recent experiments in the spinalized frog (Bizzi et al. 1991) have shown that focal microstimulation of a site in the premotor layers in the lumbar grey matter of the spinal cord results in a field of forces acting on the frog's ankle and converging to a single equilibrium position
    • they propose that the process of generating movements is the process of combining basis functions/fields. these fields may be optimized based on making it easy to achieve goals/move in reasonable ways.
  • alternatively, these basis functions could make movements invariant under a number of output transformations. yes...
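
A toy sketch of the 'combining basis fields' idea: each basis field here is assumed to be a linear spring pulling toward its own equilibrium point, which is far simpler than the measured frog force fields. Under that assumption, a weighted sum of convergent fields is itself convergent, with its equilibrium at the weighted mean of the individual equilibria. The names `convergent_field` and `combine` are hypothetical.

```python
def convergent_field(center, stiffness=1.0):
    """Spring-like 2-D force field converging on `center`."""
    cx, cy = center
    return lambda x, y: (-stiffness * (x - cx), -stiffness * (y - cy))

def combine(fields, weights):
    """Weighted vector sum of basis fields."""
    def total(x, y):
        fx = sum(w * f(x, y)[0] for f, w in zip(fields, weights))
        fy = sum(w * f(x, y)[1] for f, w in zip(fields, weights))
        return fx, fy
    return total

centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
weights = [1.0, 2.0, 1.0]
net = combine([convergent_field(c) for c in centers], weights)

# With equal stiffness, the net equilibrium is the weighted mean of centers.
w_sum = sum(weights)
equilibrium = (sum(w * c[0] for w, c in zip(weights, centers)) / w_sum,
               sum(w * c[1] for w, c in zip(weights, centers)) / w_sum)
force_at_equilibrium = net(*equilibrium)
```

Changing the weights moves the equilibrium around inside the triangle spanned by the three centers, so a handful of fixed fields can reach a continuum of targets.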


ref: notes-0 tags: blackfin LED kernel module linux BF537 STAMP tftp BF537 bridge date: 11-13-2007 17:59 gmt revision:4 [3] [2] [1] [0] [head]

so, you want to control the LEDs on a BF537-STAMP board? You'll need a linux box with a serial port, then you will need to do a few things:

  1. get the blackfin build tools:
    1. download the RPM file from blackfin.uclinux.org and use alien (if you are on debian, like me) to install it.
    2. installation instructions
  2. get uClinux distribution and compile it. http://blackfin.uclinux.org/gf/project/uclinux-dist/frs/
    1. unpack it to a local directory
    2. 'make menuconfig'
    3. select your vendor & device
    4. make sure runtime module loading is enabled.
    5. 'make' (it takes much less time than the full linux kernel)
    6. this will result in a linux.bin image, which uBoot can use.
  3. you need to set up a tftp server for uboot, see http://linuxgazette.net/125/pramode.html
  4. attach the blackfin stamp to the serial port on your computer. configure kermit with:
    set line /dev/ttyS1
    set speed 57600
    set carrier-watch off
    set prefixing all
    set parity none
    set stop-bits 1
    set modem none
    set file type bin
    set file name lit
    set flow-control none
    set prompt "Linux Kermit> " 
    (this is assuming that your serial port is /dev/ttyS1)
  5. power on the stamp, at the uBoot prompt press space.
  6. issue the following commands:
    set serverip
    set ipaddr
    tftpboot 0x1000000  linux
    bootelf 0x1000000 
    to get the device to boot your new uClinux image from SDRAM. your IP addresses will vary.
    1. note: you can boot any ELF image at this point. for example, for the 'blink' example in the blackfin tool trunk SVN, 'make' produces an ELF file, which can be loaded into SDRAM via tftp and executed. I'm not sure what part of L1 uboot uses for its instructions, but conceivably you could load into L1 / data ram and execute from there. see also {403}. you would do something like:
      set serverip
      set ipaddr
      tftpboot 0x1000000  blink
      bootelf 0x1000000 
  1. at the uClinux prompt: ifconfig eth0
  2. write a simple kernel module, for example:
    #include <linux/module.h>
    //#include <linux/config.h>
    #include <linux/init.h>
    #include <linux/fs.h>
    #include <asm/uaccess.h>
    #include <asm/blackfin.h>
    #include <asm/io.h>
    #include <asm/irq.h>
    #include <asm/dma.h>
    #include <asm/cacheflush.h>

    int major;
    char *name = "led";
    int count = 0;

    /* write '0...' to clear the pin (bit 6 of port F); anything else sets it. */
    ssize_t led_write(struct file* filp, const char __user *buf, size_t size, loff_t *offp)
    {
    	char c;
    	printk("LED write called\n");
    	if (size < 2) return -EMSGSIZE;
    	if (!buf) return -EFAULT;
    	/* fetch the first byte from user space rather than dereferencing buf */
    	if (get_user(c, buf)) return -EFAULT;
    	printk("led_write called with: %c\n", c);
    	if (c == '0') { bfin_write_PORTFIO_CLEAR(1 << 6); }
    	else { bfin_write_PORTFIO_SET(1 << 6); }
    	return size;
    }
    int led_open(struct inode *inode, struct file *file)
    {
    	printk("led opened\n");
    	return 0;
    }
    int led_release(struct inode *inode, struct file *file)
    {
    	printk("led released\n");
    	return 0;
    }
    struct file_operations fops = {
    	.owner = THIS_MODULE,
    	.read = NULL,
    	.write = led_write,
    	.open = led_open,
    	.release = led_release
    };
    int __init init_module(void)
    {
    	// Set PF6 as output -- clear the FER bit so the pin is plain GPIO.
    	bfin_write_PORTF_FER(bfin_read_PORTF_FER() & (~(1 << 6)));
    	bfin_write_PORTFIO_SET(1 << 6);
    	bfin_write_PORTFIO_DIR(bfin_read_PORTFIO_DIR() | (1 << 6));
    	// passing 0 asks the kernel to pick a free major number.
    	major = register_chrdev(0, name, &fops);
    	if (major < 0) {
    		printk("led: register_chrdev failed: %d\n", major);
    		return major;
    	}
    	printk("registered, major = %d\n", major);
    	printk("portF = %d\n", bfin_read_PORTFIO());
    	printk("portF_FER = %d\n", bfin_read_PORTF_FER());
    	printk("portF_DIR = %d\n", bfin_read_PORTFIO_DIR());
    	return 0;
    }
    void __exit cleanup_module(void)
    {
    	unregister_chrdev(major, name);
    	printk("led: cleanup\n");
    }
    MODULE_LICENSE("GPL");
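  3. write a makefile for this module (kbuild needs an obj-m line naming the object, assuming the source file is led.c), for example:
            obj-m := led.o
            all:
            	make -C /uClinux-dist/linux-2.6.x/ M=`pwd`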
  4. setup apache on your computer, e.g. 'apt-get install apache2'
  5. 'ln -s' your build directory to /var/www/, so that you can wget the resulting kernel module
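  6. on the blackfin, fetch and load the module, then create the device node (the major number in mknod must match the one the module printed when it registered; it was 253 here):
    rm led.ko
    wget (for example)
    insmod led.ko
    rm /dev/led
    mknod /dev/led c 253 0
    chmod 0644 /dev/led
    echo 1 >> /dev/led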