 m8ta
{1410}
ref: -0 tags: kernel regression structure discovery fitting gaussian process date: 09-24-2018 22:09 gmt revision:1  [head]
• Use Gaussian process kernels (squared exponential, periodic, linear, and rational quadratic)
• to model a kernel function, $k(x,x')$ , which specifies how similar or correlated outputs $y$ and $y'$ are expected to be at two points $x$ and $x'$ .
• By defining the measure of similarity between inputs, the kernel determines the pattern of inductive generalization.
• This is different from modeling the mapping $y = f(x)$ .
• It's something more like $f \sim GP(m(x), k(x,x'))$ : function values are jointly Gaussian with mean $m(x)$ and covariance $k(x,x')$ -- check the appendix.
• See also: http://rsta.royalsocietypublishing.org/content/371/1984/20110550
• Gaussian process models use a kernel to define the covariance between any two function values: $Cov(y,y') = k(x,x')$ .
• This kernel family is closed under addition and multiplication, and provides an interpretable structure.
• Search for kernel structure greedily & compositionally,
• then optimize parameters with conjugate gradients with restarts.
• This seems straightforwardly intuitive...
• Kernels are scored with the BIC.
• C.f. {842} -- "Because we learn expressions describing the covariance structure rather than the functions themselves, we are able to capture structure which does not have a simple parametric form."
• All their figure examples are 1-D time-series, which is kinda boring, but makes sense for creating figures.
• Tested on multidimensional (d=4) synthetic data too.
• Not sure how they back out modeling the covariance into actual predictions -- just draw (integrate) from the distribution?
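On the last question: the textbook way to turn a covariance model into predictions is to condition the joint Gaussian on the observed data, which gives a closed-form posterior mean and covariance at new inputs. The sketch below shows that step for a composite kernel built by adding and multiplying base kernels, plus a BIC score from the log marginal likelihood. It is a minimal illustration under stated assumptions, not the paper's code: the zero mean function, the fixed hyperparameters, the noise variance, the toy data, and all function names are hypothetical.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(a, b, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel on 1-D inputs."""
    d = np.abs(a[:, None] - b[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def linear_kernel(a, b, variance=1.0):
    """Linear kernel on 1-D inputs."""
    return variance * a[:, None] * b[None, :]

def composite_kernel(a, b):
    """Example composition (SE * Per + Lin); sums and products of kernels are kernels."""
    return (se_kernel(a, b, lengthscale=2.0) * periodic_kernel(a, b, period=1.0)
            + linear_kernel(a, b, variance=0.1))

def gp_fit_predict(kernel, x, y, x_new, noise_var=1e-2):
    """Zero-mean GP regression: posterior mean/covariance and log marginal likelihood."""
    K = kernel(x, x) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    K_s = kernel(x, x_new)
    mean = K_s.T @ alpha                                  # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = kernel(x_new, x_new) - v.T @ v                  # posterior covariance
    log_ml = (-0.5 * y @ alpha
              - np.sum(np.log(np.diag(L)))
              - 0.5 * len(x) * np.log(2.0 * np.pi))
    return mean, cov, log_ml

def bic(log_ml, n_params, n_points):
    """BIC with the sign convention where lower is better."""
    return -2.0 * log_ml + n_params * np.log(n_points)

# Toy data (assumed): a periodic signal plus a linear trend.
x_train = np.linspace(0.0, 5.0, 30)
y_train = np.sin(2.0 * np.pi * x_train) + 0.3 * x_train

mean, cov, log_ml = gp_fit_predict(composite_kernel, x_train, y_train, x_train)
# n_params=6 counts the hyperparameters of this particular composite kernel.
score = bic(log_ml, n_params=6, n_points=len(x_train))
print("BIC:", score)
```

In a structure search, each candidate kernel would get its own hyperparameter optimization before scoring; here the hyperparameters are simply fixed to keep the example short. Drawing from `N(mean, cov)` gives sample functions, while `mean` alone is the point prediction.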

{806}
ref: work-0 tags: gaussian random variables mutual information SNR date: 01-16-2012 03:54 gmt revision:26       [head]

I've recently tried to determine the bit-rate conveyed by one Gaussian random process about another in terms of the signal-to-noise ratio between the two. Assume $x$ is the known signal to be predicted, and $y$ is the prediction.

Let's define $SNR(y) = \frac{Var(x)}{Var(err)}$ where $err = x-y$ . Note this is a ratio of powers; for the conventional SNR, $SNR_{dB} = 10 log_{10} \frac{Var(x)}{Var(err)}$ . $Var(err)$ is also known as the mean-squared error (MSE), provided the error has zero mean.

Now, $Var(err) = \frac{1}{N} \sum (x - y - \bar{err})^2 = Var(x) + Var(y) - 2 Cov(x,y)$ ; assume $x$ and $y$ have unit variance (or scale them so that they do), so that $Var(err) = 2 - 2 Cov(x,y) = SNR(y)^{-1}$ , then

$\frac{2 - SNR(y)^{-1}}{2 } = Cov(x,y)$

We need the covariance because the mutual information between two jointly Gaussian zero-mean variables can be defined in terms of their covariance matrix: (see http://www.springerlink.com/content/v026617150753x6q/ ). Here Q is the covariance matrix,

$Q = \left[ \array{Var(x) & Cov(x,y) \\ Cov(x,y) & Var(y)} \right]$

$MI = \frac{1}{2} log_2 \frac{Var(x) Var(y)}{det(Q)}$

With unit variances, $det(Q) = 1 - Cov(x,y)^2$

Then $MI = - \frac{1 }{2 } log_2 \left[ 1 - Cov(x,y)^2 \right]$

or $MI = - \frac{1 }{2 } log_2 \left[ SNR(y)^{-1} - \frac{1 }{4 } SNR(y)^{-2} \right]$
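The chain of identities above can be checked numerically. The sketch below simulates two jointly Gaussian signals, rescales them to unit variance as the derivation assumes, and compares the covariance and MI computed directly against the values recovered from the SNR. The correlation of 0.9, the sample count, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.9  # assumed true correlation between signal and prediction

x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)

# Rescale both to zero mean and unit sample variance, as the derivation assumes.
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

err = x - y
snr = np.var(x) / np.var(err)       # power ratio, not dB
cov_xy = np.mean(x * y)             # sample covariance of the standardized signals

cov_from_snr = (2.0 - 1.0 / snr) / 2.0
mi_from_cov = -0.5 * np.log2(1.0 - cov_xy ** 2)
mi_from_snr = -0.5 * np.log2(1.0 / snr - 0.25 / snr ** 2)

print(f"Cov direct {cov_xy:.4f}, from SNR {cov_from_snr:.4f}")
print(f"MI  direct {mi_from_cov:.4f}, from SNR {mi_from_snr:.4f} bits")
```

Because the signals are standardized first, the two routes agree to floating-point precision; without that rescaling they would agree only approximately.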

This agrees with intuition. If we have an SNR of 10 dB, or 10 (power ratio), then we would expect to be able to break a random variable into about 10 different categories or bins (recall stdev is the sqrt of the variance), with the probability of the variable being in the estimated bin to be 1/2. (This, at least in my mind, is where the 1/2 constant comes from - if there is Gaussian noise, you won't be able to determine exactly which bin the random variable is in, hence log_2 is an overestimator.)

Here is a table with the respective values, including the amplitude (not power) ratio representations of SNR.

SNR (dB)   Amp. ratio   MI (bits)
10         3.1          1.6
20         10           3.3
30         31           5.0
40         100          6.6
90         31e3         15
Note that at 90 dB, you get about 15 bits of resolution. This makes sense, as 16-bit DACs and ADCs have (typically) 96 dB SNR. Good.

Now, to get the bitrate, you take the SNR, calculate the mutual information, and multiply it by the bandwidth (not the sampling rate in a discrete time system) of the signals. In our particular application, I think the bandwidth is between 1 and 2 Hz, hence we're getting 1.6-3.2 bits/second/axis, hence 3.2-6.4 bits/second for our normal 2D tasks. If you read this blog regularly, you'll notice that others have achieved 4 bits/sec with one neuron and 6.5 bits/sec with dozens {271}.
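The table and the bit-rate estimate can be reproduced with a few lines. This is a minimal sketch that follows the note's own convention of multiplying MI by bandwidth; the function names are hypothetical, and the 10 dB operating point is an assumption inferred from the 1.6 bits figure above.

```python
import math

def mi_bits(snr_db):
    """Mutual information in bits per sample for unit-variance x and y."""
    snr = 10.0 ** (snr_db / 10.0)  # dB -> power ratio
    return -0.5 * math.log2(1.0 / snr - 0.25 / snr ** 2)

def bit_rate(snr_db, bandwidth_hz, n_axes=1):
    """Bits/second, using the MI-times-bandwidth convention from the text."""
    return mi_bits(snr_db) * bandwidth_hz * n_axes

# One row per table entry: SNR in dB, amplitude ratio, MI in bits.
rows = [(db, 10.0 ** (db / 20.0), mi_bits(db)) for db in (10, 20, 30, 40, 90)]
for db, amp, mi in rows:
    print(f"{db:>3} dB  amp {amp:>9.1f}  MI {mi:5.2f} bits")

# Assumed operating point: 10 dB SNR, 1-2 Hz bandwidth, 2 axes.
low = bit_rate(10, 1.0, n_axes=2)
high = bit_rate(10, 2.0, n_axes=2)
print(f"2D bit rate: {low:.1f} to {high:.1f} bits/s")
```

The unrounded 10 dB value is about 1.68 bits, so the 2D range comes out slightly above the 3.2-6.4 bits/second quoted from the truncated 1.6.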

{762}
ref: work-0 tags: covariance matrix adaptation learning evolution continuous function normal gaussian statistics date: 06-30-2009 15:07 gmt revision:0 [head]
• Details a method of sampling + covariance matrix approximation to find the extrema of a continuous (but intractable) fitness function
• Has flavors of RLS / Kalman filtering. Indeed, I think that Kalman filtering may be a more principled method for optimization?
• Can be used in high-dimensional optimization problems like finding optimal weights for a neural network.
• Optimum-seeking is provided by weighting the stochastic samples (generated à la a particle filter or unscented Kalman filter) by their fitness.
• Introductory material is quite good, actually...
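The sample, weight by fitness, and re-estimate loop described above can be sketched in a few lines. This is a deliberately simplified illustration in the spirit of covariance matrix adaptation, not the full algorithm: it has no step-size control and no evolution paths. The function names, population size, learning rate, and test function are all assumptions.

```python
import numpy as np

def sphere(x):
    """Toy fitness to minimize (assumed test function)."""
    return float(np.sum(x ** 2))

def simple_cov_adaptation(fitness, x0, sigma=0.5, pop_size=20, n_iter=200,
                          learning_rate=0.3, seed=0):
    """Minimize `fitness` by sampling from an adapting Gaussian."""
    rng = np.random.default_rng(seed)
    dim = len(x0)
    mean = np.array(x0, dtype=float)
    cov = sigma ** 2 * np.eye(dim)

    # Rank-based weights: better samples get larger weights.
    n_elite = pop_size // 2
    weights = np.log(n_elite + 0.5) - np.log(np.arange(1, n_elite + 1))
    weights /= weights.sum()

    for _ in range(n_iter):
        samples = rng.multivariate_normal(mean, cov, size=pop_size)
        scores = np.array([fitness(s) for s in samples])
        elite = samples[np.argsort(scores)[:n_elite]]  # best first

        # Covariance of the selected steps, taken around the OLD mean so that
        # variance grows along directions of consistent progress.
        steps = elite - mean
        weighted_cov = (steps * weights[:, None]).T @ steps

        cov = (1.0 - learning_rate) * cov + learning_rate * weighted_cov
        cov = 0.5 * (cov + cov.T) + 1e-12 * np.eye(dim)  # keep symmetric, PD
        mean = weights @ elite                            # fitness-weighted mean

    return mean

best = simple_cov_adaptation(sphere, x0=[3.0, -2.0])
print("best point:", best, "fitness:", sphere(best))
```

Only the ranking of the samples is used, never the gradient or the raw fitness values, which is what makes this style of search usable on intractable fitness functions such as neural network weights.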