Structure Discovery in Nonparametric Regression through Compositional Kernel Search
 Use Gaussian process kernels (squared exponential, periodic, linear, and rational quadratic)
 to model a kernel function $k(x, x')$, which specifies how similar or correlated the outputs $y$ and $y'$ are expected to be at two points $x$ and $x'$.
 By defining the measure of similarity between inputs, the kernel determines the pattern of inductive generalization.
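 For concreteness, a minimal sketch of the four base kernels in one dimension (NumPy; the hyperparameter names ell, p, alpha, sf2 are mine, not the paper's):

```python
import numpy as np

def k_se(x, xp, ell=1.0, sf2=1.0):
    """Squared exponential: smooth, slowly varying functions."""
    return sf2 * np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)

def k_per(x, xp, p=1.0, ell=1.0, sf2=1.0):
    """Periodic: structure that repeats with period p."""
    return sf2 * np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp) / p) ** 2 / ell ** 2)

def k_lin(x, xp, c=0.0, sf2=1.0):
    """Linear: nonstationary; a GP with this kernel is Bayesian linear regression."""
    return sf2 * (x - c) * (xp - c)

def k_rq(x, xp, ell=1.0, alpha=1.0, sf2=1.0):
    """Rational quadratic: a scale mixture of SE kernels over lengthscales."""
    return sf2 * (1.0 + (x - xp) ** 2 / (2.0 * alpha * ell ** 2)) ** (-alpha)
```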
 This is different from modeling the mapping $y=f(x)$ directly.
 It's something more like $y' \sim \mathcal{N}\big(m(x'),\, k(x', x')\big)$; check the appendix.
 See also: http://rsta.royalsocietypublishing.org/content/371/1984/20110550
 Gaussian process models use a kernel to define the covariance between any two function values: $\mathrm{Cov}(y, y') = k(x, x')$.
 This kernel family is closed under addition and multiplication, so composite kernels are still valid kernels, and the resulting expressions have interpretable structure.
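 A tiny sketch of what closure buys (helper names add/mul are mine, base hyperparameters fixed at 1 for brevity):

```python
import numpy as np

def add(k1, k2):
    """Sum of two kernels is a kernel."""
    return lambda x, xp: k1(x, xp) + k2(x, xp)

def mul(k1, k2):
    """Product of two kernels is a kernel."""
    return lambda x, xp: k1(x, xp) * k2(x, xp)

k_se  = lambda x, xp: np.exp(-0.5 * (x - xp) ** 2)
k_per = lambda x, xp: np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp)) ** 2)
k_lin = lambda x, xp: x * xp

k_locally_periodic = mul(k_se, k_per)   # periodic pattern whose shape drifts
k_trend_plus_cycle = add(k_lin, k_per)  # linear trend plus a periodic component
```

 The interpretability point is that compositions read off as structure: SE*Per as "locally periodic", Lin+Per as "trend plus seasonality".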
 Search for kernel structure greedily & compositionally,
 then optimize kernel hyperparameters with conjugate gradients with random restarts.
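 My rough reading of the search loop as runnable pseudocode. Big simplifications (mine): hyperparameters stay fixed at defaults instead of being optimized per candidate, the paper's base-kernel replacement moves are omitted, and the BIC parameter count is approximated by the number of base kernels in the expression:

```python
import numpy as np
from itertools import product

def k_se(x, xp):  return np.exp(-0.5 * (x - xp) ** 2)
def k_per(x, xp): return np.exp(-2.0 * np.sin(np.pi * np.abs(x - xp)) ** 2)
def k_lin(x, xp): return x * xp
def k_rq(x, xp):  return (1.0 + 0.5 * (x - xp) ** 2) ** -1.0

BASE = {"SE": k_se, "Per": k_per, "Lin": k_lin, "RQ": k_rq}

def combine(k1, k2, op):
    """Closure: '+' and '*' of kernels are kernels."""
    if op == "+":
        return lambda x, xp: k1(x, xp) + k2(x, xp)
    return lambda x, xp: k1(x, xp) * k2(x, xp)

def bic(kern, n_params, X, y, noise=0.1):
    """BIC = -2 log p(y|X) + n_params * log(n); lower is better."""
    K = kern(X[:, None], X[None, :]) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_ml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
              - 0.5 * len(X) * np.log(2.0 * np.pi))
    return -2.0 * log_ml + n_params * np.log(len(X))

def greedy_search(X, y, max_depth=3):
    # start from the best-scoring single base kernel
    name, kern = min(BASE.items(), key=lambda kv: bic(kv[1], 1, X, y))
    size, best = 1, bic(kern, 1, X, y)
    for _ in range(max_depth):
        # expand the incumbent by every (operator, base kernel) pair
        cands = [(f"({name} {op} {b})", combine(kern, BASE[b], op))
                 for op, b in product("+*", BASE)]
        scored = [(bic(k, size + 1, X, y), nm, k) for nm, k in cands]
        cand_bic, cand_name, cand_kern = min(scored, key=lambda t: t[0])
        if cand_bic >= best:
            break  # no one-step expansion improves the BIC
        name, kern, size, best = cand_name, cand_kern, size + 1, cand_bic
    return name, best
```

 With per-candidate hyperparameter optimization added, a run on trend-plus-seasonality data should be able to grow a composition like (Lin + Per).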
 This seems straightforwardly intuitive...
 Kernels are scored with the BIC.
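 Presumably the usual definition: with $\hat{L}$ the marginal likelihood at the optimized hyperparameters, $p$ the number of kernel parameters, and $n$ the number of data points, $\mathrm{BIC} = -2\log\hat{L} + p\log n$ (minimize; equivalently maximize $\log\hat{L} - \frac{p}{2}\log n$). The $p\log n$ term penalizes expression complexity, so the search can't win just by adding terms.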
 Cf. {842}: "Because we learn expressions describing the covariance structure rather than the functions themselves, we are able to capture structure which does not have a simple parametric form."
 All their figure examples are 1D time series, which is kinda boring, but makes sense for creating figures.
 Tested on multidimensional (d=4) synthetic data too.
 Not sure how they back out the modeled covariance into actual predictions; just draw from (integrate over) the distribution?
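 Probably answering my own question: standard GP regression gives the posterior predictive in closed form (Rasmussen & Williams, Algorithm 2.1), so point predictions need no sampling. Minimal sketch with an SE kernel standing in for whatever composite the search returns:

```python
import numpy as np

def k_se(a, b, ell=1.0):
    """SE kernel matrix between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_predict(X, y, Xstar, sigma=0.1):
    """Closed-form GP posterior predictive mean and covariance at Xstar."""
    K = k_se(X, X) + sigma ** 2 * np.eye(len(X))   # noisy train covariance
    Ks = k_se(X, Xstar)                            # train/test cross-covariance
    Kss = k_se(Xstar, Xstar)                       # test covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha                            # predictive mean
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                            # predictive covariance
    return mean, cov

# usage
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * np.random.randn(20)
mu, cov = gp_predict(X, y, np.linspace(0, 6, 50))
```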
