PMID26621426 Causal Inference and Explaining Away in a Spiking Network
 Rubén Moreno-Bote & Jan Drugowitsch
 Use linear non-negative mixing plus noise to generate a series of sensory stimuli.
 Pass these through a one-layer spiking or non-spiking neural network with adaptive global inhibition and adaptive reset voltage to solve this quadratic programming problem with non-negativity constraints.
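 A minimal sketch of the generative side, assuming Gaussian noise and a half-normal draw for the causes that are present. The sizes `M`, `N`, the noise level `sigma`, and the presence probability `p_present` are illustrative values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 20, 5        # observation dimension, number of causes (illustrative)
sigma = 0.1         # noise standard deviation (assumed)
p_present = 0.4     # probability that a cause is present (assumed)

# Mixing matrix U (M x N): each column is one non-negative mixing vector.
U = np.abs(rng.normal(size=(M, N)))

# Cause coefficients: zero when absent, otherwise a zero-mean Gaussian
# truncated to positive values (equivalent to a half-normal draw).
present = rng.random(N) < p_present
r_true = present * np.abs(rng.normal(size=N))

# Observation: linear mixture of the causes plus additive Gaussian noise.
mu = U @ r_true + sigma * rng.normal(size=M)

print("causes:", np.round(r_true, 3))
print("observation shape:", mu.shape)
```

 Inference then amounts to recovering `r_true` from `mu` given `U`, under the constraint that no coefficient is negative.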

 N causes, one observation: $\mu = \sum_{i=1}^{N} u_i r_i + \epsilon$,
 $r_i \geq 0$: causes can be present or absent, but not negative.
 Cause coefficients are drawn from a truncated (positive-only) Gaussian.
 Linear spiking network with symmetric weight matrix $J = -U^T U - \beta I$ (see figure above)
 That is ... J looks like a correlation matrix!
 $U$ is M x N; columns are the mixing vectors.
 U is known beforehand and not learned
 That said, as a quasi-correlation matrix, it might not be so hard to learn. See ref [44].
 Can solve this problem by minimizing the negative log-posterior function: $$ L(\mu, r) = \frac{1}{2}(\mu - Ur)^T(\mu - Ur) + \alpha 1^T r + \frac{\beta}{2} r^T r $$
 That is, we want to maximize the joint probability of the causes and observations under the probabilistic model $p(\mu, r) \propto \exp(-L(\mu, r)) \prod_{i=1}^{N} H(r_i)$, where the step function $H$ enforces $r_i \geq 0$.
 The first term quadratically penalizes the difference between prediction and measurement.
 The second term, weighted by $\alpha$, is an L1 regularizer (since $r \geq 0$, $1^T r$ is the L1 norm of $r$), and the third term, weighted by $\beta$, is an L2 regularizer.
 The negative log-posterior is then converted to an energy function (linear algebra, dropping the constant $\frac{1}{2}\mu^T\mu$): with $W = U^T U$ and $h = U^T \mu$, $E(r) = \frac{1}{2} r^T W r - r^T h + \alpha 1^T r + \frac{\beta}{2} r^T r$
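 This is where they get the weight matrix $J$ or $W$. $W = U^T U$ is always positive semidefinite (positive definite if the columns of $U$ are linearly independent), so its negation $-W$ is negative semidefinite (negative definite in the independent case).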
 The dynamics of individual neurons with global inhibition and variable reset voltage serve to minimize this energy, and hence solve the problem. (They gloss over this derivation in the main text.)
 Next, show that a spike-based network can similarly 'relax', or descend the objective gradient, to arrive at the quadratic programming solution.
 Network is N leaky integrate-and-fire neurons, with variable synaptic integration kernels.
 $\alpha$ then translates to global inhibition, and $\beta$ to a lowered reset voltage.
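 To make the optimization target concrete, here is a sketch that minimizes $E(r)$ subject to $r \geq 0$ with plain projected gradient descent. This is a generic solver standing in for the network, not the paper's neural dynamics; the data sizes, `alpha`, `beta`, and the helper names `energy` and `solve_nnqp` are assumptions for illustration.

```python
import numpy as np

def energy(r, W, h, alpha, beta):
    """E(r) = 0.5 r^T W r - r^T h + alpha 1^T r + 0.5 beta r^T r."""
    return 0.5 * r @ W @ r - r @ h + alpha * r.sum() + 0.5 * beta * r @ r

def solve_nnqp(U, mu, alpha=0.1, beta=0.1, n_steps=5000):
    """Projected gradient descent on E(r) with the constraint r >= 0."""
    W = U.T @ U
    h = U.T @ mu
    # Step size 1/L, where L is the largest eigenvalue of the Hessian W + beta*I.
    step = 1.0 / (np.linalg.eigvalsh(W).max() + beta)
    r = np.zeros(U.shape[1])
    for _ in range(n_steps):
        grad = W @ r - h + alpha + beta * r
        # Gradient step followed by projection onto the non-negative orthant.
        r = np.maximum(r - step * grad, 0.0)
    return r

# Synthetic problem (illustrative sizes): three of five causes are present.
rng = np.random.default_rng(1)
M, N = 20, 5
U = np.abs(rng.normal(size=(M, N)))
r_true = np.array([1.0, 0.0, 0.5, 0.0, 2.0])
mu = U @ r_true + 0.05 * rng.normal(size=M)

alpha, beta = 0.1, 0.1
r_hat = solve_nnqp(U, mu, alpha, beta)

W, h = U.T @ U, U.T @ mu
grad = W @ r_hat - h + alpha + beta * r_hat
print("estimate:", np.round(r_hat, 3))
print("energy:", energy(r_hat, W, h, alpha, beta))
```

 At the minimum the gradient is zero for active causes and non-negative for causes clamped at zero, which is the optimality condition the network dynamics are meant to reach.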

 Yes, it can solve the problem, and do so in a finite period of time in the presence of firing noise, but it is a little bit meh: the problem is not that hard, and there is no learning in the network.

PMID28650477 Video rate volumetric Ca2+ imaging across cortex using seeded iterative demixing (SID) microscopy
 Tobias Nöbauer, Oliver Skocek, Alejandro J Pernía-Andrade, Lukas Weilguny, Francisca Martínez Traub, Maxim I Molodtsov & Alipasha Vaziri
 Cell-scale imaging at video rates of hundreds of GCaMP6-labeled neurons with light-field imaging, followed by computationally efficient deconvolution and iterative demixing based on non-negative factorization in space and time.


 Utilized a hybrid light-field and 2p microscope, but didn't use the latter to inform the SID algorithm.
 Algorithm:
 Remove motion artifacts
 Time iteration:
 Compute the standard deviation over time for each pixel (subtract the mean over time, then measure the standard deviation).
 Deconvolve the standard deviation image using the Richardson-Lucy algorithm, with non-negativity, sparsity constraints, and a simulated PSF.
 Yields hotspots of activity, putative neurons.
 These neuron locations are convolved with the PSF, thereby estimating each neuron's ballistic image on the LFM.
 This is converted to a binary mask of pixels which contribute information to the activity of a given neuron, a 'footprint'.
 Form a matrix of these footprints, $S_0$, of size $p \times n$ (p pixels, n neurons).
 Also get the corresponding image data $Y$, of size $p \times t$ (t time points).
 Solve: $\min_T \| Y - ST \|_2$ subject to $T \geq 0$
 That is, find a nonnegative matrix of temporal components $T$ which predicts data $Y$ from masks $S$ .
 Space iteration:
 Start with the masks again, $S$ , find all sets $O^k$ of spatially overlapping components $s_i$ (e.g. where footprints overlap)
 Extract the temporal components $t_i$ belonging to $O^k$ from $T$ (from the temporal step above; these are rows of $T$, since $S$ is $p \times n$ and $Y$ is $p \times t$) to yield $T^k$. Each component holds the temporal data for one member of the spatial overlap set. (additively?)
 Also get the data matrix $Y^k$ that is image data in the overlapping regions in the same way.
 Solve: $\min_{S^k} \| Y^k - S^k T^k \|_2$
 Subject to $S^k \geq 0$
 That is, solve over the footprints $S^k$ to best predict the data from the corresponding temporal components $T^k$ .
 They also impose spatial constraints on this nonnegative least squares problem (not explained).
 This process repeats.
 allegedly 1000x better than existing deconvolution / blind source segmentation algorithms, such as those used in CaImAn
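 The alternating temporal and spatial updates can be sketched on toy data as below. This is a simplified stand-in, not the SID implementation: it uses Lee-Seung multiplicative updates instead of a non-negative least squares solver, updates all components jointly rather than per overlap set $O^k$, and uses the binary footprint as the only spatial constraint (multiplicative updates keep zero entries at zero). The 1-D "sensor", neuron positions, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, t = 60, 3, 200                  # pixels, neurons, time points (toy sizes)
centers = np.array([15, 25, 40])      # hypothetical neuron positions
x = np.arange(p)[:, None]

# Binary footprints (p x n); the first two neurons overlap.
mask = np.abs(x - centers[None, :]) <= 10
# Ground-truth footprints: Gaussian blobs confined to their masks.
S_true = np.exp(-0.5 * ((x - centers[None, :]) / 4.0) ** 2) * mask

# Sparse non-negative activity, then noisy non-negative image data Y (p x t).
T_true = rng.random((n, t)) * (rng.random((n, t)) < 0.1)
Y = np.clip(S_true @ T_true + 0.01 * rng.normal(size=(p, t)), 0.0, None)

eps = 1e-12                           # guards against division by zero
S = mask.astype(float)                # S_0: seeded with the binary footprints
T = np.ones((n, t))                   # positive initial temporal components

errs = []
for _ in range(50):
    # Temporal step: update T >= 0 with the footprints S held fixed.
    for _ in range(20):
        T *= (S.T @ Y) / (S.T @ S @ T + eps)
    # Spatial step: update S >= 0 with T held fixed; entries outside the
    # seeded footprint start at zero and therefore stay zero.
    for _ in range(20):
        S *= (Y @ T.T) / (S @ T @ T.T + eps)
    errs.append(np.linalg.norm(Y - S @ T))

print("residual after first / last outer iteration:", errs[0], errs[-1])
```

 Seeding `S` with the masks is what keeps the factorization from drifting to arbitrary components: each neuron can only explain pixels inside its own footprint.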
