ref: -0 tags: asymmetric locality sensitive hash maximum inner product search sparsity date: 03-30-2020 02:17 gmt revision:5

Improved asymmetric locality sensitive hashing for maximum inner product search

  • Like many other papers, this one is based on a long lineage of locality-sensitive hashing papers.
  • Key innovation, from [23] The power of asymmetry in binary hashing: asymmetric hashing -- the hash function for the query differs from the hash function used for storage. Roughly, this allows additional degrees of freedom, since the similarity function is (in the non-normalized case) non-symmetric.
    • For example, take query Q = [1 1] with keys A = [1 -1] and B = [3 3]. The nearest neighbor is A (distance 2), whereas the maximum inner product is B (inner product 6).
    • Alternately: the self-inner product for Q and A is 2, whereas for B it's 18 -- with inner products, self-similarity is not necessarily the highest.
  • The norm of the query has no effect on the arg max of the search, though; hence the paper assumes that the query has been normalized for MIPS.
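A quick pure-Python check of the Q / A / B example above (a sketch; function and variable names are mine):

```python
import math

# Q / A / B example from above: nearest neighbor and max inner product disagree.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

Q = [1, 1]
keys = {"A": [1, -1], "B": [3, 3]}

nn = min(keys, key=lambda k: l2(Q, keys[k]))     # nearest neighbor: "A" (distance 2)
mips = max(keys, key=lambda k: dot(Q, keys[k]))  # max inner product: "B" (Q.B = 6)
print(nn, mips)  # A B
```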
  • This paper instead converts MIPS into approximate cosine-similarity search (which is like normalized MIPS), which can be efficiently solved with signed random projections.
  • (Established): LSH-L2 distance:
    • Sample a random vector a, iid normal N(0,1)
    • Sample an offset b uniformly between 0 and r.
      • r is the window size / radius (a free parameter).
    • The hash is then h(x) = floor((a·x + b) / r).
      • I'm not sure about how the floor op is converted to bits of the actual hash -- ?
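A minimal sketch of this hash family (assuming the standard construction with b drawn uniformly from [0, r]); how the integer outputs get packed into hash bits is exactly the open question above:

```python
import math, random

def make_l2_hash(dim, r, rng):
    """One LSH-L2 hash: h(x) = floor((a.x + b) / r), a ~ N(0,1)^dim, b ~ U[0,r]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, r)
    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / r)
    return h

h = make_l2_hash(dim=3, r=2.0, rng=random.Random(0))
print(h([1.0, 0.5, -0.2]), h([1.02, 0.5, -0.2]))  # nearby points usually collide
```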
  • (Established): LSH-correlation, signed random projections h^{sign}:
    • Hash is the sign of the inner product of the input vector and a uniform random vector a.
    • This is a one-bit random projection [13][14].
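A sketch of the signed-random-projection hash (my own minimal version): each random direction contributes the sign bit of one inner product, so the code depends only on the direction of x, not its norm.

```python
import random

def srp_hash(x, dirs):
    # one bit per random direction: the sign of <a, x>
    return tuple(1 if sum(a * v for a, v in zip(d, x)) >= 0 else 0 for d in dirs)

rng = random.Random(1)
dirs = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(8)]        # 8-bit code
print(srp_hash([1.0, 1.0], dirs) == srp_hash([2.0, 2.0], dirs))  # True: scale-invariant
```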
  • (New) Asymmetric-LSH-L2:
    • P(x) = [x; ||x||_2^2; ||x||_2^4; ...; ||x||_2^{2^m}] -- this is the pre-processing hash transform of the 'keys'.
      • Requires that the norm of these keys satisfy ||x||_2 ≤ U < 1.
      • m ≥ 3
    • Q(x) = [x; 1/2; 1/2; ...; 1/2] -- the transform of the queries.
    • See the mathematical explanation in the paper, but roughly: the transformations P and Q, when norms are less than 1, provide a correction to the L2 distance ||Q(p) - P(x_i)||_2, making its rank correlate with the un-normalized inner product.
  • They then change the augmentation to:
    • P(x) = [x; 1/2 - ||x||_2^2; 1/2 - ||x||_2^4; ...; 1/2 - ||x||_2^{2^m}]
    • Q(x) = [x; 0; ...; 0]
    • This allows signed-random-projection nearest-neighbor search to be used for the MIPS problem (the hash is the sign of the projection of P or Q, per above; presumably still one bit per projection?).
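A sketch of the revised transforms (pure Python, m = 3, assuming keys are pre-scaled so ||x||_2 ≤ U < 1): the appended query entries are zero, so Q(q)·P(x) = q·x exactly, while ||P(x)||_2 is nearly constant across keys -- hence cosine similarity (and thus signed random projections) rank-correlates with the inner product.

```python
import math

def P(x, m=3):                       # key transform: append 1/2 - ||x||^(2^i)
    p = sum(v * v for v in x)        # ||x||^2
    ext = []
    for _ in range(m):
        ext.append(0.5 - p)
        p = p * p                    # ||x||^2 -> ||x||^4 -> ||x||^8 ...
    return x + ext

def Q(q, m=3):                       # query transform: zero padding
    return q + [0.0] * m

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cos(u, v): return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

q = [1.0, 0.0]
x1, x2 = [0.8, 0.0], [0.1, 0.5]      # q.x1 = 0.8 > q.x2 = 0.1
print(dot(Q(q), P(x1)) == dot(q, x1))       # True: inner product preserved
print(cos(Q(q), P(x1)) > cos(Q(q), P(x2)))  # True: cosine ranking matches
```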
  • They then expand the U, m trade-off function ρ to allow for non-normalized queries. U depends on m and c (m is the number of appended augmentation terms, and c is the ratio between on-target and off-target hash hits).
  • Tested on the Movielens and Netflix datasets, using SVD preprocessing on the user-item matrix (a full matrix holding every user's rating of every movie -- mostly zeros!) to get at the latent vectors.
  • In the paper's plots, recall (hah) that precision is the number of true positives divided by the number of draws k, as k increases; recall is the number of true positives divided by the size of the gold-standard top-N list.
    • Clearly, the curve bends up and to the right when there are a lot of hash tables K.
    • Example datapoint: 50% precision at 40% recall, top 5. So on average you get 2 correct hits in 4 draws. Or: 40% precision, 20% recall, top 10: 2 hits in 5 draws. 20/40: 4 hits in 20 draws. (hit: correctly within the top-N)
    • So ... it's not that great.
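The arithmetic behind those example datapoints can be checked directly (helper name is mine): hits = recall × N, draws = hits / precision.

```python
def draws_for(precision, recall, top_n):
    """Return (hits, draws) implied by a (precision, recall) point at top-N."""
    hits = round(recall * top_n)          # true positives found
    return hits, round(hits / precision)  # retrievals needed to find them

print(draws_for(0.5, 0.4, 5))    # (2, 4): 2 correct hits in 4 draws
print(draws_for(0.4, 0.2, 10))   # (2, 5)
print(draws_for(0.2, 0.4, 10))   # (4, 20)
```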

Use case: Capsule: a camera based positioning system using learning
  • Uses 512 SIFT features as keys and queries to LSH. Hashing is computed via sparse addition / subtraction algorithm, with K bits per hash table (not quite random projections) and L hash tables. K = 22 and L = 24. ~ 1000 training images.
  • Best matching image is used as the location of the current image.

ref: -2016 tags: locality sensitive hash deep learning regularization date: 03-30-2020 02:07 gmt revision:5

Scalable and sustainable deep learning via randomized hashing

  • Central idea: replace dropout, adaptive dropout, or winner-take-all with a fast (sublinear time) hash based selection of active nodes based on approximate MIPS (maximum inner product search) using asymmetric locality-sensitive hashing.
    • This avoids a lot of the expensive inner-product multiply-accumulate work & energy associated with nodes that will either be completely off due to the ReLU or other nonlinearity -- or just not important for the algorithm + current input.
    • The result shows that you don't need very many neurons active in a given layer for successful training.
  • C.f. adaptive dropout, which chooses the nodes based on their activations: a few nodes are sampled from the network probabilistically, with probabilities dependent on the node activations for the current input.
    • Adaptive dropouts demonstrate better performance than vanilla dropout [44]
    • It is possible to adaptively drop significantly more nodes than with vanilla dropout while retaining superior performance.
  • WTA is an extreme form of adaptive dropout that uses mini-batch statistics to enforce a sparsity constraint. [28] {1507} Winner take all autoencoders
  • Our approach uses the insight that selecting a very sparse set of hidden nodes with the highest activations can be reformulated as dynamic approximate query processing, solvable with LSH.
    • LSH can be sub-linear time; normal processing involves the inner product.
    • LSH maps similar vectors into the same bucket with high probability. That is, it maps vectors into integers (bucket number)
  • Similar approach: Hashed nets [6], which aimed to decrease the number of parameters in a network by using a universal random hash function to tie weights. Compressing neural networks with the Hashing trick
    • "HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value."
  • Ref [38] shows how asymmetric hash functions allow LSH to be converted to a sub-linear time algorithm for maximum inner product search (MIPS).
  • Used multi-probe LSH: rather than having a large number of hash tables L, which increases hash time and memory use, they probe nearby buckets within each table -- the bucket B_j(Q) plus those for slightly perturbed queries Q. See ref [26].
  • See reference [2] for theory...
  • Following ref [42], use K randomized hash functions to generate the K data bits per vector. Each bit is the sign of the asymmetric random projection. Buckets contain a pointer to the node (neuron); only active buckets are kept around.
    • The K hash functions serve to increase the precision of the fingerprint -- found nodes are more expected to be active.
    • Have L hash tables for each hidden layer; these are used to increase the probability of finding useful / active nodes due to the randomness of the hash function.
    • Hash is asymmetric in the sense that the query and collection data are hashed independently.
  • In every layer during SGD, compute K x L hashes of the input, probe about 10 L buckets, and take their union. Experiments: K = 6 and L = 5.
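The selection loop might be sketched as below (heavily simplified and hypothetical: plain symmetric signed random projections stand in for the paper's asymmetric MIPS hash of ref [38], multi-probing is omitted, and only K = 6, L = 5 come from the text):

```python
import random
from collections import defaultdict

def srp_bits(v, dirs):
    # K-bit fingerprint: one sign bit per random direction
    return tuple(1 if sum(a * b for a, b in zip(d, v)) >= 0 else 0 for d in dirs)

class LSHLayer:
    """Index a layer's neurons by hashed weight vectors; query with the input."""
    def __init__(self, weights, K=6, L=5, seed=0):
        rng = random.Random(seed)
        dim = len(weights[0])
        self.dirs = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(K)]
                     for _ in range(L)]               # L tables, K bits each
        self.tables = [defaultdict(set) for _ in range(L)]
        for node, w in enumerate(weights):            # insert each neuron once
            for table, dirs in zip(self.tables, self.dirs):
                table[srp_bits(w, dirs)].add(node)

    def active_nodes(self, x):
        # union, over the L tables, of the bucket the input hashes into
        out = set()
        for table, dirs in zip(self.tables, self.dirs):
            out |= table.get(srp_bits(x, dirs), set())
        return out

rng = random.Random(2)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]  # 32 neurons, dim 8
layer = LSHLayer(W)
print(sorted(layer.active_nodes(W[0]))[:5])  # querying with w_0 itself recovers node 0
```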
  • See ref [30] where authors show around 500x reduction in computations for image search following different algorithmic and systems choices. Capsule: a camera based positioning system using learning {1506}
  • Use relatively small test data sets -- MNIST 8M, NORB, Convex, Rectangles -- each resized to have small-ish input vectors.

  • Really want more analysis of what exactly is going on here -- what happens when you change the hashing function, for example? How much is the training dependent on suitable ROC or precision/recall on the activation?
    • For example, they could have calculated the actual real activation & WTA selection, and compared it to the results from the hash function; how correlated are they?

ref: -2017 tags: locality sensitive hashing olfaction kenyon cells neuron sparse representation date: 01-18-2020 21:13 gmt revision:1

PMID-29123069 A neural algorithm for a fundamental computing problem

  • Central idea: locality-sensitive hashing, e.g. hashing that is sensitive to the high-dimensional locality of the input space, can be efficiently solved using a circuit inspired by the insect olfactory system.
  • Here, activation of 50 different types of ORNs is mapped to 50 projection neurons, which 'centers the mean' -- concentration dependence is removed.
  • This is then projected via a random matrix of sparse binary weights to a much larger set of Kenyon cells, which in turn are inhibited by one APL neuron.
  • Normal locality-sensitive hashing uses dense matrices of Gaussian-distributed random weights, which means higher computational complexity...
  • ... these projections are governed by the Johnson-Lindenstrauss lemma, which says that projection from high-d to low-d space can preserve locality (distance between points) within an error bound.
  • Show that the WTA selection of the top 5% plus random binary weight preserves locality as measured by overlap with exact input locality on toy data sets, including MNIST and SIFT.
  • Flashy title as much as anything else got this into Science... indeed, has only been cited 6 times in Pubmed.
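A sketch of the circuit as described above (parameters beyond the 50 ORN/PN types are my guesses): mean-center the input, expand through a sparse binary random matrix, then apply winner-take-all over the top 5% of Kenyon cells.

```python
import random

def fly_hash(x, proj, top_frac=0.05):
    mu = sum(x) / len(x)
    centered = [v - mu for v in x]               # PN stage: remove the mean
    kc = [sum(centered[i] for i in idxs) for idxs in proj]  # sparse binary projection
    k = max(1, int(len(kc) * top_frac))
    thresh = sorted(kc, reverse=True)[k - 1]
    return [1 if v >= thresh else 0 for v in kc]  # APL inhibition: top k cells fire

rng = random.Random(3)
d, m = 50, 2000                                     # 50 PNs expanded to 2000 KCs
proj = [rng.sample(range(d), 6) for _ in range(m)]  # each KC samples ~6 PNs
tag = fly_hash([rng.random() for _ in range(d)], proj)
print(sum(tag))  # about 5% of the 2000 cells are active
```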

ref: -0 tags: polyimide electrodes ecog japan photosensitive date: 06-28-2013 01:50 gmt revision:0

PMID-22719725 Photosensitive-polyimide based method for fabricating various neural electrode architectures

  • Yasuhiro X. Kato, Shigeto Furukawa, Kazuyuki Samejima, Naoyuki Hironaka, and Makio Kashino
  • many typos in this paper (not that I should talk..) Yet still, it's informative.
  • 12um thick photosensitive polyimide + Cr/Au fabrication.
  • Wet etch (photodeveloper).
  • Wet release (ferric chloride) from glass substrate.
  • Soldered a connector to the polyimide w/ stiffener.
  • Note that polyimide tends to shrink (up to 29%) during baking, unlike parylene!
  • Suggest 20-40um diameter neural recording sites; they did not coat.

ref: -0 tags: voltage sensitive dyes fluorescent protein date: 01-02-2013 05:08 gmt revision:0

PMID-20622860 Imaging brain electric signals with genetically targeted voltage-sensitive fluorescent proteins.

  • Interesting: Most fluorescent fusion proteins form intracellular aggregates during long-term expression in mammalian neurons, although this effect appears to be minimal in Aequorea victoria–derived fluorescent proteins.
  • See also {1185}

ref: -0 tags: optical recording voltage sensitive dyes redshirt date: 01-02-2013 03:17 gmt revision:3

PMID-16050036 Imaging brain activity with voltage- and calcium-sensitive dyes.

  • Voltage-sensitive dyes are well suited for measuring synaptic integration, as:
    • Electrodes are too blunt to effectively record these fine, < 1um diameter structures.
    • The surface area to volume ratio is highest in the dendrites
    • Voltage-sensitive dyes also permeate internal membranes, which are not subject to voltage gradients; these membranes do not contribute to the signal, leading to a decreased ΔF/F.
  • Dominant experimental noise is shot noise, statistical -- see {1181}.
  • Modern dyes and imagers can reliably record single action potentials; spatial averaging yields similar resolution to electrical recording.
  • They performed optical recording of Aplysia sensory ganglia, and discovered following light tail touch: "It is almost as if the Aplysia nervous system is designed such that every cell in the abdominal ganglion cares about this (and perhaps every) sensory stimulus. In addition, more than 1000 neurons in other ganglia are activated by this touch..."
      • These results force a more pessimistic view of the present understanding of the neuronal basis of apparently simple behaviors in relatively simple nervous systems.
  • Used calcium imaging on olfactory glomeruli of mice and turtles; measurements were limited by either shot-noise or heart/breathing artifacts.
  • Confocal and two-photon microscopes, due to their exchange of spatial resolution for sensitivity, are not useful with voltage-sensitive dyes.
    • The generation of fluorescent photons in the 2-photon confocal microscope is not efficient. We compared the signals from Calcium Green-1 in the mouse olfactory bulb using 2-photon and ordinary microscopy. In this comparison the number of photons contributing to the intensity measurement in the 2-photon confocal microscope was about 1000 times smaller than the number measured with the conventional microscope and a CCD camera.
  • By the numbers, quote: Because only a small fraction of the 10e16 photons/ms emitted by a tungsten filament source will be measured, a signal-to-noise ratio of 10e8 (see above) cannot be achieved. A partial listing of the light losses follows. A 0.9-NA lamp collector lens would collect 0.1 of the light emitted by the source. Only 0.2 of that light is in the visible wavelength range; the remainder is infrared (heat). Limiting the incident wavelengths to those which carry the signal means that only 0.1 of the visible light is used. Thus, the light reaching the preparation might typically be reduced to 10e13 photons/ms. If the light-collecting system that forms the image has high efficiency, e.g. in an absorption measurement, about 10e13 photons/ms will reach the image plane. (In a fluorescence measurement there will be much less light measured, because 1. only a fraction of the incident photons are absorbed by the fluorophores, 2. only a fraction of the absorbed photons appear as emitted photons, and 3. only a fraction of the emitted photons are collected by the objective.) If the camera has a quantum efficiency of 1.0, then, in absorption, a total of 10e13 photoelectrons/ms will be measured. With a camera of 1000 pixels, there will be 10e10 photoelectrons/ms/pixel. The shot noise will be 10e5 photoelectrons/ms/pixel; thus the very best that can be expected is a noise that is 10e-5 of the resting light (a signal-to-noise ratio of 100 dB). The extra light losses in a fluorescence measurement will further reduce the maximum obtainable signal-to-noise ratio.
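The quoted photon budget, worked as code (note the straight product actually gives 2 × 10e13 at the preparation; the text rounds to 10e13):

```python
source = 1e16                          # photons/ms from the tungsten filament
at_prep = source * 0.1 * 0.2 * 0.1     # collector lens, visible fraction, signal band
per_pixel = at_prep / 1000             # 1000-pixel camera, QE = 1.0 (absorption)
shot_noise = per_pixel ** 0.5          # Poisson statistics: noise = sqrt(photoelectrons)
frac_noise = shot_noise / per_pixel    # best-case fractional noise per pixel
print(f"{at_prep:.0e} {per_pixel:.0e} {frac_noise:.1e}")  # ~1e-5 of the resting light
```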

ref: -0 tags: optical coherence tomography neural recording squid voltage sensitive dyes review date: 12-23-2012 21:00 gmt revision:4

PMID-20844600 Detection of Neural Action Potentials Using Optical Coherence Tomography: Intensity and Phase Measurements with and without Dyes.

  • Optical methods of recording have been investigated since the 1940's:
    • During action potential (AP) propagation in neural tissue light scattering, absorption, birefringence, fluorescence, and volume changes have been reported (Cohen, 1973).
  • OCT is reflection-based, not transmission: illuminate and measure from the same side.
    • Here they use spectral domain OCT, where the mirror is not scanned; rather SD-OCT uses a spectrometer to record interference of back-scattered light from all depth points simultaneously (Fercher et al., 1995).
    • Use of a spectrometer allows imaging of an axial line within 10-50us, sufficient for imaging action potentials.
    • SD-OCT, due to some underlying mathematics which I can't quite grok atm, can cancel common-mode phase noise, giving high temporal resolution and highly sensitive Δphase measurement.
      • This equates to "microsecond temporal resolution and sub-nanometer optical path length resolution".
  • OCT is generally (initially?) used for in-vivo imaging of retinas, in humans and other animals.
  • They present new data for depth-localization of neural activity in squid giant axons (SGA) stained with a voltage-sensitive near-infrared dye.
    • Note: averaged over 250 sweeps.
  • ΔPhase >> ΔIntensity -- figure 4 in the paper.
  • Use of voltage-sensitive dyes improves the resolution of ΔI, but not dramatically --
    • And Δphase is still a bit delayed.
    • Electrical recording is the control.
      • It will take significant technology development before optical methods exceed electrical methods...
  • Looks pretty preliminary. However, OCT can image 1-2mm deep in transparent tissue, which is exceptional.
  • Will have to read their explanation of OCT.
  • Used in a squid giant axon prep. 2010, wonder if anything new has been done (in vivo?).
  • Claim that progress is hampered by limited understanding of how these Δphase signals arise.