m8ta fun

m8ta
You are not authenticated, login.

{1506}

hide / / print

ref: -0 tags: asymmetric locality sensitive hash maximum inner product search sparsity date: 03-30-2020 02:17 gmt revision:5 [4] [3] [2] [1] [0] [head]

Like many other papers, this one is based on a long lineage of locality-sensitive hashing papers.
Key innovation, in [23] The power of asymmetry in binary hashing, was the development of asymmetric hashing -- the hash function of the query is different than the hash function used for storage. Roughly, this allows additional degrees of freedom since the similarity-function is (in the non-normalized case) non-symmetric.

For example, take query Q = [1 1] with keys A = [1 -1] and B = [3 3]. The nearest neighbor is A (distance 2), whereas the maximum inner product is B (inner product 6).
Alternately: self-inner product for Q and A is 2, whereas for B it's 18. Self-similarity is not the highest with inner products.
Norm of the query does not have an effect on the arg max of the search, though. Hence, for the paper assume that the query has been normalized for MIPS.

In this paper instead they convert MIPS into approximate cosine similarity search (which is like normalized MIPS), which can be efficiently solved with signed random projections.
(Established): LSH-L2 distance:

Hash function is then the floor of the inner product of the vector a and input x + b divided by the radius.

I'm not sure about how the floor op is converted to bits of the actual hash -- ?

Hash is the sign of the inner product of the input vector and a uniform random vector a.
This is a two-bit random projection [13][14].

$P(x) = [x;||x||^2_2; ||x||^4_2; .... ; ||x||^{2^m}_2]$ -- this is the pre-processing hashing of the 'keys'.

$Q(x) = [x;1/2; 1/2; ... ; 1/2]$ -- hashing of the queries.
See the mathematical explanation in the paper, but roughly "transformations P and Q, when normas are less than 1, provide correction to the L2 distance $||Q(p) - P(x_i)||_2$ , making in rank correlate with un-normalized inner product."

$P(x) = [x; 1/2 - ||x||^2_2; 1/2 - ||x||^4_2; ... ; 1/2 - ||x||^{2^m}_2]$
$Q(x) = [x; 0; ...; 0]$
This allows use of signed nearest-neighbor search to be used in the MIPS problem. (e.g. the hash is the sign of P and Q, per above; I assume this is still a 2-bit operation?)

Then the expand the U,M compromise function $\rho$ to allow for non-normalized queries. U depends on m and c (m is the codeword extension, and c is the ratio between o-target and off-target hash hits.
Tested on Movielens and Netflix databases, this using SVD preprocessing on the user-item matrix (full rank matrix indicating every user rating on every movie (mostly zeros!)) to get at the latent vectors.
In the above plots, recall (hah) that precision is the number of true positives / number of false positives as the number of draws k increases; recall is the number of true positives / number of draws k.

Clearly, the curve bends up and to the right when there are a lot of hash tables K.
Example datapoint: 50% precision at 40% recall, top 5. So on average you get 2 correct hits in 4 draws. Or: 40% precision, 20% recall, top 10: 2 hits in 5 draws. 20/40: 4 hits in 20 draws. (hit: correctly within the top-N)
So ... it's not that great.

Uses 512 SIFT features as keys and queries to LSH. Hashing is computed via sparse addition / subtraction algorithm, with K bits per hash table (not quite random projections) and L hash tables. K = 22 and L = 24. ~ 1000 training images.
Best matching image is used as the location of the current image.