Improved asymmetric locality sensitive hashing for maximum inner product search
- Like many other papers, this one is based on a long lineage of locality-sensitive hashing papers.
- Key innovation, in [23] The power of asymmetry in binary hashing, was the development of asymmetric hashing -- the hash function of the query is different than the hash function used for storage. Roughly, this allows additional degrees of freedom since the similarity-function is (in the non-normalized case) non-symmetric.
- For example, take query Q = [1 1] with keys A = [1 -1] and B = [3 3]. The nearest neighbor is A (distance 2), whereas the maximum inner product is B (inner product 6).
- Alternately: self-inner product for Q and A is 2, whereas for B it's 18. Self-similarity is not the highest with inner products.
- Norm of the query does not have an effect on the arg max of the search, though. Hence, for the paper assume that the query has been normalized for MIPS.
- In this paper instead they convert MIPS into approximate cosine similarity search (which is like normalized MIPS), which can be efficiently solved with signed random projections.
- (Established): LSH-L2 distance:
- Sample a random vector a, iid normal N(0,1)
- Sample a random normal b between 0 and r
- r is the window size / radius (free parameters?)
- Hash function is then the floor of the inner product of the vector a and input x + b divided by the radius.
- I'm not sure about how the floor op is converted to bits of the actual hash -- ?
- (Established): LSH-correlation, signed random projections :
- Hash is the sign of the inner product of the input vector and a uniform random vector a.
- This is a two-bit random projection [13][14].
- (New) Asymmetric-LSH-L2:
- -- this is the pre-processing hashing of the 'keys'.
- Requires that then norm of these keys, {||x||}_2 < U < 1$$
-
- -- hashing of the queries.
- See the mathematical explanation in the paper, but roughly "transformations P and Q, when normas are less than 1, provide correction to the L2 distance , making in rank correlate with un-normalized inner product."
- They then change the augmentation to:
-
-
- This allows use of signed nearest-neighbor search to be used in the MIPS problem. (e.g. the hash is the sign of P and Q, per above; I assume this is still a 2-bit operation?)
- Then the expand the U,M compromise function to allow for non-normalized queries. U depends on m and c (m is the codeword extension, and c is the ratio between o-target and off-target hash hits.
- Tested on Movielens and Netflix databases, this using SVD preprocessing on the user-item matrix (full rank matrix indicating every user rating on every movie (mostly zeros!)) to get at the latent vectors.
-
- In the above plots, recall (hah) that precision is the number of true positives / number of false positives as the number of draws k increases; recall is the number of true positives / number of draws k.
- Clearly, the curve bends up and to the right when there are a lot of hash tables K.
- Example datapoint: 50% precision at 40% recall, top 5. So on average you get 2 correct hits in 4 draws. Or: 40% precision, 20% recall, top 10: 2 hits in 5 draws. 20/40: 4 hits in 20 draws. (hit: correctly within the top-N)
- So ... it's not that great.
Use case: Capsule: a camera based positioning system using learning
- Uses 512 SIFT features as keys and queries to LSH. Hashing is computed via sparse addition / subtraction algorithm, with K bits per hash table (not quite random projections) and L hash tables. K = 22 and L = 24. ~ 1000 training images.
- Best matching image is used as the location of the current image.
|