Improved asymmetric locality sensitive hashing for maximum inner product search
 Like many other papers, this one is based on a long lineage of localitysensitive hashing papers.
 Key innovation, in [23] The power of asymmetry in binary hashing, was the development of asymmetric hashing  the hash function of the query is different than the hash function used for storage. Roughly, this allows additional degrees of freedom since the similarityfunction is (in the nonnormalized case) nonsymmetric.
 For example, take query Q = [1 1] with keys A = [1 1] and B = [3 3]. The nearest neighbor is A (distance 2), whereas the maximum inner product is B (inner product 6).
 Alternately: selfinner product for Q and A is 2, whereas for B it's 18. Selfsimilarity is not the highest with inner products.
 Norm of the query does not have an effect on the arg max of the search, though. Hence, for the paper assume that the query has been normalized for MIPS.
 In this paper instead they convert MIPS into approximate cosine similarity search (which is like normalized MIPS), which can be efficiently solved with signed random projections.
 (Established): LSHL2 distance:
 Sample a random vector a, iid normal N(0,1)
 Sample a random normal b between 0 and r
 r is the window size / radius (free parameters?)
 Hash function is then the floor of the inner product of the vector a and input x + b divided by the radius.
 I'm not sure about how the floor op is converted to bits of the actual hash  ?
 (Established): LSHcorrelation, signed random projections $h^{sign}$ :
 Hash is the sign of the inner product of the input vector and a uniform random vector a.
 This is a twobit random projection [13][14].
 (New) AsymmetricLSHL2:
 $P(x) = [x;x^2_2; x^4_2; .... ; x^{2^m}_2]$  this is the preprocessing hashing of the 'keys'.
 Requires that then norm of these keys, {x}_2 < U < 1$$
 $m \geq 3$
 $Q(x) = [x;1/2; 1/2; ... ; 1/2]$  hashing of the queries.
 See the mathematical explanation in the paper, but roughly "transformations P and Q, when normas are less than 1, provide correction to the L2 distance $Q(p)  P(x_i)_2$ , making in rank correlate with unnormalized inner product."
 They then change the augmentation to:
 $P(x) = [x; 1/2  x^2_2; 1/2  x^4_2; ... ; 1/2  x^{2^m}_2]$
 $Q(x) = [x; 0; ...; 0]$
 This allows use of signed nearestneighbor search to be used in the MIPS problem. (e.g. the hash is the sign of P and Q, per above; I assume this is still a 2bit operation?)
 Then the expand the U,M compromise function $\rho$ to allow for nonnormalized queries. U depends on m and c (m is the codeword extension, and c is the ratio between otarget and offtarget hash hits.
 Tested on Movielens and Netflix databases, this using SVD preprocessing on the useritem matrix (full rank matrix indicating every user rating on every movie (mostly zeros!)) to get at the latent vectors.

 In the above plots, recall (hah) that precision is the number of true positives / number of false positives as the number of draws k increases; recall is the number of true positives / number of draws k.
 Clearly, the curve bends up and to the right when there are a lot of hash tables K.
 Example datapoint: 50% precision at 40% recall, top 5. So on average you get 2 correct hits in 4 draws. Or: 40% precision, 20% recall, top 10: 2 hits in 5 draws. 20/40: 4 hits in 20 draws. (hit: correctly within the topN)
 So ... it's not that great.
Use case: Capsule: a camera based positioning system using learning
 Uses 512 SIFT features as keys and queries to LSH. Hashing is computed via sparse addition / subtraction algorithm, with K bits per hash table (not quite random projections) and L hash tables. K = 22 and L = 24. ~ 1000 training images.
 Best matching image is used as the location of the current image.
