 m8ta
{1509} ref: -2002 tags: hashing frequent items count sketch algorithm google date: 03-30-2020 02:04 gmt revision:7

Notation: S is a data stream, $S = q_1, q_2, ..., q_n$, of length n. Each object $q_i \in O = \{o_1, ..., o_m\}$; that is, there are m total possible objects (e.g. English words). Object $o_i$ occurs $n_i$ times in S, and the $o_i$ are ordered so that $n_1 \geq n_2 \geq ... \geq n_m$.

Task: Given an input stream S, integer k, and real $\epsilon$, output a list of k elements from S such that each element has $n_i > (1-\epsilon)n_k$. That is, if the ordering were perfect, $n_i \geq n_k$, with equality on the last element.

Algorithm: Choose hashes $h_1, ..., h_t$ from objects q to buckets $\{1, ..., b\}$, and sign hashes $s_1, ..., s_t$ from objects q to $\{-1, +1\}$. For each symbol, add it to the 2D counter array by hashing first with $h_i$, then incrementing that counter by $s_i(q)$. The double hashing is to reduce the effect of collisions with high-frequency items. When querying for the frequency of an object q, hash the same way and take the median over i of $s_i(q) \cdot C_i[h_i(q)]$, where $C_i$ is row i of the counter array. With $t = O(\log(\frac{n}{\delta}))$ rows, the algorithm fails with probability at most $\delta$.

They demonstrate proof of convergence / correct behavior on Zipfian distributions with varying exponent (I did not read through this). They also show that these hash-counts can be compared directly to see what has changed, or, importantly, whether two documents differ.

MISSION: Ultra Large-Scale Feature Selection using Count-Sketches

Task: Given a labeled dataset $(X_i, y_i)$ for $i \in \{1, 2, ..., n\}$, with $X_i \in \mathbb{R}^p, y_i \in \mathbb{R}$, find the k-sparse feature vector / linear regression for the least-squares problem $\min_{||\beta||_0 = k} ||y - X\beta||_2$, where $||\beta||_0 = k$ counts the non-zero elements of the feature vector. The number of features p is so large that a dense $\beta$ cannot be stored in memory (X is, of course, sparse). Such data may come from ad click-throughs, or from genomic analyses ...
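The count-sketch update and median query described above can be sketched in Python; this is a minimal illustrative implementation (class and method names are mine, and a keyed hash stands in for the paper's hash families):

```python
import hashlib
import statistics

class CountSketch:
    """Minimal count-sketch: t rows of b counters, with a bucket hash h_i
    and a sign hash s_i per row (both derived from a keyed hash here)."""

    def __init__(self, t=5, b=256):
        self.t, self.b = t, b
        self.table = [[0] * b for _ in range(t)]

    def _hash(self, i, q, salt):
        # deterministic keyed hash standing in for a pairwise-independent family
        d = hashlib.blake2b(f"{salt}:{i}:{q}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "big")

    def _h(self, i, q):  # bucket index in {0, ..., b-1}
        return self._hash(i, q, "h") % self.b

    def _s(self, i, q):  # sign in {-1, +1}
        return 1 if self._hash(i, q, "s") & 1 else -1

    def add(self, q, count=1):
        for i in range(self.t):
            self.table[i][self._h(i, q)] += self._s(i, q) * count

    def estimate(self, q):
        # median over rows of s_i(q) * C_i[h_i(q)] cancels collision bias
        return statistics.median(
            self._s(i, q) * self.table[i][self._h(i, q)] for i in range(self.t)
        )
```

Feeding a stream through `add` and calling `estimate` recovers approximate counts; the sign hash makes colliding items cancel in expectation, and the median over rows suppresses the unlucky rows.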
Use the count-sketch algorithm (above) for capturing and continually updating the features during gradient descent. That is, treat the stream of per-sample least-squares gradient updates, $g_i = 2\lambda(y_i - X_i\beta_t)X_i^T$ (reconstructed; the note's formula was garbled), as the semi-continuous time series used above as $S$.

Compare this with greedy thresholding and iterative hard thresholding (IHT), which e.g. throw away gradient information after each batch; this discards small gradients which may be useful for the regression problem. Works better, but not necessarily better than straight feature hashing (FH). Meh.

{1496} ref: -2017 tags: locality sensitive hashing olfaction kenyon cells neuron sparse representation date: 01-18-2020 21:13 gmt revision:1

PMID-29123069 A neural algorithm for a fundamental computing problem

General idea: locality-sensitive hashing, e.g. hashing that is sensitive to the high-dimensional locality of the input space, can be implemented efficiently by a circuit inspired by the insect olfactory system.

Here, activation of 50 different types of ORNs is mapped to 50 projection neurons, which 'centers the mean' -- concentration dependence is removed. This is then projected via a random matrix of sparse binary weights to a much larger set of Kenyon cells, which in turn are inhibited by one APL neuron. Normal locality-sensitive hashing uses dense matrices of Gaussian-distributed random weights, which means higher computational complexity...

... these projections are governed by the Johnson-Lindenstrauss lemma, which says that projection from high-dimensional to low-dimensional space can preserve locality (distance between points) within an error bound.

They show that WTA selection of the top 5% of Kenyon cells, plus the random binary weights, preserves locality as measured by overlap with exact input locality on toy data sets, including MNIST and SIFT.

A flashy title as much as anything else got this into Science... indeed, it has only been cited 6 times in PubMed.
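A minimal sketch of the fly-style hash described in the note: mean-center the input, project through a sparse binary random matrix to a higher dimension, then keep only the top ~5% of units (winner-take-all, standing in for APL inhibition). The dimensions (50 inputs, 2000 Kenyon cells), connection density, and WTA fraction are illustrative choices, and the function name is mine:

```python
import numpy as np

def fly_hash(x, m=2000, k_frac=0.05, density=0.1, seed=0):
    """Fly-inspired LSH tag for input vector x (illustrative parameters).
    Returns a binary vector of length m with ~k_frac*m active units."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # sparse binary projection; fixed seed => same W on every call, as a
    # stand-in for a projection matrix chosen once and reused
    W = (rng.random((m, d)) < density).astype(float)
    y = W @ (x - x.mean())        # mean-centering removes 'concentration'
    k = max(1, int(k_frac * m))
    tag = np.zeros(m)
    tag[np.argsort(y)[-k:]] = 1.0  # WTA: keep only the k largest responses
    return tag
```

Nearby inputs activate largely overlapping sets of top-k units, which is the locality-sensitivity property; the sparse binary W avoids the dense Gaussian projections of conventional LSH.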