Sketch: Program Synthesis by Sketching
 Sketch consists of a C-like language with additional primitives: integer-valued holes (??), reorder, generators (macros whose holes can vary per invocation), assert, repeat (also a hole-parameterized macro), regular-expression generators for selecting among substitute programming primitives, and atomics for concurrency-focused synthesis (fork and compare-and-swap).
 The other common programming primitives are there, too: if, else, while, procedures, arrays.
 Loops, arrays, and recursion are limited to a fixed size to constrain the search space.
 Sketch is inductive: working solutions are derived from the skeleton provided by the programmer plus counterexamples provided by a verification function.
 This is contrasted with previous work, which focused on deductive synthesis: programs are derived from a constructive proof of the satisfiability of a problem.
 This proved challenging in practice: programmers are much better versed in construction than in abstract mathematical derivation.
 "By giving up the deductive approach, we lose correctness by construction, and we are forced to rely on an external validation procedure."
 So they use CEGIS: counterexample-guided inductive synthesis.
 For most sketches, only a small set of counterexamples are needed to fully constrain the solution.
 The counterexamples are fed to the inductive synthesizer so that the next candidate is guaranteed to work correctly on all of them.
 Each counterexample typically reduces the search space by a factor of ~2^15, so much information is gained per example that only a handful of validation calls are usually needed, even when the solution space is very large.
 E.g. ~40 iterations when the candidate space has size 2^600.
 Illuminating that computers can so readily handle such large spaces with perfect fidelity!
 Counterexamples are much easier to find than an abstract model (as in the deductive approach).
 To do the synthesis, the sketch is expanded to a 'control': a set of variables, holes and calling contexts.
 This control includes parameterized values that track the value of each variable, along with operators that define their relation to other variables, holes, and calling contexts.
 The denotation function translates any expression to a function from a state to a parameterized value.
 The program sketch is parsed (partially evaluated) to iteratively update $\sigma, \Phi$, where $\sigma$ is a mapping from the set of variable names to parameterized values, and $\Phi$ is the set of control functions (holes and calling contexts).

 With control flow like while and if, the set of parameterized values $\sigma$ (with calling context) and control functions $\Phi$ are unrolled (to a limit) and unioned for the two branches (if & else).
 Similarly for procedures and generators: the iterative state $\sigma, \Phi$ is updated with additional calling context $\tau$ (with bounded recursion / bounded evaluation; the bound $k$ is a hyperparameter).
 Assertions and "ifs" are modeled as intersections on the iterative state.
 In this way, the semantics can model the behavior of all candidate solutions simultaneously; e.g. $\Phi$ is a set of all possible assignments to the holes.

 (Chapter 5) In practice: control is represented in an intermediate language, itself represented as a DAG.
 Several optimizations are applied to make the DAG simpler, e.g. structural hashing to remove common subtrees, and constant propagation (which may be redundant with internal SAT rules).
 There seems to be a tension between Chapter 3 (abstract denotational semantics) and Chapter 5 (concrete DAG manipulations); how exactly they relate could have been spelled out better.
 I.e.: how do the context, parameterized values, and control functions directly lead to the DAG??
 I infer: the DAG seems to be a representation of the data & code flow graph that makes explicit the relationships between variables, holes, and control structures. (this is never mentioned, but seems to be the meaning of 'denotation'.)
 The denotation function (compilation or partial evaluation) only needs to be computed once; subsequent CEGIS-loop iterations only need to update the SAT problem from the counterexamples.
 Integers are represented via mutually exclusive, ordered guard bits: if a bit is 1, then the hole or variable evaluates to that integer (a.k.a. one-hot encoding). This permits translation from the DAG to CNF.
 CEGIS starts by finding a solution: an integer assignment to the holes & permutations that makes the SAT problem evaluate to true.
 Based on this control vector, it then looks for an assignment of the input variables that makes the SAT problem evaluate to false.
 Clever reuse of the SAT solver & representation!
 This counterexample is used to strengthen the SAT problem: add clauses (the system must evaluate to true on all input-output counterexamples), i.e. remove hypotheses from $\Phi$.
 Did not delve much into the concurrency-related chapters (7-9) or the Stencil scientific-computing chapters (10-11).
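A small illustration of the guard-bit idea from Chapter 5 (my own toy, not Sketch's actual encoding): one bit per possible value, with pairwise CNF clauses enforcing mutual exclusion.

```python
def encode(value, domain_size):
    """One-hot encode an integer as guard bits: bit i is set iff value == i."""
    return [1 if i == value else 0 for i in range(domain_size)]

def decode(bits):
    """Recover the integer; only valid when exactly one guard bit is set."""
    assert sum(bits) == 1, "guard bits must be mutually exclusive"
    return bits.index(1)

def at_most_one(variables):
    """Pairwise CNF clauses (as pairs of negated literals) forbidding any two
    guard bits from being true at once: for each pair (a, b), emit (-a OR -b)."""
    return [(-a, -b) for k, a in enumerate(variables) for b in variables[k + 1:]]

bits = encode(3, 8)
print(bits)                                  # [0, 0, 0, 1, 0, 0, 0, 0]
print(decode(bits))                          # 3
print(len(at_most_one(list(range(1, 9)))))   # 28 pairwise clauses for 8 bits
```

The pairwise encoding is quadratic in the domain size, which is one reason the bounded domains matter.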
The essential algorithm, in words:
Take the sketch, expand it to a set of parameterized variables, holes, and calling contexts. Convert these to a DAG, i.e. a data/code flow graph with dependencies. Simplify the DAG, one-hot encode integers, and convert to either a conjunctive-normal-form (CNF) SAT problem for MiniSat, or to a boolean circuit for the ABC solver.
Apply MiniSat or ABC to the problem to select a set of control values: values for the holes & permutations that satisfy the boolean constraints. Using this solution, use the SAT solver to find an input-variable configuration that does not satisfy the problem; this serves as a counterexample. Run it through the validator function (oracle) to see what it does; use the counterexample (inputs and outputs) to add clauses to the SAT problem. Repeat until either no counterexamples can be found or the problem is `unsat`.
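The loop above can be shown in miniature. This toy is my own construction, not from the thesis: brute-force enumeration over tiny finite domains stands in for the SAT solver, but the synthesize/verify alternation is the same.

```python
# Toy CEGIS: synthesize an integer hole h so that the candidate program
# x -> (x + h) % 8 matches the spec x -> (x + 5) % 8 on all inputs.
# Brute-force search replaces the SAT solver; the verifier is a full
# scan of the (bounded) input domain.

DOMAIN = range(8)   # bounded input space (Sketch likewise bounds everything)
HOLES = range(8)    # candidate values for the hole ??

def candidate(x, h):
    return (x + h) % 8

def spec(x):
    return (x + 5) % 8

def cegis():
    examples = []   # counterexamples collected so far
    while True:
        # Inductive synthesis: find a hole consistent with all examples so far.
        h = next((h for h in HOLES
                  if all(candidate(x, h) == spec(x) for x in examples)), None)
        if h is None:
            return None          # unsat: no hole value works
        # Verification: search for an input where this candidate fails.
        cex = next((x for x in DOMAIN if candidate(x, h) != spec(x)), None)
        if cex is None:
            return h             # verified on the whole bounded domain
        examples.append(cex)     # refine and retry

print(cegis())  # -> 5
```

In the real system both the "find a hole" and "find a counterexample" steps are SAT queries over the same one-hot encoding, which is the clever reuse noted above.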
Though the thesis describes a system that was academic & relatively small back in 2008, Sketch has enjoyed continuous development and remains in use. I find the work that went into it remarkable and impressive: even with incremental improvements, you need accurate expansion of the language & manipulations to show proof of principle. Left wondering what limits its application to even larger problems: the need for a higher-level loop that further subdivides / factorizes the problem, or DFS for filling out elements of the sketch?
Interesting links discovered while reading the dissertation:
 Lock-free queues using compare_exchange_weak in C++ (useful!)
 Karatsuba algorithm for multiplying large n-digit numbers with $O(n^{1.58})$ complexity. (Trick: divide and conquer, using algebra to reuse intermediate results via addition and subtraction.)
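The trick fits in a few lines; this is a standard recursive implementation written for this note, not taken from any of the linked pages:

```python
# Karatsuba multiplication: split each number in half and reuse
# (a+b)(c+d) = ac + (ad+bc) + bd to get the middle term with one multiply
# instead of two, giving three recursive multiplies and O(n^log2(3)) ~ O(n^1.58).

def karatsuba(x, y):
    if x < 10 or y < 10:               # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)       # x = a * 10^half + b
    c, d = divmod(y, 10 ** half)       # y = c * 10^half + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # the cross term ad + bc, recovered via addition and subtraction:
    cross = karatsuba(a + b, c + d) - ac - bd
    return ac * 10 ** (2 * half) + cross * 10 ** half + bd

print(karatsuba(1234, 5678))  # -> 7006652
```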

Finding frequent items in data streams
 Notation:
 S is a data stream, $S = q_1, q_2, ..., q_n$, of length n.
 Each object $q_i \in O = \{o_1, ..., o_m\}$. That is, there are m total possible objects (e.g. English words).
 Object $o_i$ occurs $n_i$ times in S. The $o_i$ are ordered so that $n_1 \geq n_2 \geq \dots \geq n_m$.
 Task:
 Given an input stream S, integer k, and real $\epsilon$
 Output a list of k elements from S such that each element has $n_i > (1-\epsilon) n_k$.
 That is, if the ordering is perfect, $n_i \geq n_k$, with equality on the last element.
 Algorithm:
 $h_1, ..., h_t$: hashes from object q to buckets $\{1, ..., b\}$
 $s_1, ..., s_t$: hashes from object q to $\{-1, +1\}$
 For each symbol q in the stream, update the 2D hash array: in each row i, increment counter $h_i(q)$ by $s_i(q)$.
 The double hashing reduces the effect of collisions with high-frequency items.
 When querying for the frequency of an object q, hash it the same way and take the median over i of $C[i][h_i(q)] \cdot s_i(q)$, where C is the counter array.
 $t = O(\log \frac{n}{\delta})$, where the algorithm fails with probability at most $\delta$.
 They demonstrate proofs of convergence / correctness for Zipfian distributions with varying exponent. (I did not read through this.)
 They also showed that these hash-count sketches can be compared directly to see what has changed, or, importantly, whether two documents differ.
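A minimal rendition in Python, following the notation above. The salted-CRC hashes are a simplification of my own; the paper assumes hash families with independence guarantees.

```python
import zlib
from statistics import median

class CountSketch:
    """t rows of b buckets; h_i picks a bucket, s_i supplies the +/-1 sign."""

    def __init__(self, t, b):
        self.t, self.b = t, b
        self.table = [[0] * b for _ in range(t)]

    def _h(self, i, q):      # bucket hash h_i(q) in {0, ..., b-1}
        return zlib.crc32(f"h{i}:{q}".encode()) % self.b

    def _s(self, i, q):      # sign hash s_i(q) in {-1, +1}
        return 1 if zlib.crc32(f"s{i}:{q}".encode()) % 2 else -1

    def add(self, q, count=1):
        for i in range(self.t):
            self.table[i][self._h(i, q)] += self._s(i, q) * count

    def estimate(self, q):   # the median over rows cancels collision noise
        return median(self._s(i, q) * self.table[i][self._h(i, q)]
                      for i in range(self.t))

cs = CountSketch(t=5, b=64)
for word in ["the"] * 100 + ["a"] * 50 + ["rare"] * 2:
    cs.add(word)
print(cs.estimate("the"))   # close to 100 (exact unless several rows collide)
```

Note that the same structure supports deletions (negative counts), which is what makes the document-comparison trick above possible.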
MISSION: Ultra large-scale feature selection using Count-Sketches
 Task:
 Given a labeled dataset $(X_i, y_i)$ for $i \in \{1, 2, ..., n\}$, with $X_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$
 Find the k-sparse feature vector / linear regression for the least-squares problem $\min_{\|\beta\|_0 = k} \|y - X\beta\|_2$
 The constraint $\|\beta\|_0 = k$ bounds the number of nonzero elements in the feature vector.
 The number of features $p$ is so large that a dense $\beta$ cannot be stored in memory. (X is of course sparse.)
 Such data may be from ad clickthroughs, or from genomic analyses ...
 Use the count-sketch algorithm (above) to capture & continually update the features during gradient updates.
 That is, treat the stream of gradient updates $g_i = -2 \lambda (y_i - X_i \beta) X_i^T$ (squared-error gradient with learning rate $\lambda$) as the semi-continuous time series used above as $S$.
 Compare this with greedy thresholding and iterative hard thresholding (IHT), which throw away gradient information after each batch.
 This discards small gradients which may be useful for the regression problem.
 MISSION works better than those, but not necessarily better than straight feature hashing (FH).
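To make the idea concrete, here is my own toy rendition (the feature names, data, and hash choices are all invented for illustration): the regression coefficients live only inside a Count-Sketch, so a dense $\beta$ of dimension p is never materialized, and an explicit top-k support set is refreshed from the sketched magnitudes after each SGD step.

```python
import zlib
from statistics import median

T, B = 5, 128                      # sketch dimensions: T rows of B buckets
table = [[0.0] * B for _ in range(T)]

def h(i, j):                       # bucket hash for coefficient j in row i
    return zlib.crc32(f"h{i}:{j}".encode()) % B

def s(i, j):                       # sign hash in {-1, +1}
    return 1 if zlib.crc32(f"s{i}:{j}".encode()) % 2 else -1

def sketch_add(j, v):              # add v to coefficient j's cell in each row
    for i in range(T):
        table[i][h(i, j)] += s(i, j) * v

def sketch_get(j):                 # median-of-rows estimate of coefficient j
    return median(s(i, j) * table[i][h(i, j)] for i in range(T))

def mission_step(x, y, topk, k, lr=0.1):
    """One SGD step: x is a sparse {feature: value} dict, y the label."""
    pred = sum(sketch_get(j) * v for j, v in x.items() if j in topk)
    resid = y - pred
    for j, v in x.items():         # stream the squared-error gradient
        sketch_add(j, lr * 2 * resid * v)
    # refresh the top-k support from the sketched coefficient magnitudes
    cand = topk | set(x)
    return set(sorted(cand, key=lambda j: -abs(sketch_get(j)))[:k])

# toy run: the label depends only on feature "f1"
topk = set()
for _ in range(50):
    topk = mission_step({"f1": 1.0, "f9": 0.2}, 3.0, topk, k=1)
print(topk)                        # the true feature "f1" is recovered
print(sketch_get("f1"))            # its coefficient approaches 3.0
```

Unlike IHT, the small "f9" gradient contributions are never discarded; they just sit in the sketch below the top-k cutoff.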

 Meh.
