{1577}
ref: -2008 tags: sketch armando solar lezama inductive program synthesis date: 11-11-2023 20:08 gmt

Sketch - Program synthesis by sketching

  • Sketch consists of a C-like language with additional primitives: integer-valued holes ??, reorder, generators (macros in which the holes can vary), assert, repeat (also a variable-hole-parameterized macro), regular-expression generators for selecting among substitute programming primitives, and atomics for concurrency-focused synthesis (fork and compare-and-swap).
    • The other common programming primitives are there, too - if, else, while, procedures, arrays.
      • Loops, arrays, and recursion are limited to a fixed size to constrain the search space.
  • Sketch is inductive - working solutions are derived from the skeleton provided by the programmer + counterexamples provided by a verification function.
    • This is contrasted with previous work, which focused on deductive synthesis: programs are derived from a constructive proof of the satisfiability of a problem.
      • This 'proved' challenging in practice -- programmers are much better versed in construction rather than abstract mathematical derivation.
    • "By giving up the deductive approach, we lose correctness by construction, and we are forced to rely on an external validation procedure."
  • So they use CEGIS: CounterExample Guided Inductive Synthesis.
    • For most sketches, only a small set of counterexamples is needed to fully constrain the solution.
    • The counterexamples are fed to the inductive synthesizer so that the next candidate is guaranteed to work correctly on all of them.
    • Each counterexample typically reduces the search space by a factor of ~2^15 -- a lot of information per example -- so only a handful of validation calls are usually needed, even when the solution space is very large.
      • E.g. 40 iterations when the candidate space is 2^600.
        • Illuminating that computers can so readily handle such large spaces with perfect fidelity!
    • Counterexamples are much easier to find than an abstract model (as in the deductive approach).
  • To do the synthesis, the sketch is expanded to a 'control': a set of variables, holes and calling contexts.
  • This control includes parameterized values that track the value of each variable, along with operators that define their relation to other variables, holes, and calling contexts.
    • The denotation function translates any expression to a function from a state to a parameterized value.
  • The program sketch is parsed (partially evaluated) to iteratively update $\sigma, \Phi$, where $\sigma$ is a mapping from the set of variable names to parameterized values, and $\Phi$ is the set of control functions (holes and calling contexts).
    • With control flow like while and if, the set of parameterized values $\sigma$ (with calling context) and control functions $\Phi$ are unrolled (to a limit) and unioned over the two branches (if & else). (A toy illustration of this state-merging follows the list.)
    • Similarly for procedures and generators -- the iterative state $\sigma, \Phi$ is updated with additional calling context $\tau$ (and bounded recursion / bounded evaluation! -- $k$ is a hyperparameter).
    • Assertions and "ifs" are modeled as intersections on the iterative state.
      • In this way, the semantics can model the behavior of all candidate solutions simultaneously -- e.g. $\Phi$ is the set of all possible assignments to the holes.
  • (Chapter 5) In practice: control is represented in an intermediate language, itself represented as a DAG.
    • Several optimizations are applied to make the DAG simpler, e.g. structural hashing to remove common sub-trees, and constant propagation (which may be redundant with internal SAT rules).
    • There seems to be a tension between Chapter 3 (abstract denotational semantics) and Chapter 5 (concrete DAG manipulations); how exactly they relate could have been spelled out better.
      • I.e.: how do the context, parameterized values, and control functions directly lead to the DAG??
    • I infer: the DAG seems to be a representation of the data & code flow graph that makes explicit the relationships between variables, holes, and control structures. (this is never mentioned, but seems to be the meaning of 'denotation'.)
  • The denotation function (compilation or partial evaluation) only needs to be computed once -- subsequent CEGIS loop evaluations only need to update the SAT problem from the counter-examples.
  • Integers are represented via mutually-exclusive & ordered guard bits: if a bit is 1, then the hole or variable evaluates to that integer. Aka one-hot encoding. This permits translation from DAG to CNF. (A small encoding example follows the list.)
  • CEGIS starts by finding a solution = integer assignment of holes & permutations that make the SAT problem evaluate to True.
  • Based on this control vector, it then searches for an assignment of the input variables under which the SAT problem evaluates to False.
    • Clever re-use of the SAT solver & representation!
  • This counterexample is used to strengthen the SAT problem = add clauses (the system must evaluate to true on all input-output counterexamples) = remove hypotheses from $\Phi$.
  • Did not delve much into the concurrency-related chapters (7-9) or the Stencil scientific computing chapters (10-11).
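
To make the state-merging concrete, here is a toy illustration in Python (entirely my own construction -- the names and representation are not from the thesis): $\sigma$ maps variable names to symbolic expression trees over holes, and an if merges the two branch states into guarded "ite" choices, so a single state covers every candidate at once.

```python
def assign(sigma, var, expr):
    """Functional update of the symbolic state sigma."""
    s = dict(sigma)
    s[var] = expr
    return s

def merge_if(cond, sigma_then, sigma_else):
    """Union of the two branch states: each variable becomes a guarded choice."""
    return {v: ("ite", cond, sigma_then.get(v, v), sigma_else.get(v, v))
            for v in set(sigma_then) | set(sigma_else)}

# Symbolic execution of:  x = ??;  if (x > 0) y = x; else y = 0;
sigma = assign({}, "x", ("hole", 0))
then_b = assign(sigma, "y", sigma["x"])
else_b = assign(sigma, "y", ("const", 0))
sigma = merge_if(("gt", sigma["x"], ("const", 0)), then_b, else_b)
print(sigma["y"])
# ('ite', ('gt', ('hole', 0), ('const', 0)), ('hole', 0), ('const', 0))
```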
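And a minimal sketch of the guard-bit (one-hot) integer encoding -- again my own illustration with made-up names, not the thesis's actual CNF construction:

```python
from itertools import combinations

def one_hot_clauses(var_ids):
    """CNF clauses forcing exactly one guard bit to be true.

    var_ids: SAT variable numbers for the guard bits of one integer hole;
    guard bit j true <=> the integer equals value j.
    Clauses are DIMACS-style lists (negative = negated literal).
    """
    clauses = [list(var_ids)]              # at least one bit is set
    for a, b in combinations(var_ids, 2):  # no two bits are set together
        clauses.append([-a, -b])
    return clauses

# Guard bits 1..4 encode an integer hole ranging over {0, 1, 2, 3}.
print(one_hot_clauses([1, 2, 3, 4]))
# [[1, 2, 3, 4], [-1, -2], [-1, -3], [-1, -4], [-2, -3], [-2, -4], [-3, -4]]
```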

The essential algorithm, in words:

Take the sketch, expand it to a set of parameterized variables, holes, and calling contexts. Convert these to a DAG aka (?) data-code flow graph w/ dependencies. Try to simplify the DAG, one-hot encode integers, and convert to either a conjunctive-normal-form (CNF) SAT problem for MiniSat, or to a boolean circuit for the ABC solver.

Apply MiniSat or ABC to the problem to select a set of control values = values for the holes & permutations that satisfy the boolean constraints. Using this solution, use the SAT solver to find an input-variable configuration that does not satisfy the problem. This serves as a counterexample. Run it through the validation function (oracle) to see what it does; use the counterexample's inputs and outputs to add clauses to the SAT problem. Repeat until either no counterexamples can be found or the problem is `unsat`.
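
To pin down that control flow, a toy CEGIS loop in Python -- my own sketch, in which brute-force enumeration stands in for both SAT calls, and the spec (x*2 == x << h, the classic Sketch demo) is just an example, not anything from the thesis:

```python
HOLES = range(32)            # candidate values for the single hole ??
INPUTS = range(-1000, 1000)  # bounded input domain, as in Sketch

def candidate_ok(h, x):
    """The 'sketch' with hole value h, checked on input x."""
    return x * 2 == x << h

def synthesize(examples):
    """'SAT call' 1: find a hole value consistent with all counterexamples."""
    return next((h for h in HOLES
                 if all(candidate_ok(h, x) for x in examples)), None)

def verify(h):
    """'SAT call' 2: find an input on which the candidate fails."""
    return next((x for x in INPUTS if not candidate_ok(h, x)), None)

examples = []
while True:
    h = synthesize(examples)
    if h is None:
        print("unsat")
        break
    cex = verify(h)
    if cex is None:
        print("hole =", h)   # prints: hole = 1
        break
    examples.append(cex)     # the counterexample guides the next round
```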

Though the thesis describes a system that was academic & relatively small back in 2008, Sketch has enjoyed continuous development and remains in use. I find the work that went into it remarkable and impressive -- even with incremental improvements, you need an accurate expansion of the language & its manipulations to show proof-of-principle. I'm left wondering what limits its application to even larger problems -- the need for a higher-level loop that further subdivides / factorizes the problem, or DFS for filling out elements of the sketch?

Interesting links discovered while reading the dissertation:

  • Lock-free queues using compare_exchange_weak in C++ (useful!)
  • Karatsuba algorithm for multiplying large n-digit numbers with $O(n^{1.58})$ complexity. (Trick: divide and conquer & use algebra to re-use intermediate results via addition and subtraction; sketch below.)
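
A minimal Python version of Karatsuba, to spell out the trick (my illustration):

```python
def karatsuba(x, y):
    """Multiply non-negative integers with 3 recursive multiplies, not 4."""
    if x < 10 or y < 10:                  # base case: single digit
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    xh, xl = divmod(x, 10 ** m)           # split each number at m digits
    yh, yl = divmod(y, 10 ** m)
    a = karatsuba(xh, yh)                 # high parts
    b = karatsuba(xl, yl)                 # low parts
    c = karatsuba(xh + xl, yh + yl) - a - b  # cross terms, re-using a and b
    return a * 10 ** (2 * m) + c * 10 ** m + b

assert karatsuba(1234, 5678) == 1234 * 5678
```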

{1509}
ref: -2002 tags: hashing frequent items count sketch algorithm google date: 03-30-2020 02:04 gmt

Finding frequent items in data streams

  • Notation:
    • $S$ is a data stream, $S = q_1, q_2, \ldots, q_n$, of length $n$.
    • Each object $q_i \in O = \{o_1, \ldots, o_m\}$. That is, there are $m$ total possible objects (e.g. English words).
    • Object $o_i$ occurs $n_i$ times in $S$. The $o_i$ are ordered so that $n_1 \geq n_2 \geq \ldots \geq n_m$.
  • Task:
    • Given an input stream $S$, integer $k$, and real $\epsilon$.
    • Output a list of $k$ elements from $S$ such that each element has $n_i > (1-\epsilon) n_k$.
      • That is, if the ordering is perfect, $n_i \geq n_k$, with equality on the last element.
  • Algorithm:
    • $h_1, \ldots, h_t$ hash from an object $q$ to buckets $\{1, \ldots, b\}$.
    • $s_1, \ldots, s_t$ hash from an object $q$ to $\{-1, +1\}$.
    • For each symbol, add it to the 2D hash array by hashing first with $h_i$, then incrementing that counter with $s_i$. (A small sketch of this follows the list.)
      • The double-hashing is to reduce the effect of collisions with high-frequency items.
    • When querying for the frequency of an object, hash as above and take the median over $i$ of $h_i[q] \cdot s_i[q]$.
    • $t = O(\log(\frac{n}{\delta}))$, where the algorithm fails with probability at most $\delta$.
  • Demonstrates proof of convergence / correct function on Zipfian distributions with varying exponents. (I did not read through this.)
  • Also showed that it's possible to compare these hash-counts directly to see what's changed, or, importantly, whether two documents differ.
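
A compact Python version of the count-sketch (my own toy -- real implementations use pairwise-independent hash families; here I just salt Python's built-in hash):

```python
import random
from statistics import median

class CountSketch:
    def __init__(self, t=5, b=1000, seed=0):
        rng = random.Random(seed)
        self.t, self.b = t, b
        self.counts = [[0] * b for _ in range(t)]
        # per-row salts for the bucket hash h_i and the sign hash s_i
        self.salts = [(rng.random(), rng.random()) for _ in range(t)]

    def _h(self, i, q):  # bucket hash: q -> {0, ..., b-1}
        return hash((self.salts[i][0], q)) % self.b

    def _s(self, i, q):  # sign hash: q -> {-1, +1}
        return 1 if hash((self.salts[i][1], q)) & 1 else -1

    def add(self, q):
        for i in range(self.t):
            self.counts[i][self._h(i, q)] += self._s(i, q)

    def estimate(self, q):
        # the median over rows de-biases collisions with heavy hitters
        return median(self._s(i, q) * self.counts[i][self._h(i, q)]
                      for i in range(self.t))

cs = CountSketch()
for word in ["a"] * 100 + ["b"] * 10 + ["c"]:
    cs.add(word)
print(cs.estimate("a"), cs.estimate("b"))  # close to 100 and 10
```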


MISSION: Ultra large-scale feature selection using Count-Sketches
  • Task:
    • Given a labeled dataset $(X_i, y_i)$ for $i \in \{1, 2, \ldots, n\}$, with $X_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$.
    • Find the k-sparse feature vector / linear regression for the least-squares problem $\min_{||\beta||_0 = k} ||y - X\beta||_2$.
      • $||\beta||_0 = k$ constrains the number of non-zero elements in the feature vector to $k$.
    • The number of features $p$ is so large that a dense $\beta$ cannot be stored in memory. ($X$ is of course sparse.)
  • Such data may be from ad click-throughs, or from genomic analyses ...
  • Use the count-sketch algorithm (above) to capture & continually update the feature weights from the gradient updates.
    • That is, treat the stream of gradient updates, $g_i = 2 \lambda (y_i - X_i \beta_i)^T X_i$, as the data stream $S$ used above. (A toy version follows this list.)
  • Compare this with greedy thresholding and iterative hard thresholding (IHT), which e.g. throw away gradient information after each batch.
    • This discards small gradients, which may nonetheless be useful for the regression problem.
  • Works better than IHT, but not necessarily better than straight feature hashing (FH).
  • Meh.
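
Still, to fix the idea, a toy of the MISSION scheme as I read it (my own construction and names, not the paper's code): keep $\beta$ in a weighted count-sketch, push every gradient coordinate into it, and retain only the top-k features by estimated weight magnitude.

```python
import heapq
import random
from statistics import median

class WeightedCountSketch:
    """Count-sketch as above, but add() takes a real-valued increment."""
    def __init__(self, t=5, b=4096, seed=0):
        rng = random.Random(seed)
        self.t, self.b = t, b
        self.counts = [[0.0] * b for _ in range(t)]
        self.salts = [(rng.random(), rng.random()) for _ in range(t)]

    def _h(self, i, f):
        return hash((self.salts[i][0], f)) % self.b

    def _s(self, i, f):
        return 1 if hash((self.salts[i][1], f)) & 1 else -1

    def add(self, f, delta):
        for i in range(self.t):
            self.counts[i][self._h(i, f)] += self._s(i, f) * delta

    def estimate(self, f):
        return median(self._s(i, f) * self.counts[i][self._h(i, f)]
                      for i in range(self.t))

def mission_step(sketch, beta, x_row, y, lr=0.01, k=2):
    """One sketched SGD step for least squares; x_row is sparse {feature: value}."""
    pred = sum(w * x_row.get(f, 0.0) for f, w in beta.items())
    err = y - pred
    for f, v in x_row.items():           # gradient of (y - x.beta)^2, scaled
        sketch.add(f, 2 * lr * err * v)
    cand = set(beta) | set(x_row)        # re-select top-k by |estimated weight|
    best = heapq.nlargest(k, cand, key=lambda f: abs(sketch.estimate(f)))
    return {f: sketch.estimate(f) for f in best}

sk, beta = WeightedCountSketch(), {}
for _ in range(200):  # toy stream: y depends only on feature "f3"
    x = {"f3": 1.0, "noise%d" % random.randrange(100): 1.0}
    beta = mission_step(sk, beta, x, y=3.0)
print(max(beta, key=lambda f: abs(beta[f])))  # very likely: f3
```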