Competent Program Evolution
 An excellent start: a good description + meta-description/review of the existing literature.
 He thinks about things in a slightly different way: he separates what I call solutions and objective functions into "post-representational" and "pre-representational" levels (respectively).
 The thesis focuses on post-representational search/optimization, not pre-representational (though I believe that both should meet in the middle, e.g. pre-representational levels/objective functions tuned iteratively during post-representational solution creation. This is what a human would do!)
 The primary difficulty in competent program evolution is the intense non-decomposability of programs: every variable, constant, and branch affects the execution of every other little bit.
 Competent program creation is clearly possible: humans create programs significantly shorter than lookup tables, hence it should be possible to make a program do the same job.
 One solution to the problem is representation: formulate program creation as a set of 'knobs' that can be twiddled (here he means both gradient-descent partial-derivative optimization and simplex or heuristic one-dimensional probabilistic search, of which there are many good algorithms.)
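 The knob idea can be made concrete with a toy sketch (this is illustrative only, not MOSES's actual representation-building; the template and names are my invention). A fixed formula template exposes discrete knobs, each selecting a literal, and the optimizer searches knob settings rather than raw program trees:

```python
import itertools

# Hypothetical sketch of the 'knob' idea: a fixed template
# AND(k0, OR(k1, k2)) over inputs x0, x1, where each knob picks
# a (possibly negated) literal. Optimization then searches the
# small discrete space of knob settings, not program trees.
LITERALS = [
    lambda x: x[0], lambda x: not x[0],
    lambda x: x[1], lambda x: not x[1],
]

def program(knobs, x):
    k0, k1, k2 = (LITERALS[k] for k in knobs)
    return k0(x) and (k1(x) or k2(x))

def score(knobs, target):
    """Count input rows on which the knob setting matches target."""
    inputs = itertools.product([False, True], repeat=2)
    return sum(program(knobs, x) == target(x) for x in inputs)

def best_setting(target):
    """Exhaustive search over knob settings (tiny space here;
    MOSES would use a smarter optimizer over the subspace)."""
    settings = itertools.product(range(len(LITERALS)), repeat=3)
    return max(settings, key=lambda k: score(k, target))

target = lambda x: x[0] and not x[1]   # x0 AND NOT x1
knobs = best_setting(target)
assert score(knobs, target) == 4       # matches on all 4 input rows
```

 The point of the sketch: the template fixes size and shape in advance, so the optimizer only ever sees a well-structured, finite-dimensional subspace.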
 pp. 27: outline of his MOSES program. Read it for yourself, but it looks like:
 The representation step above "explicitly addresses the underlying (semantic) structure of program space independently of the search for any kind of modularity or problem decomposition."
 In MOSES, optimization does not operate directly on program space, but rather on subspaces defined by the representation-building process. These subspaces may be considered as being defined by templates assigning values to some of the underlying dimensions (e.g., they restrict the size and shape of any resulting trees).
 In chapter 3 he examines the properties of the boolean programming space, which is claimed to be a good model of larger/more complicated programming spaces in that:
 Simpler functions are much more heavily sampled: e.g. he generated 1e6 samples of 100-term boolean functions, then reduced them to minimal form using standard operators. The vast majority of the resulting minimum-length (compressed) functions were simple: tautologies or functions of a few terms.
 A corollary is that simply increasing syntactic program length is insufficient for increasing program behavioral complexity/variety.
 Indeed, as random program length increases, the percentage with interesting behaviors decreases, due to the structure of the minimum-length function distribution.
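 A small simulation in the same spirit (my own sketch, not his experiment: far fewer samples, only 3 variables, and an ad hoc random-formula generator) makes the skew visible. Sampling random boolean formulas and bucketing them by truth table, a handful of simple behaviors soak up far more than their uniform share:

```python
import random
from collections import Counter
from itertools import product

def random_formula(n_vars, n_terms, rng):
    """Random AND/OR/NOT formula tree with n_terms leaves."""
    if n_terms == 1:
        return ('var', rng.randrange(n_vars))
    k = rng.randrange(1, n_terms)
    node = (rng.choice(['and', 'or']),
            random_formula(n_vars, k, rng),
            random_formula(n_vars, n_terms - k, rng))
    return ('not', node) if rng.random() < 0.5 else node

def evaluate(f, x):
    op = f[0]
    if op == 'var': return x[f[1]]
    if op == 'not': return not evaluate(f[1], x)
    a, b = evaluate(f[1], x), evaluate(f[2], x)
    return (a and b) if op == 'and' else (a or b)

def behavior(f, n_vars):
    """Truth table as a tuple -- the formula's semantics."""
    return tuple(evaluate(f, x)
                 for x in product([False, True], repeat=n_vars))

rng = random.Random(0)
counts = Counter(behavior(random_formula(3, 20, rng), 3)
                 for _ in range(10000))
# 10000 random 20-leaf formulas land on at most 2^(2^3) = 256
# behaviors, and the counts are far from uniform: the most
# common behaviors (constants, single variables) dominate.
assert len(counts) <= 256
```

 With 256 possible behaviors and 10000 samples, a uniform map would give ~39 hits each; the top behaviors here get far more, which is the thesis's point in miniature.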
 He also tests random perturbations to large boolean formulae (variable replacement/removal, operator swapping); ~90% of these do nothing.
 These randomly perturbed programs show a structure similar to the above: most have behavior very similar to their neighbors; only a few have unique behaviors. Makes sense.
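 A deterministic micro-example of such neutral perturbations (my construction, not from the thesis): in a formula with a redundant subterm, single edits to that subterm leave the truth table untouched.

```python
from itertools import product

def evaluate(f, x):
    """Evaluate a formula tuple over input assignment x."""
    op = f[0]
    if op == 'var': return x[f[1]]
    if op == 'not': return not evaluate(f[1], x)
    a, b = evaluate(f[1], x), evaluate(f[2], x)
    return (a and b) if op == 'and' else (a or b)

def table(f, n=2):
    return tuple(evaluate(f, x) for x in product([False, True], repeat=n))

# (x0 OR x1) OR (x0 AND x1): the right conjunct is redundant,
# so single edits to it are semantically neutral.
orig = ('or', ('or', ('var', 0), ('var', 1)),
              ('and', ('var', 0), ('var', 1)))
edit1 = ('or', ('or', ('var', 0), ('var', 1)),
               ('or', ('var', 0), ('var', 1)))    # and -> or
edit2 = ('or', ('or', ('var', 0), ('var', 1)),
               ('and', ('var', 0), ('var', 0)))   # x1 -> x0

assert table(edit1) == table(orig)   # behavior unchanged
assert table(edit2) == table(orig)   # behavior unchanged
```

 Large random formulas are full of such redundancy, which is why so many random edits do nothing.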
 Run the other way: the "syntactic space of large programs is nearly uniform with respect to semantic distance." Semantically similar (boolean) programs are not grouped together syntactically.
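 The converse point can also be illustrated concretely (again my own toy example): two syntactically distant formulas can compute the same function, while a one-symbol edit can jump to a semantically distant one.

```python
from itertools import product

def evaluate(f, x):
    op = f[0]
    if op == 'var': return x[f[1]]
    if op == 'not': return not evaluate(f[1], x)
    a, b = evaluate(f[1], x), evaluate(f[2], x)
    return (a and b) if op == 'and' else (a or b)

def table(f, n=2):
    """Truth table = the program's behavior/semantics."""
    return tuple(evaluate(f, x) for x in product([False, True], repeat=n))

# Two syntactically distant renderings of XOR...
xor_a = ('or', ('and', ('var', 0), ('not', ('var', 1))),
               ('and', ('not', ('var', 0)), ('var', 1)))
xor_b = ('not', ('or', ('and', ('var', 0), ('var', 1)),
                       ('and', ('not', ('var', 0)),
                               ('not', ('var', 1)))))
assert table(xor_a) == table(xor_b)   # identical semantics

# ...while a one-symbol edit to xor_a (top-level or -> and)
# collapses it to the constant-false function.
edited = ('and', ('and', ('var', 0), ('not', ('var', 1))),
                 ('and', ('not', ('var', 0)), ('var', 1)))
assert table(edited) == (False, False, False, False)
```

 So syntactic neighborhoods neither preserve semantics nor cluster semantically similar programs, which is exactly why direct search on program text fares poorly.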
 The results somehow seem a letdown: the program does not scale to even moderately large problem spaces. No loops, only functions with conditional evaluation; Jacques Pitrat's results are far more impressive. {815}
 It seems that, still, there were a lot of meta-knobs to tweak in each implementation. Perhaps this is always the case?
 My thought: perhaps you can run the optimization not on program representations, but rather on program code-paths. He claims that one problem is that behavior is loosely or at worst chaotically related to program structure (which is true), hence optimization on the program itself is very difficult. This is why Moshe runs optimization on the 'knobs' of a representational structure.
