ref: work-0 tags: distilling free-form natural laws from experimental data Schmidt Cornell automatic programming genetic algorithms date: 12-30-2021 05:11 gmt revision:7

Distilling free-form natural laws from experimental data

  • The critical step was to use the full set of all pairs of partial derivatives ( ∂x/∂y ) to evaluate the search for invariants.
  • The selection of which partial derivatives are held to be independent / which variables are dependent is a bit of a trick too -- see the supplemental information.
    • Even so, with a 4D data set the search for natural laws took ~ 30 hours.
  • This was via a genetic algorithm, distributed among 'islands' on different CPUs, with mutation and single-point crossover.
  • Not sure what the IL is, but it appears to be floating-point assembly.
  • Timeseries data is smoothed with Loess smoothing, which fits local polynomials to the data, and hence allows for smoother / more analytic derivative calculation.
    • Then again, how long did it take humans to figure out these invariants? (Went about it in a decidedly different way..)
    • Further, how long did it take for biology to discover similar 'design equations'?
      • The same algorithm has been applied to biological data - a metabolic pathway - with some success, published 2011.
      • Of course evolution had to explore a much larger space - proteins and regulatory pathways, not simpler mathematical expressions / linkages.
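The pairwise-derivative scoring trick can be sketched in a few lines. Everything below (the trajectory, the candidate expressions, the error measure) is my own toy stand-in for the paper's setup: a candidate conserved quantity f(x, y) implies dx/dy = -(∂f/∂y)/(∂f/∂x), and is scored by how well that prediction matches dx/dy estimated numerically from the data.

```python
import numpy as np

t = np.linspace(0.1, 2 * np.pi, 400)
x, y = np.cos(t), np.sin(t)              # toy trajectory: the unit circle

# derivative pair estimated numerically from the data
dx_dt = np.gradient(x, t)
dy_dt = np.gradient(y, t)
data_ratio = dx_dt / dy_dt               # observed dx/dy along the trajectory

mask = (np.abs(dy_dt) > 0.2) & (np.abs(x) > 0.2)   # avoid near-singular points

# candidate invariant f = x^2 + y^2  ->  predicted dx/dy = -y/x
good = np.mean(np.abs(data_ratio[mask] - (-y / x)[mask]))
# candidate invariant f = x + y      ->  predicted dx/dy = -1
bad = np.mean(np.abs(data_ratio[mask] - (-1.0)))

print(good, bad)   # the true invariant scores far lower than the wrong one
```

The GA in the paper searches over candidate expressions; here the two candidates are hard-coded just to show that the criterion discriminates.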

Since his PhD, Michael Schmidt has gone on to found Nutonian, which produced the Eureqa software, apparently without dramatic new features other than being able to use the cloud for equation search. (Probably he improved many other detailed facets of the software..). Nutonian received $4M in seed funding, according to Crunchbase.

In 2017, Nutonian was acquired by DataRobot (for an undisclosed amount), where Michael has worked since, rising to the title of CTO.

Always interesting to follow up on the authors of these classic papers!

ref: -0 tags: automatic programming inductive functional igor date: 07-29-2014 02:07 gmt revision:0

Inductive Rule Learning on the Knowledge Level.

  • 2011.
  • v2 of their IGOR inductive-synthesis program.
  • Quote: The general idea of learning domain specific problem solving strategies is that first some small sample problems are solved by means of some planning or problem solving algorithm and that then a set of generalized rules are learned from this sample experience. This set of rules represents the competence to solve arbitrary problems in this domain.
  • My take is that, rather than using heuristic search to discover programs by testing specifications, they use memories of the output to select programs directly (?)
    • This is allegedly a compromise between the generate-and-test and analytic strategies.
  • The description is couched in CS lingo with which I am inexperienced, and is perhaps too high-level - a sin I too am at times guilty of.
  • It seems like a good idea, though the examples are rather unimpressive as compared to MagicHaskeller.

ref: -0 tags: automatic programming date: 05-18-2014 07:02 gmt revision:1


  • http://www.lighttable.com/2014/05/16/pain-we-forgot/ -- approaching the problem by making better tools / interfaces.
  • http://pchiusano.blogspot.com/2013/05/the-future-of-software-end-of-apps-and.html -- approaching the problem by making the web one giant functional & typed language, distributed over all clients.
    • Strongly advocates the utility and need for typed languages to provide contexts for actions.
    • Interesting, but altogether dreamy and unlikely.
  • Bloom language
    • "the standard data structures in Bloom are disorderly collections, rather than scalar variables and structures. These data structures reflect the realities of non-deterministic ordering inherent in distributed systems. Bloom provides simple, familiar syntax for manipulating these structures. In the Bud prototype, much of this syntax comes straight from Ruby, with a taste of MapReduce and SQL." perfect.
    • From Berkeley.
    • Based on the Dedalus data language, which specifies temporal ordering?
      • "The basic idea is that Time (meaning both the sequentiality of program steps in a single “thread”, and coordination of steps across threads/machines) is needed for only one purpose: to prevent multiple possible states from co-occurring. I.e. the purpose of time is to seal fate at each instantaneous moment." src
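The "disorderly collection" idea can be illustrated with a minimal sketch (my own Python, not Bloom/Bud syntax): if state grows only by set union, which is commutative, associative, and idempotent, then every delivery order of messages yields the same final state, so no temporal coordination is needed.

```python
from itertools import permutations

def merge_all(messages):
    state = set()
    for m in messages:
        state |= m          # monotone: the set only ever grows
    return state

msgs = [{'a'}, {'b'}, {'a', 'c'}]
finals = {frozenset(merge_all(p)) for p in permutations(msgs)}
print(finals)   # a single final state, independent of message order
```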

ref: -0 tags: Moshe looks automatic programming google tech talk links date: 11-07-2012 07:38 gmt revision:3

List of links from Moshe Looks google tech talk:

ref: -0 tags: automatic programming synthesis loops examples date: 10-03-2011 22:28 gmt revision:1

Open letter proposing some ideas on how to automate programming: simulate a human! Rather from a neuro background, and rather sketchy (as in vague, not as in the present slang usage).


ref: -0 tags: automatic programming Greg date: 01-11-2011 07:00 gmt revision:0

The goal is to make a system that is capable of automatic debugging / design. It seems to me that a good bit of the work of designing and debugging such a program - which I imagine to be basically a set of heuristics triggered by certain situations - is mechanical, and hence could be automated.

First, debugging: Say your program doesn't compile. You look at the error message + location, look at the code, and change variables either there or in causally-connected code to fix the error. The knowledge of what to change basically involves a forward model of how the computer runs the program and experiential knowledge of what worked in the past. The former need not be encoded - the computer can simply run slightly different versions of the program and see what happens itself (aka 'taking the partial derivatives of a program'). The latter can be stored in a well-designed database.
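A toy sketch of this "partial derivatives of a program" idea: run slightly perturbed versions and observe how the output moves. The program text and the bump-a-constant mutation scheme below are my own illustration, not an existing tool.

```python
import re

src = "result = 3 * x + 7"

def run(program, x):
    env = {"x": x}
    exec(program, env)
    return env["result"]

base = run(src, x=2)
deltas = []
for m in re.finditer(r"\d+", src):
    c = int(m.group())
    mutated = src[:m.start()] + str(c + 1) + src[m.end():]
    deltas.append(run(mutated, x=2) - base)   # sensitivity to this constant
print(deltas)   # [2, 1]: at x = 2 the multiplier matters twice the offset
```

A debugging system could use these sensitivities to decide which constant (or, more generally, which causally-connected piece of code) to change to move the output toward the desired value.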

Then your program crashes. Usually, you look at the code and run it on a model of the computer in your mind, checking at every point whether the given data-path and code-path diverge from acceptable behavior. Maybe you instrument the code with printfs. Then, you look at the causally-connected code to see what can be changed to alleviate the error condition, or to make the code/datapath align with well-known programming patterns. Again, the computer does not need a model of itself - the computer can simply run the code as an interpreted language; everything is then instrumented. The well-known patterns can be entered by a human, or learned via experimentation (analogous to 'school' and 'hacking' for a person).
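The "everything is instrumented" interpreter is easy to sketch in Python: `sys.settrace` can log every executed line plus the local variables, an automated stand-in for sprinkling printfs. The buggy function here is a made-up example.

```python
import sys

trace_log = []

def tracer(frame, event, arg):
    if event == "line":
        trace_log.append((frame.f_lineno, dict(frame.f_locals)))
    return tracer

def buggy(xs):
    total = 0
    for v in xs:
        total += v
    return total / len(xs)      # crashes when xs is empty

sys.settrace(tracer)
try:
    buggy([])
except ZeroDivisionError:
    pass
finally:
    sys.settrace(None)

# the last (line, locals) snapshot is the state just before the crash
print(trace_log[-1])
```

From such a log, the causally-connected variables (here, the empty `xs` feeding `len(xs)`) can be read off mechanically rather than imagined.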

Now your program runs and compiles. But, it doesn't do what you want it to do - you see in one particular case there is an error. So, you look at the program, run it in your mind, and look at all the variables and codepaths that are causally related to producing that case, imagining what changes to each will do to the error. Possibly this involves changing the AST, adding an if-statement, etc. You poke around; eventually something works, if partially. The computer can do the same thing, via the well-instrumented interpreted language.

Eventually this produces spaghetti code (unless you're really smart, and predicted all this case-programming), and small tweaks break more things than they fix. Time to refactor - apply well-known invariant code transformations, or more generally look for simpler codepaths with the same effective datapaths. Perhaps you avoided it by using good initial programming patterns, found mostly through analogy with the real world (object-orientation, MVC etc). Again this is where a good memory of programming-patterns will serve well.

That's the gist. There is much more - namely, the importance of finding transforms that render problems and sub-problems linear, and the practical consideration that the end goal in much computer programming is itself found through iteration - but this email is long enough. What I've outlined probably doesn't look close to GA / EA, but in practice will most likely involve intense and random experimentation within each of the steps above, something that GA does a lot of. What should make it faster is that individual random experimentation is more constrained - smaller space to explore - and when possible memory can replace expending time and power on experimentation. To look at it another way, the project should move a feedback loop (write compile observe) from the human to the computer; this is OK as all the needed info is within the computer already; no outside interaction is needed (unlike, say physical evolution).

ref: ai-0 tags: automatic programming journal notes date: 12-31-2010 05:24 gmt revision:4

This evening, on the drive back from wacky (and difficult) Russian-style yoga, I got a chance to explain to my brother what I really want to be working on, the thing that really tickles my fancy. My brother and I, so much as genetic commonality and common upbringing seem to affect, have very similar styles of thinking, which made explaining things a bit easier. For you, dear reader, I'll expand a bit.

I'd like to write a program that writes other programs, iteratively, given some objective function / problem statement / environment in which to interact. The present concrete goal is to have said program make a program that is able to lay out PCBs with quality similar to that of humans. The overarching framework that I'm planning on using is genetic/evolutionary algorithms (the latter does not have crossover, fyi), but no one has applied GA to the problem in this way: most people use GA to solve a particular instance of a problem. Rubbish, I say, this is energy-wasteful!

Rubbish, you may return: the stated problem requires a degree of generalization and disconnect from the 'real world' (the PCB) that makes GAs extremely unlikely to come up with any solutions. Expressed another way: the space to be explored is too large (program begets program begets solution). This is a very sensible critique; there is no way in hell a GA can solve this problem. They are notably pathetic at exploring space in an energy-efficient way (to conclude a paragraph again with energy... ).

There are known solutions for this: memory -- cache the results, in terms of algorithm & behavior, of all 'hypotheses' or individuals tried out by a GA. This is what humans do -- they remember the results of their experiment, and substitute the result rather than running a test again. But humans do something far more sophisticated and interesting than just memory - they engineer systems; engineering is an iterative process that often goes down wrong design paths, yet it nonetheless delivers awesome things like Saabs and such.

As I described to K--, engineering is not magic and can be (has been?) described mechanistically. First of all, most engineering artifacts start off from established, well-characterized components, aggregated through the panoply of history. Some of these components describe how other components are put together, things that are either learned in school or by taking things apart. Every engineer, ala Newton, stands on the vast shoulders of the designers before; hence any program must also have these shoulders available. The components are assembled into a system in a seemingly ad-hoc and iterative procedure: sometimes you don't know what you want, so you play with the parts sorta randomly, and see what interesting stuff comes out. Other times you know damn well what you / your boss / the evil warlord who holds you captive wants. Both modes are interesting (and the dichotomy is artificial), but the latter is more computer-like, hence to be modeled.

Often the full details of the objective function or desired goal are very unclear in the hands of the boss / evil warlord (1), despite how reluctant they may be to admit this. Such an effect is well documented in Fred Brooks' book, __The Design of Design__. Likewise, how to get to a solution is unclear in the mind of an engineer, so he/she shuffles things around in the mind (2),

  1. looking for components that deliver particular desired features (e.g. in an electronic system, gain makes me think of an op-amp)
  2. looking for components that remove undesirable features (e.g. a recent noise problem on my wireless headstage made me think of a adaptive decorrelating filter I made once)
  3. looking for transforms that make the problem solvable in a linear space, something that Moshe Looks calls knob-twiddling.
    1. this is from both sides -- transforms that convert the problem or the nascent solution.
    2. An example would be the FFT. This makes it easy to see spectral features.
    3. Another example, used even more recently, is coordinate transforms - they make things like line-line intersection much easier.
    4. When this doesn't work, you can do far more powerful automatic coordinate transforms - math, calculus. This is ultimately what I needed when figuring out the shortest line segment between a line segment and an ellipse. Don't ask.
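The FFT example from the list above is quick to demonstrate: a tone that is invisible in the noisy time-domain samples becomes a single sharp spectral peak after the transform. The signal parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                                   # sample rate, Hz
t = np.arange(fs) / fs                      # 1 second of data
# a 50 Hz tone buried under noise with twice its amplitude
signal = np.sin(2 * np.pi * 50 * t) + rng.normal(0, 2, t.size)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
peak_freq = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(peak_freq)   # 50.0 - the hidden tone pops out as the largest peak
```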

This search is applied iteratively, apparently a good bit of the time subconsciously. A component exists in our mind as a predictive model of how the thing behaves, so we simulate it on input, observe output, and check to see if anything there is correlated / decorrelated with target features. (One would imagine that our general-purpose modeling ability grew from needing to model and predict the world and all the yummy food / dangerous animals / warlords in it). The bigger the number of internal models in the engineer's mind, and the bigger the engineer's passion for the project, the more components can be simulated and selected for. Eventually progress is made, and a new subproblem is attacked in the same way, with a shorter path and different input/output to model/regress against.

This is very non-magical, which may appall the more intuitive designers among us. It is also a real issue, because it doesn't explain (or explains only poorly) really interesting engineering: e.g. the creation of the Fourier transform, the creation of the expectation-maximization algorithm, all the statistical and mathematical hardware that lends beauty and power to our design lives. When humans create these things, they are at the height of their creative ability, and thus it's probably a bit ridiculous to propose having a computer program do the same. That does not prevent me from poking at the mystery here, though: perhaps it is something akin to random component assembly (and these must be well-known components (highly accurate, fast internal models); almost all innovations were done by people exceptionally familiar with their territory), with verification against similarly intimately known data (hence, all things in memory - fast 'iteration cycles'). This is not dissimilar to evolutionary approaches to deriving laws. A Cornell physicist / computer scientist was able to generate natural laws via a calculus-infused GA {842}, and other programs were able to derive Copernicus' laws from planetary data. Most interesting scientific formulae are short, which makes them accessible to GAs (and also aesthetically pleasurable, and/or memelike, but hey!). In contrast, engineering has many important design patterns that are borrowed by analogy from real-world phenomena, such as the watermark algorithm, sorting, simulated annealing, the MVC framework, object-oriented programming, the WIMP interface, the verb/noun interface, programming languages, even GAs themselves! Douglas Hofstadter has much more to say about analogies, so I defer to him here.

Regardless, as K-- pointed out, without some model for creativity (even one as soulless as the one above), any proposed program-creating program will never come up with anything really new. To use a real-world analogy: at his work the boss is extremely crazy - namely, he mistook a circuit breaker for an elevator (in a one-story factory!). But this boss also comes up with interminable and enthusiastic ideas, which he throws against the wall of his underlings a few dozen times a day. Usually these ideas are crap, but sometimes they are really good, and they stick. According to K--, the way his mind works is basically opaque and illogical (I've met a few of these myself), yet he performs an essential job in the company - he spontaneously creates new ideas. Without such a boss, he claimed, the creations of a program-creating-program will be impoverished.

And perhaps hence this should be the first step. Tonight I also learned that at the company (a large medical devices firm) they try to start projects at the most difficult step. That way, projects that are unlikely to succeed are killed as soon as possible. The alternate strategy, which I have previously followed, is to start with the easiest things first, so you get some motivation to continue. Hmm...

The quandary to shuffle your internal models over tonight then, dear readers, is this: is creativity actually (or accurately modeled by) random component-combination creation (boss), followed by a selection/rejection (internal auditing, or colleague auditing)? (3)

  • (1) Are there any beneficent warlords?
  • (2) Yet: as I was educated in a good postmodernist tradition, this set of steps ('cultural software') is not the only way to design. I'm just using it since, well, best to start with something that already works.
  • (3) If anyone reads this and wants to comment, just edit this. Perhaps you want to draw a horizontal line and write comments below it? Anyway, writing is active thinking, so thanks for helping me think.

ref: -0 tags: automatic programming Herbert Simon date: 12-29-2010 04:43 gmt revision:1

Whether Software Engineering Needs to Be Artificially Intelligent, by Herbert Simon, author of __The Sciences of the Artificial__

  • He talks about many tasks & goals that, at least as of 2010 (25 years later), seem relatively obvious:
    • There is a boundary between man and machine, and automatic programming can make steps by moving the boundary toward people: e.g. incorporating shortcuts / macros / whatever for the menial, trivial tasks of programming.
      • I would imagine garbage collection and automatic type inference to be two of these menial tasks. Alternately, you can look at the automatic GUI generators like Glade and MFC. But why is progress so slow on this front? We haven't automated that much..
    • Must have a better language for formalizing desired program state, desired results. I agree! Programming languages still conflate the task of describing the data structures with the task of describing the desired end state..
      • He talks about writing desires, objectives in natural language. I don't find that all too likely..
    • Need a database of data representations from which to draw, code for the automatic programmer to copy into new programs.
      • Have thought about this, and storing what program snippets do seems very ill defined! We can remember in our vague memory, but perhaps that's because we choose relatively few building blocks for constructing larger structures (and these structures, in turn, are relatively few...)
  • A good example of moving the boundary between man and machine is the telephone exchange system. Initially, it was operated only by humans; later, it's mostly machines - it's become hard to find a human operator on the other end. The steps to this have been gradual, possibly due to the required effort, possibly due to institutional inertia.
    • In design "the informality gradually gets squeezed out".
  • The author tells us that medical diagnosis, e.g. at the hands of CADUCEUS, is well functional; looking back at the present absence of expert systems in medicine, this statement seems overly optimistic.
    • That said, I do agree heartily that a useful expert system "must be designed in an interactive mode for progressive modification of the sort that I have been describing"
  • Random: While moves in chess can be searched by traversing a tree, backgammon is not susceptible to the same approach. Hans Berliner, apparently in the 1980's, devised a production program that was able to beat the world champion backgammon player. This was at least 10 years before Tesauro!

ref: -0 tags: automatic programming myths prospects date: 12-29-2010 04:42 gmt revision:3

Automatic Programming: Myths and Prospects by Charles Rich and Richard C Waters, 1988

  • "End user oriented automatic programming systems must be domain experts. In particular, it is not possible to describe something briefly unless the hearer knows as much about the domain as the speaker does and can therefore understand the words and fill in what was left out."
  • Authors enumerate three general classes of automatic programming:
    • Bottom up. Eg. writing in very high-level programming languages; as in {853}, tries to push the boundary of human-machine toward the human.
    • Narrow domain. Eg. expert systems in a limited domain.
    • Assistant. Eg. IDEs, documentation: possibly the most popular approach.
  • Back in the day (1958) automatic programming consisted of what we now call assemblers and compilers. hah!
  • "The problem with requirements (and contracts) is that they cannot be complete." (There is implicit, unstated information, as well as straight unspecified information).
    • Applied to practical things which must interact with people - they give the example of an ATM - I think so much is in the head of the human that automatic programming / using some potentially stochastic means of generating code makes no sense. Better to use templates and libraries! I'm more interested in the automatic creation of heuristic systems.
  • Program design is not serial: it involves continuous interaction between the programmer and client, which serves to communicate the client's desire and DSK to the programmer.
    • An automatic programming system must pay close attention to the medium that supports the dialog between human and program. The medium must be, effectively, a powerful tool. (think of the palm..)
  • "Most of the productivity improvements introduced by automatic programming will almost certainly be used to attack enormous programs, rather than huge programs". So, we'll forever be in need of more powerful programming tools!
  • At some point, a high-level programming language description of a problem & its algorithm of solution is as short or shorter than the English prose description of what the program is/was to do :: what other degrees of compression can be extracted?
    • That said, "An interesting research direction is the design of artificial languages that intentionally allow informality" (ala English).
    • These will most likely not be natural languages, though: look at schematics, mathematical formula. These are really DSLs...
  • Regarding example-based automatic programming (despite the fact that humans communicate through examples and analogies extensively - just look at this blog (for example!)): "Unfortunately, except for toy problems, no one has been able to make an example based automatic programming system work, and there is reason to believe this failure is fundamental."
    • Why? No matter how many examples are provided, there is no way to guarantee generalization. Oh, but I should contest this!
  • "The problem of synthesizing a program satisfying a given specification is formally equivalent to finding a constructive proof of the specification's satisfiability. This is the idea behind deductive-logic-based automatic programmers"
  • "Transformational methods in some form are certain to be part of all future automatic programming systems."
  • Programmers think in cliches - and in fact, this is a useful technique for solving many problems by inspection. It should be possible to embed cliches in an expert auto-programming system.