July 23rd, 2007

  • evan

icfp conclusion

ICFP this year: still good, but a bit of a bummer.

You were given "DNA" (a 7mb string of the letters I,C,F,P) and instructions on how to parse and "run" the DNA, producing "RNA". The "RNA" was then another language of drawing commands, which (when ran) produce an image. The image was of an alien who crash-landed on earth; the objective was to produce a "patch" of DNA that, when prefixed onto the original DNA, would produce an image as close as possible to a given altered target image of the alien, more "adapted" to his surroundings.

Like last year, the majority of the teams couldn't get something that could actually even produce the original image. I think we did fine on this (minus my awful bug mentioned in the previous post) -- I'd say minus that bug I spent ~4 hours on it, including a bunch of performance work -- but the "real" contest was all about reverse-engineering these languages.

The DNA language was made up of rewrite rules: each step was a "pattern" that would skip around and match part of the DNA, then a "template" which would perform some modification of the DNA.
Tobin (nibot / nibot_lab is good reading) had an awesome find of a textual message hidden in the letters of the DNA (that is, if you viewed it in a text editor at the right width), which led him to extracting another region, which was a binary-encoded PNG, which led to another region which was a binary-encoded MP3 that was of a person reading a patch. We were in! ...but not; we never figured out what the resulting image was for. I finally noticed that the very beginning bit of the "pure" DNA just wrote RNA to draw some letters, and upon running those letters as a patch it produced a page of a sort of manual as well as a few other patches.

One of these patches (this was about mid-Saturday for us, and recall that I started late on Friday) was all we (and everyone but the top 30 teams of the hundreds that participated) was the best we got in the entire contest.

Using these patches, we thought we managed quite a lot: by reverse-engineering the "manual page" patch we found there was a number encoded there, and by modifying that number we found other manual pages, which described all sorts of crazy things. Of special interest was details about the language, such as how execution worked (in order, memory was three zones: red green blue, which corresponded to currently-running code, data segment, and stack) as well as a list of symbols in the "green" (static code/data) zone.

But there were also pages on Lindenmayer systems, steganography (Tobin figured out this one and found a secret number in another message), compression, adaptation trees(?), alien text encoding, alien hieroglyphics, even an "encrypted message" that Ryan attacked (was just a substitution cypher, I think).

But all for nothing. Even as we cracked these individual messages, the larger problem was how to use this information -- for example, it looks like L-systems were for drawing a plant, but how could we patch their L-system generator? And aside from minor patches (like trying to swap out constants found at the symbol table) we never successfully made a working "software" patch.

Many of the DNA rules would look like (as a Perl-style regexp; this is of course *not* what we used): s/^(.*.{offset1}.{offset2}(.{constant}))/$1$2/ -- that is, snip out a region from the middle, paste it onto the front as well as the all the code you traversed again. This is pretty obviously a sort of "function call", but beyond that you couldn't see much; since the string was continually rewriting itself (and the offsets were often relative) it was hard to figure out what offset of code was actually being run. (Often times code ended up duplicated into various regions.) Since the parse of the pattern affected where the template started, you couldn't tell whether a given region was pattern or template language.

I even wrote a (pretty spiffy, to me at least) "compiler" to produce DNA code from a higher-level language. One of their docs was about how to write a DNA "gene" (function) call, which I eventually wrapped up into a single line of my source language. But even patches to run innocuous-looking functions from their symbol table, like "addInt", would appear to run fine (we could see the result of the addition on the stack) and then execution would veer off catastrophically.

So in summary, we scored the same as pretty much everyone else, despite (to me) getting pretty far beyond that point. I expect the real jintent of the contest was that people would spend most of their time hacking these subproblems, like compression and L-systems, but we (and I imagine most people) never got that far. I'm not sure if it was a problem with the docs -- or maybe the curve was too steep? -- but we ended up pretty disappointed. Our final score, along with the majority of contestants, was the result of just running the RNA found at the very beginning of the input DNA, applying it as a patch, and then submitting one of the resulting outputs.