April 27th, 2006

  • evan

my meme-propagation technique is unstoppable

A few weeks ago, a guy I know from work* pointed me at a rather interesting but rather nerdy paper from four years ago. I dutifully relayed it to LJ. Then I think Gaal picked it up and mentioned it to Audrey, 'cause it's been mentioned on the Pugs blog a few times, and just now I saw it show up on CleverCS.

I used to discover all the neat new things, like that Packrat paper, from a site called "sweetcode.org". If you were lucky enough to see it before it was snapped up by the domain squatters, you too can share a moment of silence with me in its memory, for it was truly a great site.
It turns out that the guy behind sweetcode is a coworker as well, and though he was in New York when I discovered this fact, it just so happened that he ended up sitting directly on the other side of the glass wall of the office I used to hang out in. I had always intended to go beg him to bring the site back but I kept forgetting about it. And now I'm far away and email just doesn't seem proper.

...oh! It just occurred to me that you can look through the Internet Archive's copy of it! Now that I can look back, I can see things that took me years to rediscover later: on just one randomly chosen page of it from 2003 they mention Zero Install (rediscovered that for fg recently), darcs (I've been using it recently), Judy (the hash that Audrey's been raving about), and Sequitur (an algorithm for finding hierarchies in strings, which I learned about via erinearl's research). And those are just the ones I rediscovered: the page lists five other equally interesting things to consider.

* I feel a bit weird using "friend" 'cause we've only hung out once or twice, and I'm sorta in too much awe of him to really be able to talk as people do: most people at work are pretty sharp but I've met a few whose minds must be a strict superset of mine, and I think he's one of them.


The original reason I thought to bring up that Packrat paper again was to mention the other parsing paper from the same site. It's at least worth skimming the introduction for this snippet:
Chomsky's generative system of grammars, from which the ubiquitous context-free grammars (CFGs) and regular expressions (REs) arise, was originally designed as a formal tool for modelling and analyzing natural (human) languages. Due to their elegance and expressive power, computer scientists adopted generative grammars for describing machine-oriented languages as well. The ability of a CFG to express ambiguous syntax is an important and powerful tool for natural languages. Unfortunately, this power gets in the way when we use CFGs for machine-oriented languages that are intended to be precise and unambiguous.
The relative power of different languages (in the Chomsky hierarchy sense) is another one of those things I really want to learn more about.
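The quote's complaint about ambiguity is easy to make concrete. Here's a toy sketch in Python (the grammar is mine, purely for illustration): under the ambiguous CFG E → E '-' E | digit, the string "1-2-3" has two parse trees, and the two trees mean different things. A PEG's ordered choice would commit to exactly one of them.

```python
def cfg_parses(s):
    """Yield the value of every parse of s under the ambiguous CFG
    E -> E '-' E | digit.  Multiple yields = ambiguity."""
    if len(s) == 1 and s.isdigit():
        yield int(s)
        return
    # Try every '-' as the top-level operator -- each split that parses
    # on both sides is a distinct parse tree.
    for i, c in enumerate(s):
        if c == '-':
            for left in cfg_parses(s[:i]):
                for right in cfg_parses(s[i + 1:]):
                    yield left - right

print(list(cfg_parses("1-2-3")))  # [2, -4]: 1-(2-3) vs (1-2)-3
```

For natural language that multiplicity is a feature; for a programming language, a compiler silently picking one of those two trees is exactly the problem the quote is describing.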

I've been carrying around two textbooks from college with the intent of rereading them: one is the linear algebra one I've recently resumed, and the other is on the theory of computation, which is all about this stuff. Conveniently, it's the only citation in the Wikipedia article on context-free grammars.

I've had this on my mind recently, especially now that I'm surrounded by Japanese. Human languages may all be derived from the same proto-language, but they've had plenty of time to diverge, in the sense that I don't think developments in English really affected the structure of Japanese, and both are pretty different from the Native American languages. And so while human languages definitely have similar core ideas (nouns, verbs, time), there is surprising variation in parameters I wouldn't naively expect to vary -- for example, it's arguable that Chinese doesn't have adjectives, and I find even the concept of "word" to be pretty fuzzy around the edges.

Computer languages, by contrast, haven't had the opportunity to develop in isolation from each other. (Additionally, the quote above, which is what recently set me off on this tack, points out that our theory of expressing computer languages is itself derived from a theory that was trying to model human languages.) There have been a few genuinely different approaches: lisp, surely, and maybe smalltalk, but generally all the people working on these things read each other's papers and write code in each other's languages. I've often wondered how much that lack of biodiversity is holding us back. (You can find a more ranty exposition of the same idea in this post from Steve.)

This is why I'm so interested in Haskell: it's not that I necessarily think their approach is more useful (certainly you'll still see plenty of ruby code here), but I love that they are trying something really different, y'know, shaking up some of the fundamental parameters and finding out what effects that has. It's even better when things like Packrat parsing or free asynchronous exceptions or interesting concurrency support* fall out of those decisions.
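The Packrat trick itself fits in a few lines: it's plain recursive descent over a PEG, with every (rule, position) result memoized, so backtracking never redoes work and parsing runs in linear time. A minimal sketch in Python (the grammar here is made up just for illustration; the paper's implementations are in Haskell):

```python
from functools import lru_cache

def peg_match(s):
    """Recognize s against the PEG
         expr <- term '+' expr / term
         term <- digit+
       packrat-style: each rule is memoized per input position."""

    @lru_cache(maxsize=None)          # the memo table IS the packrat idea
    def expr(i):
        # ordered choice: try  term '+' expr  first, fall back to  term
        t = term(i)
        if t is not None and t < len(s) and s[t] == '+':
            rest = expr(t + 1)
            if rest is not None:
                return rest
        return t                      # end position on success, else None

    @lru_cache(maxsize=None)
    def term(i):
        j = i
        while j < len(s) and s[j].isdigit():
            j += 1
        return j if j > i else None   # one or more digits

    return expr(0) == len(s)          # must consume the whole input

print(peg_match("1+23+4"))  # True
print(peg_match("1+"))      # False
```

Delete the two `lru_cache` lines and it's just an ordinary backtracking recursive-descent parser, possibly exponential; the memoization is the entire contribution, which is why it falls out so naturally in a lazy language like Haskell.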

I feel the same way about the Perl6 community, even though I don't see myself ever using the language: they're willing to do some crazy stuff and potentially (probably? :P) fail, but at least they'll try to break new ground.

* New readers of this LJ will find some great discussions on SMP linked from that post from our resident grungy CPU expert [aka not me]. Also a followup here.