evan_tech

01:09 am, 27 Apr 06

biodiversity

The original reason I thought to bring up that Packrat paper again was to mention the other parsing paper from the same site. It's at least worth skimming the introduction for this snippet:
Chomsky's generative system of grammars, from which the ubiquitous context-free grammars (CFGs) and regular expressions (REs) arise, was originally designed as a formal tool for modelling and analyzing natural (human) languages. Due to their elegance and expressive power, computer scientists adopted generative grammars for describing machine-oriented languages as well. The ability of a CFG to express ambiguous syntax is an important and powerful tool for natural languages. Unfortunately, this power gets in the way when we use CFGs for machine-oriented languages that are intended to be precise and unambiguous.
The relative power of different languages (in the Chomsky hierarchy sense) is another one of those things I really want to learn more about.
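
To make the ambiguity point concrete, here's a toy sketch of my own (not code from the paper): the CFG E -> E '-' E | num gives an input like "1-2-3" two parse trees, while a PEG-style grammar with ordered choice, E <- num '-' E / num, commits to the first alternative that succeeds and so only ever produces one.

    -- A toy example (mine, not the paper's).  The CFG
    --   E -> E '-' E | num
    -- gives "1-2-3" two parse trees; the PEG-style grammar
    --   E <- num '-' E / num
    -- uses ordered choice, so the parse is unambiguous by construction.
    import Data.Char (isDigit)

    type Parser a = String -> Maybe (a, String)

    -- Parse one or more digits as an Int.
    num :: Parser Int
    num s = case span isDigit s of
      ("", _)    -> Nothing
      (ds, rest) -> Just (read ds, rest)

    -- Ordered choice: try the second alternative only if the first fails.
    orElse :: Parser a -> Parser a -> Parser a
    orElse p q s = case p s of
      Nothing -> q s
      ok      -> ok

    -- E <- num '-' E / num
    expr :: Parser Int
    expr = sub `orElse` num
      where
        sub s = case num s of
          Just (n, '-':rest) -> case expr rest of
            Just (e, rest') -> Just (n - e, rest')
            Nothing         -> Nothing
          _                  -> Nothing

    main :: IO ()
    main = print (expr "1-2-3")  -- Just (2,"") -- parsed as 1-(2-3), and only that way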

I've been carrying around two textbooks from college with the intent of rereading them: one is the linear algebra text I've recently resumed, and the other is on the theory of computation, which is all about this stuff. Conveniently, it's actually the only citation in the Wikipedia article on context-free grammars.


I've had this on my mind recently, especially now that I'm surrounded by Japanese. Human languages may all be derived from the same proto-language, but they've had plenty of time to diverge, in the sense that I don't think developments in English really affected the structure of Japanese, and both are pretty different from the Native American languages. And so while human languages definitely share similar core ideas (nouns, verbs, time), there is surprising variation in parameters I wouldn't naively expect to vary -- for example, it's arguable that Chinese doesn't have adjectives, and I find even the concept of "word" to be pretty fuzzy around the edges.

Computer languages, by contrast, haven't had the opportunity to develop in isolation from each other. (Additionally, the quote above, which is what recently set me off on this tack, points out that our theory of expressing computer languages is itself derived from a theory that was trying to model human languages.) There have been a few genuinely different approaches: lisp, surely, and maybe smalltalk, but generally all the people working on these things read each other's papers and write code in each other's languages. I've often wondered how much that lack of biodiversity is holding us back. (You can find a more ranty exposition of the same idea in this post from Steve.)

This is why I'm so interested in Haskell: it's not that I necessarily think their approach is more useful (certainly you'll still see plenty of ruby code here), but I love that they are trying something really different, y'know, shaking up some of the fundamental parameters and finding out what effects that has. It's even better when things like Packrat parsing or free asynchronous exceptions or interesting concurrency support* fall out of those decisions.
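
For a taste of what I mean by things falling out of the design: as I understand the Packrat paper, the whole memo table is just a lazily evaluated data structure, one record of parse results per input position. Here's a stripped-down sketch of that idea (my simplification, not the paper's actual code):

    -- Each position in the input gets a Derivs record; its fields are the
    -- results of each nonterminal at that position, computed on demand and
    -- then shared -- i.e. memoization for free, thanks to laziness.
    data Result v = Parsed v Derivs | NoParse

    data Derivs = Derivs
      { dvDigit :: Result Char  -- "parse a digit here"
      , dvChar  :: Result Char  -- "consume any character here"
      }

    -- Tie the knot: each Derivs lazily refers to the Derivs of the next
    -- position, so nothing is parsed until some caller demands it.
    parse :: String -> Derivs
    parse s = d
      where
        d = Derivs digit anyChar
        anyChar = case s of
          (c:rest) -> Parsed c (parse rest)
          []       -> NoParse
        digit = case anyChar of
          Parsed c d' | c >= '0' && c <= '9' -> Parsed c d'
          _                                  -> NoParse

    main :: IO ()
    main = case dvDigit (parse "42x") of
      Parsed c _ -> putStrLn ("first digit: " ++ [c])
      NoParse    -> putStrLn "no digit"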

I feel the same way about the Perl6 community, even though I don't see myself ever using the language: they're willing to do some crazy stuff and potentially (probably? :P) fail, but at least they'll try to break new ground.

* New readers of this LJ will find some great discussions on SMP linked from that post from our resident grungy CPU expert [aka not me]. Also a followup here.