evan_tech

12:17 am, 13 Nov 03

where syntax actually matters

Actually, that last subject was lying. I have some more pseudo-science between linguistics and computers.

Human language is generally pretty efficient, in that speakers are motivated to say what they're trying to say in the fewest words / smallest amount of time¹. Languages with longer common constructions get them shortened via slang or muttering ("I am going to", English's weird new future tense, has become "eye-m-un-a") and ideally that means they eventually optimize themselves to the "best" balance between efficiency and other concerns (redundancy, compatibility, clarity (non-ambiguousness)).

The example I keep thinking about is complementizers. Consider:
(1) The boy that I like has blue eyes.
(2) The boy that likes me has blue eyes.
The "that" in the first sentence can be dropped out, but not in the second. Really, the "that" in both sentences isn't a meaningful word in the way "boy" is; "that" is there only to indicate a subclause is following. I mentally liken it to, say, a curly brace in a programming language: when you hit the "that", you know you need to parse a specific structure (in English, a relative clause) that follows and that it specifies attributes of the word preceding it.
(Contrast class Foo; and class Foo { ...bar... };.)

We can drop the "that" in the first example because we can figure it out from the doubled noun ("boy I"), etc. A different syntax would allow us to avoid "that" completely: in (my broken) Japanese, the sentences would be (roughly, of course):
(1) i like boy blue eyes has.
(2) me likes boy blue eyes has.
Because phrases are verb-final, a noun that follows a verb indicates it's a relative clause. (Japanese does have complementizers in a different context. I recall some discussion in a class about complementizers versus structure that totally blew my mind, but I'm pretty sure I never wrote it down and I can't remember it.)

Ok, programming languages. Who cares about code efficiency in terms of letters (ie, typing)? Most people don't--though type inference is quite nice once you've used it a bit--with one exception that hit me today: shells.

There was some noise on lambda_ultimate about Microsoft's new shell for their next Windows and the fancy way it passes objects around between processes, and then someone else retorted with another fancy shell, and I realized that with both of these, all I immediately thought about was: how much will I have to type?

Unix shells work because there's a good tradeoff between simplicity and what you can accomplish with it. Really, only being able to communicate via unstructured byte streams kinda sucks (as anyone who has tried to get a list of files in a directory ordered by size has found²), but we've managed to do quite a bit with it (-print0 is a nice hack around some limitations of plain text, for example).
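(For example — a minimal sketch, assuming GNU find and xargs; the directory and filename are just for illustration:)

```shell
# A filename containing a space breaks whitespace-delimited pipelines.
dir=$(mktemp -d)
touch "$dir/two words.txt"

# Naive version: xargs splits on whitespace, so the name arrives in pieces.
# find "$dir" -type f | xargs ls -l        # mangles "two words.txt"

# -print0 terminates each path with NUL; xargs -0 splits only on NUL.
find "$dir" -type f -print0 | xargs -0 ls -l
```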

And really, how much real programming do we do in the shell anyway, where the difference will matter? (I certainly use "for" pretty regularly.) Contextual tab-completion, for example, needs to know what context you're in, which implies syntactic parsing. Maybe we don't go farther only because we don't know what we're missing.
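The "for" loops I mean are the small batch-editing kind — a sketch (the .txt-to-.bak renaming is just an example):

```shell
# Rename every .txt file to .bak: the sort of one-liner "for" makes cheap.
for f in *.txt; do
  mv "$f" "${f%.txt}.bak"   # ${f%.txt} strips the suffix
done
```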


1 A notable exception is formal speech, where we often introduce useless or longer words/phrases. The effect is really noticeable in Japanese, where something like "come" (kuru, two short syllables) inflates up to five syllables with a long vowel and doubled consonant (irasshaimasu, yeah?) when you're talking about a superior. I assume this is because doing things that take more effort or that are uncomfortable is a way of showing respect/inferiority.
2 ls -lS is cheating; try ls -l | awk '{ print $5 " " $9 }' | sort -n instead. Knowing only a few extra characters like | buys me quite a bit, but in an ideal world all the quoting and the awk step wouldn't be necessary.
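(GNU find can skip the awk field surgery entirely — a sketch, assuming GNU find's -printf:)

```shell
# Emit "size name" directly per file, then numeric sort: no column picking.
find . -maxdepth 1 -type f -printf '%s %f\n' | sort -n
```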