October 8th, 2003

  • evan

more on currying and safety

Now that I'm officially studying programming languages I'm beginning to get a better grasp of how thorough the ML approach is, and I'm really blown away by it. They basically took an idea (type safety and inference everywhere) and designed the language around it, even making some pretty significant sacrifices to achieve that goal.

You'd have to learn the language to fully appreciate it, but the basic idea is this: there are countless errors that in many languages only crop up at runtime but really ought to be detected at compile time¹. Cyclone adds some of these detection features to C, but it does so at the cost of more verbose syntax (though they really kept it down to a minimum); ML manages to be actually less verbose than C and simultaneously safer.

The way they seem to approach safety is by extending the type system thoroughly and pushing as much of the program's meaning as possible into it. With that in mind, I thought of another aspect of currying today that may make it clearer for you. If anyone else could contribute to this I'd appreciate it.

First: currying is a mathematical abstraction, and isn't necessarily useful except for its clarity and its reduction of a complicated concept to a simpler one. It's easy to see (and to write code) how to convert between a curried style and a traditional tuple-passing style. But with curried functions there's more conceptual elegance. No longer do we have functions of multiple arguments: if f is a function, then f refers to the function and f x applies f to x, with the whole expression evaluating to the result of f applied to x, and that's all of the syntax/semantics there is.
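To make that concrete, here's a sketch in OCaml syntax (the names add_pair, curry, and uncurry are mine, not library functions):

  (* tuple-passing style: one argument that happens to be a pair *)
  let add_pair (x, y) = x + y

  (* curried style: a function that returns a function *)
  let add x y = x + y

  (* converting between the two styles is mechanical *)
  let curry f x y = f (x, y)
  let uncurry f (x, y) = f x y

  let () =
    print_int (add_pair (1, 2));       (* 3 *)
    print_int (add 1 2);               (* 3 *)
    print_int ((curry add_pair) 1 2)   (* 3 *)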

When you write a compiler, there's a pile of verification steps you do at different levels. First you lex the text into keywords and variables, then you parse those into a syntax tree; errors found there are pretty obviously invalid programs. There's a step to verify types: for example, 1 + "2" isn't legal in most languages. Then there are extra steps for processing all the other information in the program that lies outside the syntax. For example, in C, when you call f(a, b, c) you need to verify that f takes three arguments and that a, b, and c are each of the correct type; that is why you need function prototypes that describe how functions behave, and an extra step in your compiler to take each subexpression and verify it matches the prototype.

Now consider ML, where everything is about types: the type verification of f x requires that f be a function, because that is the syntax for function application. (If x is an integer, an expression like x 3 produces "This expression is not a function, it cannot be applied".) For a more complicated expression, like a function add (used like add 1 2, as you'd expect), the subexpressions have these types:
  code        type
  add         int -> (int -> int)   (a function mapping integers to functions that map integers to integers)
  add 1       int -> int            (this is the parenthesized expression above, and is another function)
  (add 1) 2   int                   (two chained function applications; this is the result of plugging an int into the expression above)

(Because functions always take only one argument, application associates to the left and the parentheses are unnecessary: add 1 2 is equivalent to (add 1) 2.)
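You can watch this happen in the OCaml toplevel (a sketch of a session; note that int -> int -> int is the same type as int -> (int -> int), because -> associates to the right, and the exact error wording varies between compiler versions):

  # let add x y = x + y;;
  val add : int -> int -> int = <fun>
  # add 1;;
  - : int -> int = <fun>
  # (add 1) 2;;
  - : int = 3
  # let x = 5;;
  val x : int = 5
  # x 3;;
  This expression is not a function, it cannot be applied.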

So what? Note that using the result of add 1 2 in code like print_int (add 1 2) means that the type checker validates not only that the parameters to add are of the correct type, but also that you supplied the correct number of arguments to add. For example, something like print_int (add 1) tries to apply print_int (which expects an integer) to the result of add 1, which is a function from ints to ints (see the second row of the table above). Similarly with add 1 2 3: there's an error on the 3, because the result of add 1 2 is not a function that can be applied to 3. So there's no iteration in the compiler over a function's arguments and no counting of arguments... that verification just falls out implicitly from the way things are defined.
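Here are both failure modes spelled out (OCaml syntax; the offending lines are commented out so the sketch compiles, with the gist of the compiler's complaint alongside):

  let add x y = x + y

  let () = print_int (add 1 2)     (* fine: add 1 2 : int *)

  (* Too few arguments: add 1 : int -> int, but print_int wants an int,
     so this is rejected as a plain type mismatch. *)
  (* let () = print_int (add 1) *)

  (* Too many arguments: add 1 2 : int, and an int is not a function,
     so it cannot be applied to 3. *)
  (* let _ = add 1 2 3 *)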



1 GTK code is littered with functions like
void foo_set_bar(Foo *foo, ...) {
  if (foo == NULL)
    some_fatal_error();
  ...
}

where they have some preprocessor macros that make those NULL tests disappear in a non-debug build.
I guess this is an OK solution, but it's also ugly. Ideally, you'd make it impossible to even express the idea of passing a NULL to such a function (the function is simply undefined for foo = NULL), rather than waiting for it to happen at runtime and then throwing an exception.
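That's what the ML approach buys you. In this OCaml sketch (the foo type and set_bar are hypothetical stand-ins for the GTK example), there is no NULL at all: a value of type foo is always a valid foo, and if "no foo" is a real possibility you say so in the type with option, which makes callers spell out both cases:

  (* No NULL exists: set_bar needs no runtime check at all. *)
  type foo = { mutable bar : int }

  let set_bar foo v = foo.bar <- v

  (* When absence is genuinely possible, it goes in the type,
     and a match that forgets the None case draws a compiler warning. *)
  let maybe_set_bar f v =
    match f with
    | Some foo -> set_bar foo v
    | None -> ()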
  • evan

the dark side of anonymity

When we were discussing trying to make weblog systems interoperate, there was talk about trackback and comments working across sites. Brad, of course, refused to allow trackbacks/comments that weren't authenticated. Everyone else said "oh, it's not that much of a problem... people generally sign their trackbacks/comments properly".

Well, of course they do. They did so in email, too, and now I get hundreds of spams a day. Doing it on the web is even more insidious, as recent wiki spamming indicates. This guy (calling himself a "search engine optimizer") inserts links to his site into wiki pages. Why? It inflates his Google PageRank.

It's just a matter of time before someone starts crawling the web looking for open comment/trackback forms and doing the same thing. (Also, this guy reportedly did it by hand; a script would be much more effective.) What do we do then? Me, I'll just laugh at them from the safety of LiveJournal. On the Atom list, they're talking about how to combat this manually(!?), with notifications that alert site owners to the spam so they can revert the pages.

(Though my email situation still sucks. And if LJ ever drops invite codes, the problem returns...)