evan_tech

12:12 pm, 8 Oct 03

more on currying and safety

Now that I'm officially studying programming languages I'm beginning to get a better grasp of how thorough the ML approach is, and I'm really blown away by it. They basically took an idea (type safety and inference everywhere) and designed the language around it, even making some pretty significant sacrifices to achieve that goal.

You'd have to learn the language to fully appreciate it, but the basic idea is this: there are countless errors that crop up only at runtime in many languages but really ought to be detected at compile time¹. Cyclone adds some of these detection features to C, but at the cost of more verbose syntax (though they really kept it to a minimum); ML manages to be actually less verbose than C and simultaneously safer.
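
To make that verbosity claim concrete, here's a minimal OCaml sketch (the toy functions are mine, purely for illustration): not a single type annotation appears, yet everything below is fully checked at compile time.

(* no annotations anywhere; the compiler infers every type *)
let double x = x * 2            (* inferred: int -> int, since ( * ) works on ints *)
let greet name = "hi, " ^ name  (* inferred: string -> string, since ^ concatenates strings *)
let apply_twice f x = f (f x)   (* inferred: ('a -> 'a) -> 'a -> 'a, fully generic *)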

The way they seem to approach safety is by extending the type system thoroughly and pushing as much as possible into it. With that in mind, I thought of another aspect of currying today that may make it clearer for you. If anyone else could contribute to this I'd appreciate it.

First: currying is a mathematical abstraction, and isn't necessarily useful except for its clarity and its reduction of a complicated concept into a simpler one. It's easy to see how to convert (and to write code that converts) from a curried style to a traditional tuple-passing style. But curried functions have more conceptual elegance. No longer do we have functions of multiple arguments: if f is a function, f refers to the function, and f x applies f to x, with the whole expression evaluating to the result of f applied to x, and that's all of the syntax/semantics there is.
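
For instance, in OCaml the conversion in each direction is a one-liner (a sketch; the names curry and uncurry are mine, not standard library functions):

(* from tuple-passing style to curried style, and back *)
let curry f = fun x y -> f (x, y)    (* ('a * 'b -> 'c) -> 'a -> 'b -> 'c *)
let uncurry f = fun (x, y) -> f x y  (* ('a -> 'b -> 'c) -> 'a * 'b -> 'c *)

let add_tupled (x, y) = x + y        (* int * int -> int *)
let add = curry add_tupled           (* int -> int -> int *)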

When you write a compiler, there's a pile of steps and verification you do at different levels. First you lex the text into keywords and variables, then you parse those into a syntax tree; errors found there are pretty obviously invalid programs. There's a step to verify types: for example, 1 + "2" isn't legal in most languages. Then there are extra steps for processing all the other information in the program that lies outside the syntax. For example, in C, when you call f(a, b, c) you need to verify that f takes three arguments and that a, b, and c are each of the correct type; that is why you need function prototypes that describe how functions behave, and an extra step in your compiler that takes each subexpression and verifies it matches the prototype.

Now consider ML, where everything is about types: the type verification of f x requires that f be a function, because that is the syntax for function application. (If x is an integer, an expression like x 3 produces "This expression is not a function, it cannot be applied.") For a more complicated expression, like a function add (used like add 1 2 as you'd expect), the subexpressions have these types:
code         type
add          int -> (int -> int)   (a function mapping integers to functions that map integers to integers)
add 1        int -> int            (this is the parenthesized expression above, and is another function)
(add 1) 2    int                   (two chained function applications; this is the result of plugging an int into the expression above)

(Because functions always take only one argument, the parentheses are unnecessary: add 1 2 is equivalent to (add 1) 2.)
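
You can watch these types fall out in the OCaml toplevel (a transcript of what I'd expect; the exact printing may vary slightly between versions):

# let add x y = x + y;;
val add : int -> int -> int = <fun>
# add 1;;
- : int -> int = <fun>
# (add 1) 2;;
- : int = 3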

So what? Note that using the result of add 1 2 in code like print_int (add 1 2) means the type checker validates not only that the arguments to add are of the correct type, but also that you supplied the correct number of them. For example, print_int (add 1) tries to apply print_int (which expects an integer) to the result of add 1, which is a function from ints to ints (see the second line of the table above). Similarly with add 1 2 3: there's an error on the 3, because the result of add 1 2 is not a function that can be applied to 3. So there's no iteration in the compiler over a function's arguments and no counting of arguments... that verification just implicitly falls out of the way things are defined.
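
Here's a sketch of both failure modes, assuming the same add as above (the exact error wording varies by compiler version):

let add x y = x + y

let () = print_int (add 1 2)  (* fine: add 1 2 has type int *)

(* print_int (add 1) *)
(* rejected: add 1 has type int -> int but print_int expects an int, *)
(* so "too few arguments" is caught as a plain type mismatch *)

(* add 1 2 3 *)
(* rejected: add 1 2 has type int, and an int is not a function, *)
(* so "too many arguments" is caught as an illegal application *)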



¹ GTK code is littered with functions like
void foo_set_bar(Foo *foo, ...) {
  if (foo == NULL)
    some_fatal_error();
  ...
}

where they have some preprocessor macros (g_return_if_fail and friends) so those NULL tests disappear in a non-debug build.
I guess this is an OK solution, but it's also ugly. The ideal solution is to make it impossible to even express the idea of passing a NULL to such a function, because the function is undefined for foo = NULL, rather than waiting for it to happen at runtime and then throwing an exception.
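
That's exactly what ML's types buy you: there is no NULL to pass, and a value that might be absent has to say so in its type, which the checker then forces every caller to handle. A sketch of what the equivalent looks like (the foo type and function names are hypothetical, modeled on the GTK example above):

type foo = { mutable bar : int }

let foo_set_bar (foo : foo) bar =
  foo.bar <- bar                    (* foo is statically guaranteed to exist *)

(* if a foo might be absent, that fact lives in the type... *)
let maybe_set_bar (foo_opt : foo option) bar =
  match foo_opt with
  | Some foo -> foo_set_bar foo bar
  | None -> ()                      (* ...and the compiler insists this case be handled *)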