June 28th, 2006

  • evan


_why has a hack to generalize _with_index from Ruby, but it's quite complicated.

Background: many Ruby iterators have a _with_index flavor that gives you each value along with its index in the collection. For example, you can do
array.each do { |element| ... }
or you can do
array.each_with_index do { |element, index| ... }
but it's always felt a bit lacking that you have to implement both versions when you're writing code. (For example, I've needed map_with_index in a few programs so I've implemented it myself.)

Whenever I see these sorts of complicated hacks, I think to myself, "transforming code? that's functional!" The Ruby code there does crazy stuff like creating method names from strings and sticking instances variables into a block. Here's a more sane implementation in Haskell.

So, for example, I can take code like:
map (\char -> toUpper char) ['a', 'b', 'c']
and when I need the index of each element it becomes
map (\(char,index) -> some_code_using_char_and_index) (with_index ['a', 'b', 'c'])

The magic "with_index" code is simple enough that it's almost not worth including:
with_index :: [a] -> [(a, Int)]
with_index l = zip l [0..]

Laziness makes many "iterator" patterns nearly transparent in Haskell, because you don't have this tension between "do I create a data structure and return the whole thing?" or "do I expose an interface to get each element sequentially?" as they're both implemented the same way. (It's not completely magical, of course; sometimes IO complicates things.)

(Rubyists will probably feel my pain when they see the following snippet. I feel like I've written code like this a million times, where "foo", "bar", and "baz" depend on the situation:
def foo
  ret = []
  bar.baz { |x| ret << x }
  • evan

dynamic variables and global state

There's a casual remark in this paper Delimited Dynamic Binding that prompted a bit of head scratching and eventually some gained intution:
In general, dynamic binding generalizes global state and the singleton pattern to multiple application instances that may coexist in the same execution environment.

First, some background (skip this and the following paragraph if you already understand dynamic scope). Pretty much all programming languages use lexically scoped variables, so it's a bit hard (at least for me) to reason about alternatives. This wikipedia article goes into details with examples, but in brief, lexical scoping means that a reference to a named variable refers to the variable defined in a lexically (often textually) enclosing scope. If I define a variable "x" in a namespace and then within a function in that namespace I refer to "x", I'm referring to the "x" I defined "above".

In contrast, with dynamic scoping, a reference to "x" refers to whichever "x" that has been most recently defined by whichever code calls the function. Imagine a function in an entirely separate module defines an "x" then calls into the previously-described function: with a dynamically-scoped "x", that function would get the caller's "x" and not the namespaced one. (There are longer and clearer examples in the wikipedia article I mentioned, so go read that if this didn't make sense. Reportedly Emacs lisp uses dynamic scoping? Maybe one of you Emacs hackers can correct my understanding here.)

Dynamic scoping is definitely weird and arguably not useful, but it helps to think about it in how it can be applied. The most popular application I know of is Perl (which of course has every crazy language feature imaginable, including dynamic variables). The standard Perl example is to make a temporary change to the input field separator when slurping a file. In Perl, $x = <INFILE>; reads a single line from INFILE, so idiomatically to read an entire file, you temporarly munge the end-of-line magic variable before reading: { local $/ = undef; $x = <INFILE>; }.* One way to parse that code is that this "<>" operator reads from a file, and it uses whichever value of $/ has been most recently defined to know where to stop.

But Perl is also confusing because you can dynamicize any variable. Another example that applies across languages is the concept of stdout. It's common enough in a shell script to change where stdout points, and I've done it in larger programs. When I run "foo.sh > bar", I've changed stdout not only for foo.sh, but also for all subprograms it invokes. In fact, in Ruby, stdout has two different names to (I imagine) reflect conflicting ideals. STDOUT is a global (immutable) constant, while $stdout is a global variable.

The paper has four different samples of how these might be generally useful, and they all have the same pattern: when you have some sort of globally well-known variable or value that's still a parameter of sorts. I call it "well-known" because the actual name of the variable matters -- it's not an explicit parameter to the function, and instead the caller and callee use an agreed-upon name. (In Unix's stdout case, the agreed-upon name is "file descriptor number 1".) Yet it still works like a parameter in the sense that sometimes you need to twiddle it between multiple calls to the same function.

All of this is more interesting and useful in a language like Haskell: interesting because of static typing, and useful because you don't really have mutable global values. To illustrate the latter, imagine a program that has a command-line flag like "--verbose", and that your main function parses command line flags. Then, in Haskell, every function within your program that wants to know whether the verbose flag is on must take that as an explicit additional parameter. (This is part of one of the fundamental Haskell promises: a given function, given the same input, always has the same output, no matter when/where it's called. Allowing a function to refer to global values that can be changed between calls would violate that.) This is actually a huge pain in real programs.

It turns out there's an experimental dynamic scoping extension in GHC (the main Haskell implementation), but they call it implicit parameters for reasons I hope the previous example shows. I had read about this before when I was browsing the manual, but the paper and quote I mention at the beginning showed it in a new light: these implicit parameters really do behave like a generalization of global variables. They act like globals in that they're not explicit parameters to a callee and but they're tweakable by a caller, and they're more general than globals in that different parts of the program can use them with different values without stepping on each other.

For further reading, check out the wikipedia article and GHC docs I linked to. The latter goes into some hairy details on how they make it really work.
(Regarding the rest of the paper: still working on understanding it.)

* In fact, now that I look I see one example that instead uses the same trick to temporarily munge @ARGV to put the filename in question in and then just reads from <>, which knows to read from "the first argument" to the program, pulling it from @ARGV. Perl is nothing if not sick.