October 31st, 2006

  • evan

lazy lists as iterators

I think I've posted about this before, but lazy lists make a great iterator pattern. I keep writing code where I wish I could use them.

To be honest, laziness to me always seemed like a sorta unnecessary language feature. As with every language feature, there are always a couple toy examples floating around where it could be cool, but in general it never seemed like something worth designing a practical language around. (Haskell was designed around laziness for a bunch of valid and interesting reasons, but Haskell is also intentionally weird.)

But consider an iterator: you need a way to retrieve the current value, a way to advance, and a way to signal you're done. And then note how similar that is to a lazy list: you have a cons of the current value and a pointer to the code to advance, and the "done" signal is the end of list.

Other languages have a variety of approaches that always felt inelegant:
  • Python and friends use an exception to indicate end-of-iteration, which to me crosses past inelegance to plain "hack".
  • Many languages have special syntax to support iteration, via both a yield keyword as well as foreach-style looping, and these loops usually end up not supporting some useful but rare special case.
  • Ruby's blocks mean that your control flow is inverted -- there's no way to say in the middle of your loop "in this case, skip the next element".

To make this concrete, imagine I have some tree datatype that supports a preorder traversal. With a lazy list, the traversal is a function from trees to lists of nodes:
preorder :: Tree -> [Node]

Here are some compare-and-contrast patterns that I experience in other languages.
  1. Typical iteration operations are done with your existing library of list-processing functions.
    For example, to print each element in the preorder (for e in tree.preorder(): print e) you can use map:
    map print (preorder tree)
  2. What if I want to get a list back instead of an iterator?
    In Python you'll write code like:
    elems = []; for e in tree.preorder(): elems.append(e)
    in Ruby you'll do something similar or need to have preorder support either a block or returning an array. But when your preorder returns a list anyway, this is transparently supported:
    elems = preorder tree
  3. What if I want each index as I go? This is common enough in Ruby that there's an each_with_index method, but I always end up wanting a map_with_index and other variants. With lazy lists, you again can use the existing list-munging functions. Code like:
    zip [0..] (preorder tree)
    gives you a list (or equivalently an iterator) of pairs of (index, element).
  4. For more complicated control flow, like skipping certain elements, you can fall back on referring to the iterator explicitly like you do in C++ or Python:
    process (preorder tree) where
      process []     -> ...
      process (x:xs) ->
        if x == 3 then do ...; process (tail xs)     -- skip one
                  else do ...; process xs
Another way of looking at this is that lazy list iterators let you look at different iteration patterns as different views of your data type. In some sense, preorder tree means "tree when viewed in preorder". And since these are functions instead of data types, instead of iteration being this concrete step-by-step process you instead can combine different "views" together: for example, you could write an "everyOther" function that pulls out every other element from a list and then use that as just another piece in a chain of iterators. A reverse iterator over every other element from a tree with the elements numbered in preorder is then just
reverse (everyOther (zip [0..] (preorder tree))).

This also shows what you lose, and what I've left out of these examples: the control over when the actual iteration code runs. If, for example, iterating to the next element actually involved a network request, these lazy lists don't make explicit whether you pay all the processing up-front or if you pay as you go. (Since in Haskell you do make "when IO happens" explicit, if these examples involved IO they would make this explicit but also would be more complicated than how I've written them above. But if they were simply computationally expensive, instead of IO-based, then you basically* can't control it.)

* You can, but it's hacky.