evan_tech

Previous Entry Share Next Entry
07:51 pm, 22 Aug 06

lazy big-O

Concatenating two linked lists of length m and n in your standard functional languages is a common operation understood to be O(m).

This gets weird in a lazy language. Here's an implementation of a standard concat function, for reference:
concat (a:as) bs = a : concat as bs
concat []     bs = bs

If I write code like:
x = concat l1 l2
I get back a lazy value like:
{a0 : concat a1.. bs}
(I'll use curlies around code to represent a suspended computation, and subscripts to index into the lists). This is an O(1) operation, at least wrt to the list constants m and n.

If I later try to get the first element of x, it's forced to evaluate a single step and produce the first element of the concatenation. This is again O(1), leaving x as:
a0 : {a1 : concat a2.. bs}

By that logic, concatenating the lists is O(1), but you haven't shifted the time elsewhere -- you've added O(k) to accessing the kth element of x, but that was O(k) before concatenation was involved. It's like the concatenation happened for free! (Except it's not really free, of course; the concat created an an extra O(m) new conses.)

In terms of big-O algorithm time though, the effect is constant and dropped. This is one of the reasons laziness can (almost paradoxically) lead to more efficient code: you only pay for the bits you actually use. If the bit you use is significantly smaller than the total list -- small enough to offset the (large) additional cost of introducing and evaluating all these thunks -- then you've saved time.



But you still need to take care. If you have a list of lists to concatenate, then you can make it inefficient if you concatenate the "wrong way":
concat (concat (concat L0 L1) L2) L3
There, getting to the elements in the leftmost list goes through O(length(L)) layerings of concat before it reaches an element.

Instead, concatenating "on the other side":
concat L0 (concat L1 (concat L2 ...))
gets you back to the "computation for free" state.

This difference between these two recursion schemas (which have the same result for associative operators) is the difference between foldl and foldr, which are demonstrated in a amusingly interactive way via foldl.com and foldr.com. In Haskell, the list-of-lists concatenation function is itself called concat (the concat function I defined at the beginning of the post is just ++), and it's defined in the Haskell prelude as foldr (++) [].

PS: Rubyists, you may recognize foldl as Enumerable#inject; I don't think Ruby has a foldr. My impression is that Ruby's gratuitously-different names for functional operators (like select for filter) date back to Smalltalk.