One part of the code called something like
re.compile(u'.*') and the other
re.compile('.*'). (In Python, prefixing a string literal with "u" makes it a Unicode string.) The problem turned out to be something like this: while those two regular expressions have different behaviors (the regular expression engine uses the Unicode flag to vary the matching behavior, I imagine), internally the re module caches the generated regular expression objects and uses only their string values (and not the Unicode flag) as the key in the cache.
Depending on which of those two re.compile calls happened first, both would evaluate to either a Unicode regexp or a non-Unicode regexp.
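The failure mode is easy to reproduce with a toy version. This is a hypothetical sketch, not the actual re internals: a memoizing compile whose cache key omits the flags, so whichever call happens first poisons the cache for the other. (In the original bug the missing ingredient was the pattern's Unicode-ness rather than an explicit flag; modern CPython keys its cache on the pattern's type and flags as well as its value.)

```python
import re

_cache = {}

def buggy_compile(pattern, flags=0):
    # Bug: the cache key is the pattern string alone -- the flags
    # (like the Unicode-ness in the original bug) are ignored.
    if pattern not in _cache:
        _cache[pattern] = re.compile(pattern, flags)
    return _cache[pattern]

first = buggy_compile("spam", re.IGNORECASE)  # compiled case-insensitively
second = buggy_compile("spam")                # silently reuses the first object
print(second.match("SPAM") is not None)       # True, even though no flag was asked for
```

Swap the order of the two calls and the second caller's behavior flips, which is exactly the "depends on where you called it" symptom.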
So it turned out to be a Python bug, and in fact one that has already been fixed. But the reason this is interesting to me is why it was so hard to track down. Nowhere in this programmer's (or my) mental model of how this code worked was the notion that a call to re.compile would behave differently depending on where you called it.
The technical term is that you expect re.compile to be referentially transparent. This Python bug is easy to have in code written in pretty much any language (including my beloved O'Caml) because functions are effectively just subroutines.
In the Haskell gospels, they talk about "substituting equals for equals": all functions are by definition referentially transparent unless they're explicitly marked as having state. It's because Haskell sorta doesn't have an order of operations; you're just expressing definitions. In fact, one cool way to view monads that I only grokked recently is that they're a way of expressing a sequence of computations. In that light, a series of statements in a monad corresponds to a series of "let" expressions in ML, which uses let expressions to express order of evaluation.
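One way to see the sequencing idea concretely, sketched in Python since that's the language of the bug above (bind and half are my toy names here, not any library's API):

```python
def bind(m, f):
    # A Maybe-style bind: None aborts the chain; otherwise the value
    # flows into the next step, much like `let x = m in f x` in ML.
    return None if m is None else f(m)

def half(n):
    # A partial function: only even numbers have a result.
    return n // 2 if n % 2 == 0 else None

# The nested binds pin down an evaluation order: 8 -> 4 -> 2,
# each step playing the role of one "let".
result = bind(8, lambda x: bind(half(x), lambda y: half(y)))
print(result)  # 2

# A failing step short-circuits the rest of the sequence.
failed = bind(7, lambda x: bind(half(x), lambda y: half(y)))
print(failed)  # None
```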
It's really refreshing to think that this language is boiling away in my brain, changing structure and giving me new perspectives on ideas already familiar.
P.S.: I had a related bug today in some code that I had borrowed from Russell. Some function was defined as
foo(Bar bar, int baz) and I helpfully changed it to
foo(Bar& bar, int baz) so
bar wouldn't have to be passed on the stack. But of course, the properly safe transformation is to use
const Bar& bar, and had I done that I wouldn't have lost a day. (Bugs in programs that process large amounts of data are hard to track down.) Mostly stupid on my part.