$ ./perf
input is 325709 bytes.
* HaXml: ...
3.06 per second.
* hexpat: .......
37.31 per second.
[Updated: using ByteStrings brought this up to around 42 on my machine...]
(These comparisons are, of course, not at all fair; HaXml provides a lot more functionality and a DOM-like API. But for my project I don't care about all that. I just want it to be fast.)
I spent quite a while trying to figure out how to get GHC to re-run a pure function (for benchmarking purposes) and eventually gave up. It seems you can make it work sometimes but the optimizer likes to say "oh, we already computed that". It seems there ought to be some pragma related to this but I couldn't figure it out.
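One possible workaround, as a rough sketch rather than a definitive answer: keep the input in an IORef and read it back on every iteration, then force the result with `evaluate`. Because the argument comes out of an IO action each time, the optimizer should not be able to conclude that the call is the same one it already computed. The names `expensive` and `timeRuns` below are made up for illustration and are not part of hexpat or HaXml. Note that `evaluate` only forces the result to weak head normal form, so a lazy result (such as a parse tree) would need deeper forcing to be measured fully.

```haskell
import Control.Exception (evaluate)
import Control.Monad (replicateM_)
import Data.IORef (newIORef, readIORef)
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the pure function being benchmarked.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Run `f` on the same input `iters` times and return elapsed CPU seconds.
-- The input is re-read from an IORef on every iteration, so the argument
-- is bound inside the loop body and `f x` is a fresh call each time.
timeRuns :: Int -> (a -> b) -> a -> IO Double
timeRuns iters f input = do
  ref <- newIORef input
  start <- getCPUTime
  replicateM_ iters $ do
    x <- readIORef ref
    -- evaluate forces the result to weak head normal form only.
    _ <- evaluate (f x)
    return ()
  end <- getCPUTime
  -- getCPUTime reports picoseconds.
  return (fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  secs <- timeRuns 100 expensive 100000
  putStrLn (show (100 / secs) ++ " per second.")
```

If the numbers still look suspiciously fast, compiling the benchmark module with `-fno-full-laziness` is another thing to try, since that flag turns off the transformation that floats constant expressions out of loops.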