GHC is designed around the spineless tagless G-machine, a paper I've never gotten around to reading, but conceptually is like a VM that GHC targets and then codegens from. It has a number of values ("virtual registers") it wants to keep track of while running, like the current stack pointer and current thread object (see table 2.1 in the paper). Because they're used so frequently, GHC pins these through to machine registers. I was surprised to read that on x86 this leaves only one register free for doing computations!
That was mentioned in a section describing one place where the LLVM implementation outperforms the GHC implementation: because LLVM doesn't support this register pinning directly, the code generator instead sets things up so that the relevant values are frequently (at the entrance to each function, in fact) in the appropriate registers, with the assumption that the LLVM optimizer will leave the values alone since each function call would otherwise need to restore those values in the appropriate registers. But in fact, sometimes it does make sense to spill those to the stack, and those are exactly some of the cases where the LLVM implementation is superior.
(Update: found an old thread where SPJ briefly discusses LLVM.)