evan_tech

01:14 pm, 14 Nov 09

ghc llvm

I read this thesis on an LLVM backend for GHC, mostly because I was curious to learn more about GHC internals. It works well as an overview of the pieces. As for the change itself, it lands somewhere between compiling via C and generating machine code directly, with the sort of tradeoffs you'd expect (generating machine code directly means more control and more performance, but also more work).

GHC is designed around the spineless tagless G-machine (STG). I've never gotten around to reading that paper, but conceptually the STG machine is like a VM that GHC targets and then generates code from. It has a number of values ("virtual registers") it wants to keep track of while running, such as the current stack pointer and the current thread object (see table 2.1 in the thesis). Because they're used so frequently, GHC pins them to machine registers. I was surprised to read that on x86 this leaves only one register free for doing computations!
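To make the register arithmetic concrete, here is a toy model. The machine, its eight register names, and the mapping of virtual registers onto them are all hypothetical and are not GHC's real x86 assignments. It only shows how pinning a handful of virtual registers, plus a reserved stack pointer, can leave a single register for everything else.

```python
# Hypothetical sketch: STG-style virtual registers pinned onto a made-up
# 8-register machine. Register names and the mapping are illustrative only,
# not GHC's actual per-architecture assignments.

MACHINE_REGS = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "sp"]
RESERVED = {"sp"}  # the hardware stack pointer is never available

# Virtual registers the abstract machine wants close at hand,
# each pinned one-to-one onto a machine register (hypothetical mapping).
PINNED = {
    "BaseReg": "r0",  # pointer to the register table / thread state
    "Sp": "r1",       # STG stack pointer
    "SpLim": "r2",    # STG stack limit
    "Hp": "r3",       # heap pointer
    "HpLim": "r4",    # heap limit
    "R1": "r5",       # first general-purpose STG register
}

def free_registers(machine_regs, reserved, pinned):
    """Return the machine registers left over for ordinary computation."""
    taken = set(reserved) | set(pinned.values())
    return [r for r in machine_regs if r not in taken]

print(free_registers(MACHINE_REGS, RESERVED, PINNED))  # ['r6']
```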

That was mentioned in a section describing one place where the LLVM backend outperforms GHC's existing code generators. LLVM doesn't support this kind of register pinning directly, so the code generator instead arranges for the relevant values to be in the appropriate registers at specific points (at the entry to each function, in fact), on the assumption that the LLVM optimizer will leave them there, since otherwise every function call would have to move them back into place. But sometimes it really does make sense to spill those values to the stack, and those are exactly some of the cases where the LLVM backend comes out ahead.
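A rough sketch of the function-entry approach, with hypothetical names throughout: instead of living in pinned globals, a few virtual registers (`Regs`) are passed to every code block as arguments and handed on to the next block. That is roughly what a calling convention delivering them in fixed registers at entry amounts to. Python has no guaranteed tail calls, so a trampoline (`run`) stands in for the jumps. The blocks `push_loop` and `add_loop` are made up for illustration and just sum 1..n through a stack.

```python
from typing import List, NamedTuple

class Regs(NamedTuple):
    """Hypothetical stand-ins for a few STG virtual registers."""
    sp: int  # stack pointer: number of live slots on `stack`
    hp: int  # heap pointer: cells allocated so far (unused by this toy)
    r1: int  # first STG register: argument in, result out

def push_loop(regs: Regs, stack: List[int]):
    """Push n, n-1, ..., 1 onto the stack, then continue with add_loop."""
    if regs.r1 == 0:
        return add_loop, regs  # r1 == 0 doubles as the initial running total
    stack.append(regs.r1)
    return push_loop, regs._replace(sp=regs.sp + 1, r1=regs.r1 - 1)

def add_loop(regs: Regs, stack: List[int]):
    """Pop every stack slot, accumulating the total in r1."""
    if regs.sp == 0:
        return None, regs  # no continuation left: halt
    total = regs.r1 + stack.pop()  # a plain local: free to live anywhere
    return add_loop, regs._replace(sp=regs.sp - 1, r1=total)

def run(entry, regs: Regs) -> Regs:
    """Trampoline: each 'jump' passes the full register set to the next block."""
    stack: List[int] = []
    block = entry
    while block is not None:
        block, regs = block(regs, stack)
    return regs

print(run(push_loop, Regs(sp=0, hp=0, r1=4)).r1)  # 10
```

Between entries, values like `total` are ordinary locals. That mirrors the point above: the register contents only have to be back in place at the next jump, so an optimizer is free to spill them in between when that is cheaper.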

(Update: found an old thread where SPJ briefly discusses LLVM.)