First, some background on pointer tagging. In the implementation of a dynamic language you need to be able to figure out what exactly a given value (thinking concretely at the machine level here, like a value in a register) holds. One easy way to do this is to make everything a pointer to a known layout of an object, but that means that even integers are pointers and that's inefficient. A common trick is this: because pointers are word-aligned, the low bits are zero, so you can use the lowest bit(s) as a tag on a word to indicate that the integer is stored "inline": that is, if bit 0 is set, then bits 1-31 are the integer's value, while otherwise it's a pointer to some other sort of object. (If this is unfamiliar to you, read this overview of Ruby's VALUE datatype; I didn't know Ruby did this until I looked up that article but I expected that was their design.) I believe Lisps do something similar as well.
I made an old post¹ on here about O'Caml does this -- it's the same thing, or at least it was in 2003 -- but the cute bit with O'Caml is in their implementation. Since they're compiling to native code, they can actually use the x86 addressing hardware to add integers in a single instruction. That is: when you have two integers A and B that are represented in registers as
(A << 1 | 1)and
(B << 1 | 1), they can compute their representation of the sum,
((A+B)<<1 | 1), without doing all the bit math.
Ok, then: Haskell. Haskell² instead has this hideously inefficient-seeming implementation where all variables³ are -- get ready, it's a bit freaky -- pointers to code that is executed whenever the value of the variable is needed. This is a cute trick for lazy evaluation: the first time the value is actually needed, only then does the code for computing its value run; at the end of its run, this code overwrites itself with code that just quickly returns the result of its computation. So all subsequent accesses to that value (which again execute that code) can return the computed value instantly.
But this design is also terrible, because any time you access any of these variables you're doing a couple of jumps and (as this paper's introduction observes) you screw branch prediction and cause pipeline stalls. To me this design is very Haskell-y: sacrificing performance for internal purity and consistency. (For all of that, it still performs well in microbenchmarking contests -- why?)
Haskell's got the same two low bits on these pointers (3 bits, even, on 64-bit architectures) that are unused just like in these other languages. The paper says: let's use those bits to tag pointers to maybe avoid doing these jumps and speed things up!
...and right there was my understanding when I first skimmed through this paper -- which also has nice architectural descriptions about GHC and is pretty easy to follow -- and it seemed straightforward enough to not warrant posting about here.
But here's actually something more interesting going on. (As common when I post here, I'm embarrassing myself in writing this because I clearly didn't understand the paper at in depth on my first-read through.) The thing about Haskell is that it's statically typed: you don't need to use the tag bits to represent information about what type of data you're looking at, because that's known at compile-time -- so not only can you use these bits, you can use them differently and specifically depending on what type of data you're considering!
So for a list, the pointer can encode whether you've got a cons cell or the end of the list. Or for a boolean, the pointer directly encodes whether it's pointing at true or false. (Due to the way Haskell's implemented, these are both specific examples of the underlying implementation of this feature.) When it's time to use the value you don't need to follow the pointer at all.
There's still lazy evaluation to support: if the bottom bits are 00, that encodes that you still need to do the jump to code to compute the value. But once the value has been computed, for data types that have three or fewer cases for their value (including booleans and lists), that value be encoded directly into the pointer. For types that have more than three cases, you can still encode whether they've been evaluated already or not (which allows the implementation to read the value behind the pointer without executing code). And again, you can encode all of this information in just two bits because, as a statically-typed language, you can have different behavior for different types.
The paper claims this change is a few hundred lines to the Haskell compiler and produces a 10-14% speedup over their benchmarks suite.
(PS: What about O'Caml? They're statically typed too, so why do they need the this-is-an-integer tagging in the first place? Totally guessing here, but maybe it's so the garbage collector can know that integer values (indicated with a low bit of 1) are not to be followed as pointers. But that's totally a guess...)
1 I tracked it down to discover all of the links were dead. :(
2 Here by "Haskell" I mean "the GHC implementation of Haskell".
3 Haskell doesn't really have variables, because values can't change. But hopefully you get what I'm saying.