11:18 am, 26 Oct 08

gat update

I've still been playing with gat occasionally; getting sick recently (I blame snej!) probably helped. The needless goto I mentioned in my last entry was, of course, found and cleaned up by the multitude of people chiseling away at git.

I added a command like "git log" to display the history of a project, and that finally had enough work to do for me to look into performance. GHC has a profiler, but for some reason I had expected it to not be helpful, which is not at all the case -- it did pretty much exactly what I wanted! I guess I had thought it'd be hard to properly attribute blame when everything's computed lazily, but on the other hand I can imagine how it might work. My initial naive code (e.g. repeatedly concatenating linked lists) was somewhere around 10x as slow as git; now it's about 1.4x, which is quick enough that I don't care. (Git is of course doing a lot more, so it remains to be seen whether that will hold.)
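To illustrate the "repeatedly concatenating linked lists" pitfall (this is a hypothetical example, not the actual gat code): appending to the end of a list with (++) in a left fold re-walks the whole accumulated list on every step, which is quadratic; a difference list makes each append O(1).

```haskell
-- Quadratic: (++) must traverse the entire left operand each time,
-- so folding it over n chunks costs O(n^2) overall.
slowConcat :: [[a]] -> [a]
slowConcat = foldl (\acc xs -> acc ++ xs) []

-- Linear: represent the partial result as a function that prepends
-- (a "difference list"); the real list is materialized only once,
-- by applying the composed function to [].
fastConcat :: [[a]] -> [a]
fastConcat chunks = foldl (\acc xs -> acc . (xs ++)) id chunks []

main :: IO ()
main = print (slowConcat [[1,2],[3],[4,5 :: Int]]
              == fastConcat [[1,2],[3],[4,5]])
```

Both produce the same list; only the asymptotics differ.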

I also got the pager working, so gat diff HEAD^ now shows a paged, colored diff.

For those following at home, the performance improvements were pretty standard: stuff like dropping a high-level library or two, using better algorithms, and caching. Brad commented that it's sad that for performance I have to stop using my nice libraries, but I think I'm just dropping a level or two of abstraction and not the entire benefit of using a high-level language. You make separate modules that are covered by tests and then you're free to make the internals as gnarly as you want, and even my gnarlified parser is still using stuff like foldl.
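As a sketch of that "gnarly internals behind a clean interface" idea (my own toy example, with made-up names -- not gat's code): the exported function keeps a simple type, while inside it uses a strict left fold with a bang-patterned accumulator so the thunks don't pile up.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Clean interface: count (lines, characters) in a string.
-- Gnarly-ish internals: a strict foldl' whose accumulator is forced
-- at every step, avoiding a chain of lazy (+1) thunks.
countLinesChars :: String -> (Int, Int)
countLinesChars = foldl' step (0, 0)
  where
    step (!ls, !cs) c = (if c == '\n' then ls + 1 else ls, cs + 1)

main :: IO ()
main = print (countLinesChars "ab\ncd\n")
```

Callers only ever see the tidy type signature; the strictness tricks stay behind the module boundary, where tests can pin down the behavior.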

I've commented many times about how the culture of the people using a language drives the direction it heads in, for better or for worse. I find it interesting that for Haskell the two main drivers seem to be performance and research into ever-fancier abstractions, as those seem at odds. What you end up with is this incredible mathematical purity alongside operations talking about pointers; my binary search, above, juxtaposes being parameterized by two type classes (one of which was Monad) with searching within a block of mmap'ed memory.
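To give a feel for what a Monad-parameterized binary search can look like (this is my own sketch with invented names, not gat's actual API): abstract the "read the element at index i" step into a monadic action, and the same search works over a pure in-memory structure or over a block of mmap'ed memory in IO.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Binary search over indices [lo, hi), where element access is an
-- arbitrary monadic action. In IO, readAt could peek into mmap'ed
-- memory; in Identity, it can just index a list.
bsearchM :: (Monad m, Ord k)
         => (Int -> m k)        -- read the key at an index
         -> k                   -- key we're looking for
         -> Int -> Int          -- half-open search range
         -> m (Maybe Int)
bsearchM readAt key = go
  where
    go lo hi
      | lo >= hi  = return Nothing
      | otherwise = do
          let mid = lo + (hi - lo) `div` 2
          k <- readAt mid
          case compare key k of
            EQ -> return (Just mid)
            LT -> go lo mid
            GT -> go (mid + 1) hi

-- Pure usage, just to show the abstraction costs nothing to a caller:
main :: IO ()
main = do
  let xs = [1, 3, 5, 7, 9 :: Int]
  print (runIdentity (bsearchM (\i -> return (xs !! i)) 7 0 (length xs)))
```

The search logic is written once; which monad it runs in (and therefore where the bytes actually live) is the caller's business.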

I especially dig this because you typically think of programming languages and libraries as trading off productivity and performance, and it's nice to at least imagine there's some way to get both. For example, recent work on left fold enumerators promises to be a high-level abstraction around event-driven block-based IO, and it tickles me to see quotes like "The ordinary, recursive left fold is the fix point of the non-recursive one. On the other hand, the instantiation of the non-recursive left fold as a stream, as we shall see, effectively captures a continuation in a monadic action" in a discussion of a performant IO representation.
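The first half of that quote can be rendered in a few lines (my own minimal sketch of the sentence, not the enumerator library's code): write a left fold with the recursive call abstracted out as an extra parameter, and the ordinary foldl is its fixed point.

```haskell
import Data.Function (fix)

-- The non-recursive left fold: instead of calling itself, it takes
-- "what to do next" (self) as an argument.
foldlNR :: ((a -> b -> a) -> a -> [b] -> a)
        -> (a -> b -> a) -> a -> [b] -> a
foldlNR _    _ z []     = z
foldlNR self f z (x:xs) = self f (f z x) xs

-- Tying the knot with fix recovers the ordinary recursive foldl.
myFoldl :: (a -> b -> a) -> a -> [b] -> a
myFoldl = fix foldlNR

main :: IO ()
main = print (myFoldl (+) 0 [1 .. 10 :: Int])
```

Leaving the recursion open like this is what gives the enumerator work room to instantiate the fold differently, e.g. as a stream that captures a continuation in a monadic action.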