So, gat. It can read the index, loose objects, and a bit of pack files; it has an implementation of the delta decoder so it can read delta-compressed objects out of the pack format; it understands how to convert
refs/remotes/git-svn, recursively parsing out its parent pointers, and decoding that; it can do Git-style color diffs between various trees and commits. And finally and most importantly: it does some
mmap()ing but is not at all fast, it is incorrect in the pieces it has implemented, and it does no writing of any data structures whatsoever -- that is to say, nothing useful.
But it's been fun to play with, at least. Maybe I'll clean up those docs and post them too at some point.
A few notes from reading Git code:
- Git is a jumble of random nearly-commentless code, full of globals and strange state and not at all clear control flows. On the other hand, it's also much more Unixy than the code I'm used to reading, doing all sorts of tricks like using mmap() instead of read() (because the latter just involves an extra copy, y'know?) and forking. I am simultaneously impressed and terrified of what's likely going on in my kernel.
- To align some disk-based data structure Git uses this code:
(offsetof(struct cache_entry,name) + (len) + 8) & ~7
That's an off-by-one error (gets eight bytes of padding where zero would do) that I'm a little puzzled by -- maybe it's intentional? Maybe it's remained for backwards compatibility?
- This goto also makes no sense to me:
ret = ent->data; if (ret && ent->p == p && ent->base_offset == base_offset) goto found_cache_entry; return unpack_entry(p, base_offset, type, base_size); found_cache_entry:(Those are the only two instances of
found_cache_entryin the file...)
- Git contains at least three incompatible bit-packed integer encodings, found in pack file offsets, object headers, and the delta format.
- It's crazy to me that in 2008 people are still writing code that is manually passing around buffers and lengths and verifying by hand that the space works out.
lesscan take all sorts of useful flags, causing it to e.g. only page if necessary, allow color but not other control codes through, and more! When Git spawns less as a pager, it actually execs less in the parent side of the fork, making itself the child of the less process; dear Unix people, explain to me why you'd do that? (I have my thoughts, but I'm curious about yours.)