2. git-svn, the utility that bridges svn repositories into git branches, has a similar index for mapping svn revision numbers to git tree hashes. Theirs is one hash per line, with newlines for clarity. It seems they had a very similar struggle to mine (from the git-svn source):
# Tie::File seems to be prone to offset errors if revisions get sparse,
# it's not that fast, either. Tie::File is also not in Perl 5.6. So
# one of my favorite modules is out :< Next up would be one of the DBM
# modules, but I'm not sure which is most portable... So I'll just
# go with something that's plain-text, but still capable of
# being randomly accessed. So here's my ultra-simple fixed-width
# database. All records are 40 characters + "\n", so it's easy to seek
# to a revision: (41 * rev) is the byte offset.
# A record of 40 0s denotes an empty revision.
# And yes, it's still pretty fast (faster than Tie::File).
Both of these rely on the source space being dense. If you had an LJ itemid of 10e6 you'd get a 40mb index file. This density was a property of LJ that I relied upon, even though it wasn't necessarily required of LJ. git-svn has the same shortcoming, especially when used with a tree managed with gvn, which tends to make the revision counter increase rapidly. A shared tree at work has 110k revisions and a 4mb revision database, despite my shallow git-svn clone of it only taking 4mb itself.
(PS: of course, both of these could easily be solved while still not pulling in a database by using a slightly more complicated file format, like a sorted list of entries. But it's not worth the effort; even 4mb is tiny, especially for data that's just a cache anyway.)
1 And with an endianness bug waiting to happen, which of course did only after it had been released and was working fine and someone managed to run LogJam with an NFS-mounted home directory on two systems with different endianness. Ugh.