evan_tech

10:19 am, 20 Jun 09

using git with svnsync

git.chromium.org is a git-svn mirror of the canonical SVN repository. It works like this: the SVN server uses svnsync to push a mirror of the repository to another machine, and a cron job on that machine runs git-svn against the local svnsync'd copy. (Part of the point of this design is that the mirror machine never needs access to the writable repository.)

By the way, our wiki explains how to set up git-svn so that you fetch quickly from the git mirror with a plain git fetch, while still going through the slower SVN server when it's time to commit.

This had been working fine for quite a while, but I noticed that on rare occasions the mirror got the commit data right and the author wrong.

$ echo $(git rev-list --author=chrome-bot origin | wc -l) $(git rev-list origin | wc -l)
86 14624

About 0.6% of commits.
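The measurement above can be wrapped in a small helper. This is a sketch, not the script behind the post: it assumes, as the command suggests, that the misattributed commits show up with chrome-bot as their author and that the mirrored branch is origin. It uses git rev-list --count, which gives the same numbers as piping rev-list into wc -l. The function names pct and author_share are hypothetical, and the count includes every commit whose author matches, whatever the reason.

```shell
#!/bin/sh
# Report how many commits on a ref are attributed to a given author.
# Defaults (chrome-bot, origin) follow the command in the post.

# pct NUM DEN: print NUM/DEN as a percentage with two decimals.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f%%\n", 100 * n / d }'
}

# author_share [AUTHOR] [REF]: print e.g. "86 of 14624 commits: 0.59%".
author_share() {
    author=${1:-chrome-bot}
    ref=${2:-origin}
    bad=$(git rev-list --count --author="$author" "$ref")
    total=$(git rev-list --count "$ref")
    echo "$bad of $total commits: $(pct "$bad" "$total")"
}
```

With the numbers above, author_share would print "86 of 14624 commits: 0.59%".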

I asked around, and the best guess is a surprising gotcha: svnsync doesn't mirror a commit atomically. :(
The author is stored in a separate revision property (svn:author), which svnsync sets only after it has committed the revision's contents, so my mirror can grab a commit before its author data has synced over.

What's the fix? Before syncing, svnsync takes a lock in the mirror repository (the svn:sync-lock property on revision 0). Right now I check that lock before fetching, but that leaves a race: a sync can start right after my check. To be correct I'd need to hold the svnsync lock myself while git-svn does its copy. Another option is to rewind and retry whenever a bad commit gets mirrored, but git-svn doesn't cope well with rewound history, since rewinding clobbers its metadata, and for complicated reasons outside the scope of this post I can't let it simply rebuild that metadata from the commit history.
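A minimal sketch of the "check the lock" step, assuming a POSIX shell cron job on the mirror machine that runs inside the git-svn clone. svnsync keeps its lock in the svn:sync-lock revision property on revision 0 of the mirror repository, which svnlook can read directly; the repository path and the function names sync_locked and mirror_once are hypothetical. As the paragraph says, this is still racy: svnsync can take the lock just after the check, while git svn fetch is running.

```shell
#!/bin/sh
# Hypothetical cron step on the mirror machine: skip a git-svn run while
# svnsync holds its lock. svnsync stores the lock as the svn:sync-lock
# revision property on revision 0 of the mirror repository.

# sync_locked REPO: succeed if the svnsync lock property is currently set.
sync_locked() {
    # If the property is absent, svnlook prints nothing on stdout,
    # so $lock ends up empty.
    lock=$(svnlook propget --revprop -r 0 "$1" svn:sync-lock 2>/dev/null)
    [ -n "$lock" ]
}

# mirror_once REPO: fetch new revisions unless a sync is in progress.
# Still racy: svnsync may take the lock right after the check below.
mirror_once() {
    if sync_locked "$1"; then
        echo "svnsync in progress, skipping this run"
        return 0
    fi
    git svn fetch
}

# Example (hypothetical path), run from inside the git-svn clone:
# mirror_once /srv/svn/mirror
```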

In summary, I now have a git repo with the wrong author on some commits. Fixing it would mean rewriting history from the earliest bad commit onward, which would invalidate everyone else's clones. I haven't done it because I'm not convinced it matters enough. Then again, looking at the logs now, the problem seems to have gotten much worse recently...
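If that rewrite were ever done, the first step would be finding where it has to start. A sketch, assuming the bad commits are the ones attributed to chrome-bot on origin: git rev-list --reverse lists the matching commits oldest first, and git svn find-rev maps a commit back to its SVN revision number. The function names earliest_bad_commit and earliest_bad_rev are hypothetical.

```shell
#!/bin/sh
# earliest_bad_commit [AUTHOR] [REF]: print the oldest commit on REF whose
# author matches AUTHOR, i.e. where a history rewrite would have to start.
earliest_bad_commit() {
    git rev-list --reverse --author="${1:-chrome-bot}" "${2:-origin}" | head -n 1
}

# earliest_bad_rev [AUTHOR] [REF]: print the SVN revision of that commit;
# fails if no commit matches.
earliest_bad_rev() {
    commit=$(earliest_bad_commit "$@")
    [ -n "$commit" ] && git svn find-rev "$commit"
}
```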