10:19 am, 20 Jun 09
using git with svnsync
git.chromium.org is a git-svn mirror of the canonical SVN repository. It works like this: the SVN server pushes out, with svnsync, a mirror to another machine. That machine then has a cron job that runs git-svn against the local svnsync'd repo. (Part of this design was so this mirror machine doesn't have any access to the writable repo.)
By the way, there's info on our wiki about how to set up git-svn such that you can fetch with a fast git fetch from the mirror while still using the slow SVN server when it's time to commit.
This has been working fine for quite a while, but I noticed that occasionally (rarely) it was getting the commit data right but the author wrong.
$ echo $(git rev-list --author=chrome-bot origin | wc -l) $(git rev-list origin | wc -l)
86 14624
About 0.6% of commits.
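For the curious, here is a small sketch of the arithmetic behind that figure. The helper name percent is made up for this example; it just divides the two counts printed above.

```shell
#!/bin/sh
# percent PART TOTAL: print PART as a percentage of TOTAL, to two decimals.
# (Hypothetical helper for illustration; it assumes TOTAL is non-zero.)
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

# The two counts from the rev-list output above.
percent 86 14624   # prints 0.59
```

Against a live checkout the same helper could take the two `git rev-list ... | wc -l` counts directly instead of the hard-coded numbers.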
I asked around and the best guess is this surprising gotcha: SVN commits aren't atomic. :(
The author metadata is a separate property of a commit and so it's possible for my mirror to grab a commit before the author data has synced over.
What's the fix? svnsync puts a lock in the repo before syncing. Right now I check the lock. To be correct I'd need to grab the svnsync lock myself while I'm doing my copy. Another option is to rewind and try again whenever I see a bad commit get mirrored, but git-svn doesn't really like having history rewound without clobbering its metadata and I can't let it just rebuild its metadata from the commit history for complicated reasons outside the scope of this post.
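A rough sketch of the "check the lock" variant, not the actual cron job. svnsync records its lock as the svn:sync-lock revision property on revision 0 of the destination repository, so the check amounts to reading that property and waiting until it is empty. MIRROR_URL, POLL_SECS and both function names are assumptions made up for this example. Note that this only checks the lock; a sync can still start right after the check, which is exactly the race described above.

```shell
#!/bin/sh
# Hypothetical location of the svnsync'd repository; adjust to taste.
MIRROR_URL="${MIRROR_URL:-file:///path/to/svnsync-mirror}"
# Hypothetical polling interval, in seconds.
POLL_SECS="${POLL_SECS:-1}"

# Print the current svnsync lock token, or nothing if no sync is running.
read_sync_lock() {
    svn propget --revprop -r 0 svn:sync-lock "$MIRROR_URL" 2>/dev/null
}

# wait_for_unlock [TRIES]: poll until the lock is clear.
# Returns 0 once the lock is clear, 1 if it is still held after TRIES polls.
wait_for_unlock() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if [ -z "$(read_sync_lock)" ]; then
            return 0
        fi
        sleep "$POLL_SECS"
        tries=$((tries - 1))
    done
    return 1
}
```

The cron job would then run something like `wait_for_unlock && git svn fetch`. Actually holding the lock during the copy would be the stricter fix, and this sketch does not attempt it.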
In summary, now I have this git repo that has the wrong authors in some commits. Fixing it would require rebuilding history from the earliest instance of the problem, invalidating everyone else's copies. I haven't done it since I'm not convinced it's too important. Now that I look at the logs, it seems to have gotten much worse recently...
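To see which SVN revisions are affected (and so where a rebuild would have to start), one option is to pull the revision numbers out of the git-svn-id lines that git-svn appends to each commit message. This is a sketch that assumes the mirror was created with git-svn metadata enabled, which is the default; the function name svn_revs and the sample URL are made up for this example.

```shell
#!/bin/sh
# svn_revs: read commit message bodies on stdin and print the SVN revision
# number from each "git-svn-id: URL@REV UUID" line.
svn_revs() {
    sed -n 's/^[[:space:]]*git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p'
}

# Sample input with a made-up URL and UUID.
printf 'Fix a bug.\n\ngit-svn-id: svn://svn.example.invalid/trunk/src@18765 0039d316-aaaa\n' | svn_revs
# prints 18765
```

On the real mirror the input would come from something like `git log --author=chrome-bot --format=%b origin | svn_revs | sort -n`, whose first line is the earliest affected revision.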
I don't know if this would address the problem (is it svnsync or git-svn that's seeing the half-committed commits?) but I wonder why that second svn repository is needed... couldn't that second repository instead be a git repository? git seems to be much better at mirroring than svn, so I generally try to escape svn as early as possible and do everything with git after that last hop.
svn
svn, yes. But cvs was not atomic. It committed files one by one. Atomic changesets were theoretically one of the big reasons to switch from cvs to svn.
Crying wolf
Hold on, svn commits ARE atomic. Isn't the problem here just that svnsync operates non-atomically: it changes the destination repository twice for each revision. First it makes a normal (atomic) commit, with itself ("chrome-bot") as the "author" of the commit, and then it makes a separate edit to change the "author" to the author of the original (source repository) revision. It's that sequence of two changes that's catching you out.
- Julian