Evan Martin (evan) wrote in evan_tech,

using git with svnsync

git.chromium.org is a git-svn mirror of the canonical SVN repository. It works like this: the SVN server pushes out, with svnsync, a mirror to another machine. That machine then has a cron job that runs git-svn against the local svnsync'd repo. (Part of this design was so this mirror machine doesn't have any access to the writable repo.)
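
The mirror machine's half of that pipeline can be sketched roughly like this. It is a minimal sketch, not the actual script: the directory path and the `GIT` override are hypothetical names added for illustration, and it assumes a git-svn clone already pointed at the local svnsync'd repo.

```shell
#!/bin/sh
# Sketch of the mirror machine's cron job. All paths are hypothetical.
# The SVN server separately pushes revisions into the local repository
# with svnsync; this script only reads from that local copy.

# Assumed location of a clone previously set up with git-svn.
GIT_MIRROR_DIR=${GIT_MIRROR_DIR:-/path/to/git-svn-clone}
# Overridable so the function can be dry-run without touching a repo.
GIT=${GIT:-git}

# Pull any new revisions from the local svnsync'd repo into git.
# Runs in a subshell so the caller's working directory is unchanged.
mirror_once() {
    ( cd "$GIT_MIRROR_DIR" && $GIT svn fetch )
}
```

A cron entry would then call `mirror_once` on whatever schedule suits the mirror. Setting `GIT=echo` prints the command instead of running it.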

By the way, there's info on our wiki about how to set up git-svn such that you can fetch with a fast git fetch from the mirror while still using the slow SVN server when it's time to commit.

This has been working fine for quite a while, but I noticed that occasionally (rarely) a commit was mirrored with the proper commit data but the wrong author.

$ echo $(git rev-list --author=chrome-bot origin | wc -l) $(git rev-list origin | wc -l)
86 14624

A bit over half a percent of commits.
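
For the exact figure, the two counts above can be fed to a small helper. `pct` is a hypothetical name, not part of git:

```shell
#!/bin/sh
# Print what percentage $1 is of $2, to two decimal places.
pct() {
    awk -v bad="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * bad / total }'
}

# The counts from the command above: mis-attributed commits, total commits.
pct 86 14624   # prints 0.59
```

Against a live repo, the two arguments could come straight from the same `git rev-list ... | wc -l` pipelines.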

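I asked around and the best guess is this surprising gotcha: SVN commits, at least as svnsync mirrors them, aren't atomic. :(
The author metadata is a separate property of a commit (the svn:author revision property), and svnsync copies it over after the commit itself. So it's possible for my mirror to grab a commit before the author data has synced over.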

What's the fix? svnsync puts a lock in the repo before syncing. Right now I only check the lock before I start, which still leaves a window for a sync to begin while I'm copying. To be correct I'd need to grab the svnsync lock myself while I'm doing my copy. Another option is to rewind and try again whenever I see a bad commit get mirrored. But git-svn doesn't really like having history rewound without clobbering its metadata, and I can't let it just rebuild its metadata from the commit history, for complicated reasons outside the scope of this post.
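
Here's a rough sketch of the "check the lock" approach. As far as I know, svnsync records its lock as the `svn:sync-lock` property on revision 0 of the destination repo, which is what this reads. The repo URL, the function names, and the `POLL_SECONDS` knob are all hypothetical. It only waits for the lock to clear and doesn't take the lock itself, so the race described above is still there.

```shell
#!/bin/sh
# Hypothetical URL of the local svnsync'd repository.
MIRROR_URL=${MIRROR_URL:-file:///path/to/svnsync-mirror}

# Print the current svnsync lock holder, or nothing if unlocked.
# Assumes the lock lives in svn:sync-lock on revision 0.
read_sync_lock() {
    svn propget svn:sync-lock --revprop -r 0 "$MIRROR_URL" 2>/dev/null
}

# Poll up to $1 times for the lock to clear.
# Returns 0 once the repo is unlocked, 1 if it stayed locked throughout.
wait_for_unlock() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if [ -z "$(read_sync_lock)" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-1}"
    done
    return 1
}
```

The cron job would run something like `wait_for_unlock 60 && git svn fetch`. A sync can still start between the check and the fetch, which is why actually holding the lock for the duration of the copy is the correct fix.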

In summary, I now have this git repo that has the wrong authors in some commits. Fixing it would require rebuilding history from the earliest instance of the problem, invalidating everyone else's copies. I haven't done it since I'm not convinced it's important enough. Now that I look at the logs, it seems to have gotten much worse recently...
Tags: chromium, git
