10:41 am, 10 Feb 07
monotone tech talk
(For some background, try my other posts on these systems, which span four years of history, yikes. In my earliest post there I was tentatively considering distributed version control as well as functional programming!)
There was a tech talk on monotone at work on Friday. It was kinda disastrous from a giving-a-talk standpoint: the speaker was late (not his fault, I'm sure), then he tried to project his slides from Linux (people always try this and it never works), then nobody had a laptop to lend him, then someone's Windows laptop somehow died right as they were plugging it in, then someone's Mac laptop projected fine but the PDF viewer wasn't responding to key presses... I felt bad for the guy.
I was already familiar with most of monotone (and you can read their docs if you wanna learn more; it's pretty cool), but the presentation emphasized an aspect of monotone that I hadn't really considered. Since their model is much more about islands (computers) of source exchanging little bits of code, computer errors (network, security, disk) become more serious than in the more traditional model of "keep the central repository backed up and on a secure/stable machine". Suppose Bob's disk has a bit error and that gets sent out to everyone, or I copy code from Eve, who has tried to backdoor the project (as in the Linux kernel).
So the main point Nathaniel made that I hadn't really appreciated is the implicit security of identifying files and revisions by hashes. If you assume the hash function is secure (which all of this is predicated upon), then any modification to a file causes the file's identity to change; any subsequent committed change ("revision") that involves that file is identified by a hash over data including that file's id, so the changedness bleeds into the revision's identity; and any subsequent revisions that build on that revision also mix in its hash... this cascades, so that every tiny bit change in any file bleeds all the way down into all subsequent revisions. And since everything else uses these hashes (like the netsync protocol), these sorts of errors become visible during normal operations. They mentioned that they've had users mail their list asking why monotone was complaining(?) about a file, ending with the user discovering their disk had introduced bit errors in the file.
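The cascade is easy to see in a toy sketch. This is not monotone's actual data format or id scheme, just a hypothetical illustration of the idea: file ids feed into revision ids, each revision id feeds into its children, so a single flipped bit changes every descendant's identity.

```python
import hashlib

def file_id(content: bytes) -> str:
    # Identify a file purely by a hash of its contents.
    return hashlib.sha1(content).hexdigest()

def revision_id(parent_rev_id: str, file_ids: list[str]) -> str:
    # A revision's identity covers its parent's id and its files' ids,
    # so corruption anywhere upstream changes this hash too.
    payload = (parent_rev_id + "".join(sorted(file_ids))).encode()
    return hashlib.sha1(payload).hexdigest()

# Two histories that differ by a single bit error in one file:
good = file_id(b"int main() { return 0; }")
bad  = file_id(b"int main() { return 1; }")   # the "disk bit error"

rev1_good = revision_id("", [good])
rev1_bad  = revision_id("", [bad])

# A later commit touching an unrelated file still gets a different id,
# because it mixes in its parent revision's id:
rev2_good = revision_id(rev1_good, [file_id(b"other file")])
rev2_bad  = revision_id(rev1_bad,  [file_id(b"other file")])

assert rev1_good != rev1_bad
assert rev2_good != rev2_bad   # the corruption is visible downstream
```

Any tool that recomputes these hashes during normal work (like syncing) notices the mismatch for free, which is exactly the failure mode those mailing-list users were hitting.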
Upon reflection, though, this protection is only against what I'd call "physical" attacks, and not against the more "social" attack that I linked to above on the Linux kernel. If someone managed to steal a good committer's keys and stuff some bad revisions in, the only way anyone would notice is if they were reviewing all code they merge in.
One year during development, we noticed our "server" (the machine we all agree to sync up with periodically) crashing. When we investigated it, we found it was failing an integrity check while pulling file deltas off the disk. We told the admin that the disk was failing, and the machine went offline to have its disks replaced.
I punched a hole in my firewall, started a process serving from my laptop, and pasted a new URL into IRC. Everyone switched to using it for a while. There are no trust issues with this, because the server is just a switching point for the flow of certs and revs. All syncs continued to reuse the shared history, with no interruption or retransmission of existing information.
When the "server" came back, the database file that had been corrupt on disk was still corrupt. So we deleted it. The next client who connected to the server restored the server's contents. As njs puts it: the code path called "restore from backup" is the same code path you use for all day-to-day work. So I closed my firewall port and life went back to normal.
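That "restore is just sync" point can be sketched in a few lines. This is a hypothetical toy, not monotone's netsync wire protocol: each side sends the other whatever revisions it lacks, so a freshly wiped server is simply a peer that happens to lack everything.

```python
def sync(a: dict[str, bytes], b: dict[str, bytes]) -> None:
    """Exchange revisions (keyed by hash id) until both stores match."""
    for rev_id, data in list(a.items()):
        if rev_id not in b:      # b lacks this revision: send it over
            b[rev_id] = data
    for rev_id, data in list(b.items()):
        if rev_id not in a:      # a lacks this revision: pull it back
            a[rev_id] = data

server: dict[str, bytes] = {}           # database deleted after disk failure
laptop = {"r1": b"...", "r2": b"..."}   # any client with full history

sync(laptop, server)    # ordinary day-to-day sync doubles as the restore
assert server == laptop
```

No special recovery mode exists because none is needed: the first client to sync repopulates the empty store through the exact code path everyone exercises daily.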
We're not quite at the point of automatic failover against a pool of servers -- that's on the work list -- but it's very close already. We could probably also sync on top of a DHT (eDonkey?), if an enterprising SoC student wanted to try that.