evan_tech

A few separate posts, all in the same area.

1) Most (all?) of the distributed bug tracking software I've glanced at stores bugs in a directory, one file per bug. This seemed like poor design to me. I confirmed by showing Brad the output of ls on one; his full response was "doesn't scale" before turning back to what he was working on.



2) Having thought more about the relationship between code and bug state, I have concluded I was thinking about it the wrong way. Going back and reviewing your comments, I see a bunch of you figured this out before me. Here's the critical piece I was missing. Code has history, which is tracked by the graph of related versions. Bug state both refers to the code history and also has its own history, in that new bugs are opened and old ones are closed. Those two histories related to bugs are not the same: even when examining old code, you generally care about the newest bug state. (This is why most modern bug systems only let you use the newest bug state; making changes to it permanently clobbers the old state. However, note that most do care about showing you the history of modifications to a bug; the interesting view is the most recent copy of the bug's entire history.)

As Aristotle and Lee pointed out on my older post, connecting the code history graph and the bug state could be modeled as annotations pointing at commits. The state of bugs present in a given version is the collection of all bug states that have been attached to an ancestor of that version. This means discovering a bug in a previous release "infects" (to use his term, which is a good one) all branches derived from that release, and a given branch is only fixed once it merges the code that fixes the bug. (Making that work efficiently is an exercise for the reader; I have some ideas that aren't worth sharing yet.)
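
To make the annotation idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the commit names, the `parents` and `annotations` structures, and the `bug_state_at` helper are all made up for this example, and it walks the whole ancestry on every query, so it does nothing about the efficiency problem mentioned above. It also simplifies by letting the most recently recorded annotation win.

```python
# Hypothetical sketch: bug annotations attached to commits in a history graph.
# Every name here is illustrative, not taken from any real tool.

# Commit graph: each commit maps to its list of parent commits.
parents = {
    "r1": [],
    "r2": ["r1"],                  # the release where the bug is found
    "fix": ["r2"],                 # commit that fixes the bug
    "feature": ["r2"],             # unrelated branch forked from the release
    "merged": ["feature", "fix"],  # feature branch after merging the fix
}

# Annotations as (commit, bug id, state), in the order they were recorded.
annotations = [
    ("r2", "bug-1", "open"),    # bug discovered in the release
    ("fix", "bug-1", "fixed"),  # the fix exists on one branch only
]

def ancestors(commit):
    """Return the commit together with everything reachable via parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def bug_state_at(commit):
    """Collect bug states attached to any ancestor of the given commit.

    Simplification: annotations recorded later override earlier ones.
    """
    reachable = ancestors(commit)
    state = {}
    for where, bug, status in annotations:
        if where in reachable:
            state[bug] = status
    return state

for name in ["r1", "feature", "fix", "merged"]:
    print(name, bug_state_at(name))
```

In this toy graph the "feature" branch is infected by the bug found in "r2" and stays that way until it merges "fix", which is the behavior described above.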



3) Part of the reason I got thinking about all of this is because I wanted a separate feature: a command-line interface to bug tracking. I hate using web apps both because web sites become inaccessible, get slow, or go down (a problem addressed by making it distributed) and just because I hate clicking around on web forms (a website can't, for example, query my current checkout for which branch I'm claiming the bug is fixed on). You could make a CLI-based interface to Trac -- maybe one exists already -- and it would at least address the second half of that.

At a superficial level, the command-line problem isn't really at all the same thing as a distributed system. But there also is a connection at a deep level. I like to say (and here by "say" I mean "think" because nobody ever wants to listen to me jabber on about this stuff) that even centralized projects have distributed branches every time someone edits a file in their own checkout; it's just that the tools we have for those branches are weaker than the tools typically used for "real" branches. On most systems you typically can't record your changes until you've verified they would merge cleanly with the master branch (though monotone/mercurial/fossil fix this implicitly and git does if you're using the proper workflow); on most systems you can't examine what happened upstream in the same way you examine changes that happened locally.

This same problem -- that forks happen on every checkout -- is true with any web-based database; it's just that when you're using a website you tend to commit more frequently so the forks don't get a chance to conflict. And for that reason the tools for managing conflicts are usually pretty weak, as anyone who's encountered a conflict on a bug tracker has probably experienced (my impression is that most just say "click back and type in what you were saying again"). My favorite bad conflict-resolution system is probably Google Docs, which, upon a conflict, pops up a window saying something to the effect of, "This paragraph was edited while you were editing it. Here is your text; please copy and paste it back into the document and figure out how to fix it."

So back to the command line: you need to solve this conflict problem anyway for a distributed system to work, but you also probably need to at least improve it for a command-line-editable centralized system to work, because with a command-line app you don't really get a "back button". You could implement back-button-like functionality, but if you're going to implement new functionality to handle this case, perhaps you could implement a more sane model instead.
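
As a rough sketch of what a saner model might look like, one option is to borrow the three-way merge that version control already uses and apply it per field of a bug. This is an assumption about one possible design, not something any particular tracker does; the `merge_bug` function and the field names are hypothetical.

```python
def merge_bug(base, mine, theirs):
    """Three-way merge of bug fields.

    base   -- the bug as it looked when I started editing
    mine   -- my edited copy
    theirs -- the copy currently on the server
    Returns (merged, conflicts); conflicts maps field -> (mine, theirs).
    """
    merged, conflicts = {}, {}
    for field in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(field), mine.get(field), theirs.get(field)
        if m == t:
            value = m                  # both sides agree
        elif m == b:
            value = t                  # only the other side changed it
        elif t == b:
            value = m                  # only my side changed it
        else:
            conflicts[field] = (m, t)  # both changed it, differently
            continue
        if value is not None:          # None means the field was removed
            merged[field] = value
    return merged, conflicts

base = {"status": "open", "owner": "alice", "title": "crash"}
mine = {"status": "fixed", "owner": "alice", "title": "crash on start"}
theirs = {"status": "open", "owner": "bob", "title": "crash at launch"}

merged, conflicts = merge_bug(base, mine, theirs)
print(merged)
print(conflicts)
```

Here the status and owner changes combine cleanly because each side touched a different field, and only the title, which both sides rewrote, is handed back to the user as a conflict instead of asking them to retype everything.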