Evan Martin (evan) wrote in evan_tech,
Evan Martin


I don't know hardly anything about compression, but a friend pointed me at the algorithm presented in A Block-sorting Lossless Data Compression Algorithm -- also known as the bzip2 algorithm -- and I was blown away. The basic approach is a simple pre-transformation of the data that makes traditional compression work better, but the transformation itself is sorta insane: you wouldn't expect it to work. The paper has an easy-to-follow example, too, so you have no excuse not to look at it. (The encoding is pretty simple, and takes only one page (page 2) of the paper; what's crazy is that you can get the original data back from it.)

It also turns out Mike Burrows, the first author of the paper ("the 'b' in 'bzip', I was told), sits about 20 feet from me. Tonight I introduced myself and we chatted about it for a bit. He added some amusing anecdotes. Such as:

The paper claims it has performance comparable to Lempel-Ziv while achieving better compression. This was true at the time, but Mike said that the final deshuffling step has poor cache performance so as computers have increased in speed and cache dependency, this algorithm has fallen behind gzip. At a recent compression conference (celebrating the 10th anniversary of the publishing of the paper, though he pointed it was actually discovered in 1978 -- the number they have in the paper is wrong) he proposed some methods to speed it up by a factor of two. But the compression folks were only interested in O() improvements, and since somebody had already proved it was O(n) (with a large constant) nobody was interested.

Mike seems like a generally hax0ry fellow, recently getting credit for writing some tight assembly for an upcoming Google project.

[Obligatory programming-language-related footnote: the author of the bzip2 library works on ghc, the Haskell compiler.]

  • livejournal kids

    Neat image from Jack Dorsey. Every so often someone will ask me about Twitter and I'll dig up a a random day from Brad's LJ in 1999 and talk about…

  • ljrb release 0.3.1

    LiveJournal Ruby module update: This release won't die when the "useragent" property is present in an entry. I've also added support for passing…

  • ljrb 0.3.0

    ljrb 0.3.0: This release adds support for the "current_location" field and fetching friendofs in the same request as fetching friends. There's also…

  • Post a new comment


    default userpic
    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.