December 1st, 2007

  • evan

gitweb/perl speed

gitweb is nice but slow -- about 400ms to produce the index page for a directory that contains only one project. (Not really worth running mod_perl for, y'know?) By unfair contrast, Google's memory-resident server provides me search results in about 150ms, including HTTP transfer time.

About 200ms of gitweb's runtime seems to be Perl startup time, which I gauged by measuring a simple script that just imports the same things gitweb does and exits immediately. It opens 33 files from directories with "perl" in their name, mostly .pms but a couple .sos as well. Is there a way to "prelink" a perl program?

Perl spends another 140ms around calls to read() on the gitweb script. I guess it's probably parsing at the same time? (The script is 160k(!) of perl code.) I found you can run perlcc -B foo.cgi to produce bytecode, which speeds things up by a bit over 100ms. (Without the -B flag it tries to compile via C and aborts with some internal failure message. The manpage for the package has lots of scary-looking "do not use this for production" messages.) But the module loads are still external, and continue to dominate the startup time.
  • evan

rainbow tables

I somehow got the urge to play with rainbow tables yesterday. I knew the idea, but it turns out (like always) that there's a difference between knowing the idea and understanding it enough to be able to implement it. It turns out that the basic idea (of tables of iterated hashes) is due to Hellman (of Diffie- fame) in 1980, but it wasn't until 2003 that a newer refinement redubbed it "rainbow".

(This paragraph presupposes you get the general idea -- go read wikipedia if you don't, as its exposition is way better than whatever sources I had read.) The rainbow twist was using a different reduction function for each iteration of a chain, as that reduces collisions between two chains. That makes lookup more complicated, but it's more like O(n^2) in the chain length and chains are relatively short. With that in mind, I think the graphic provided on wikipedia has the coloring along the wrong axis; really, the differing reduction functions ought to be the stripes of the rainbow, as they're fixed and a known set for a given table containing any number of chains.

Though I know how and why it works, it's still a bit creepy to see it in action:
$ echo -n hello | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d  -
$ time ./rainbow data aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
  chains: 65536 of length 512 = 33554k hashes (with repeats)
  keys: keys of length 5 from 26 letters = 11881k outputs
pairtable loaded; 65539 entries of 11 bytes at offset 39
reversed: hello

real    0m0.151s
user    0m0.152s
sys     0m0.000s

Anyway, you can browse source and check it out with git clone if you're really curious. Since I started writing this post I glanced at the source of the project that wikipedia links to and found it pretty opaque (of course, it's much more industrial-strength than my hobbyist project), so maybe you'll find it interesting to learn from. I could easily still have my implementation wrong, though...