11:22 am, 12 Apr 04

### graphjoin revisited

I thought I'd take another stab at this graphjoin problem. I went to the local cafe only to find I had a really old copy of the code on my laptop.

So I rewrote it, but in C this time, in the way the author of the paper suggested (using an adjacency matrix instead of an adjacency list).

I have friends data for: barbee brad bram dl ellenlouise erik evan faith gaal lisa meena mobley patrick petit_chou toast whitaker xaotica. I figured that'd have some interesting subgroups because they're different circles of friends of mine.

To run on my 400MHz PowerBook:

`./friends 23.79s user 0.14s system 92% cpu 25.800 total`

Way slow.

gprof says:

```
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
93.33    216.86   216.86      1101     0.20     0.21  matrix_step
 3.71    225.47     8.62      1100     0.01     0.01  merge
 1.60    229.18     3.71    730718     0.00     0.00  matrix_dq
 0.92    231.32     2.14   1948423     0.00     0.00  lookup
```

Which is not at all what I expected. matrix_step is the main iteration of the algorithm: it basically iterates matrix_dq over all edges and picks the largest one.

And matrix_dq (which is called a whole lot of times) is the part that moves into floating point and does some divides... but it's not the bottleneck. Instead, it's the iteration over the (rather sparse) graph: each step scans all n^2 matrix entries even though most are empty. And with n people (1105 in this case), you merge somewhere around n times (I end up with four groups), so it's O(n^3) iterations.

Then I rewrote it the way I did it in OCaml, except I could mutate the lists: 0.35s. Kick ass.

evan## Re: so if its sparse

Mutating in general means "changing a value that you've already set", as in: `i = i + 1`.

It's generally considered something worth avoiding for code clarity purposes.

Specifically, here I meant modifying list links (as opposed to the functional approach of creating new lists).

htang## Re: so if its sparse

Well, if he's storing ints, then depending on how he's iterating over the matrix (row-major vs. column-major), he could be getting really good locality out of his cache. Hard to say without the full gprof output, though...

On an unrelated note, I've always loved showing this slide to non-techies, because it's next to impossible to explain to them what "orders of magnitude" means.

evan## Re: so if its sparse

That slide is neat!