December 1st, 2003

  • evan

lj analyses

  • The LiveJournal interests graph is odd: each user is linked to a bunch of interests, and each interest is linked to a bunch of users (in this representation, all links are bidirectional). No users are directly linked and no interests are directly linked.

    Two subset interpretations:
    1. Make the nodes exclusively users and the links to other users based on shared interests. Now you can find a "what do [a] and [b] have in common?" based on shortest paths.

    2. Make the nodes exclusively interests and the links to other interests based on users who include both interests. (This graph may be more useful than the one above if the number of interests is below 20k or so, because then we can do full graph computations on a normal computer.) Make the link cost a function (inverse) of the number of users who include both interests. Now you can find the diameter (and the maximally-distant links) as the "most dissimilar" interests; users who intentionally pick dissimilar interests with the intention of messing up the graph will only add a high-cost link.

    Figuring out who in general has "interests in common" with a given user is a different problem. I recall some stuff from my stats class about relative information (a common interest like "music" is much less meaningful than a specific one like "debian") but I don't remember the details of how it works. :(

  • The shortest-path problem between two users or interests is much easier to solve if you're only computing for two specific users. A breadth-first search from both ends at the same time ought to be pretty quick; I imagine this is how LJ Connect actually works.

  • I'd really like to run some imaginary "flow" algorithm and figure out who is the "most popular" (high in-degree in friends links isn't as meaningful as "valuable" links-- think Google's pagerank), but I don't know anything about how those work.
  • evan


JoelOnSoftware's latest article on Craftsmanship leaves me oddly vindicated. It discusses (in a somewhat self-important way, I might add) the pain he took to make a lengthy "load file" dialog include a progress meter.

I wasted spent a significant amount of time designing LogJam's "network processing" dialog because I hate it when apps block. (There's a picture of it on the LogJam tour page.) It's gone through multiple incarnations, from threads (yikes) to pipes between multiple processes (which Joel also picked), and then to even more abstraction for lengthy processes that span multiple network requests (the journal backup dialog). It even includes some fancy SVG support so that the animation rotates smoothly (it looks rad and is entirely useless, I assure you).

98% of the time you never see the dialog, because stuff happens too quickly. The other 2% it's wonderful to be able to watch a transfer go (loading your huge friends list over a modem) or cancel a request (LiveJournal hiccup?). It's probably the part of the program I'm most proud of, even though it (like all of it) kinda sucks.

PS: It's weird to see the needless jabs at Unix users at the end of that article. I see it a lot from Windows/Mac devotees (though less recently from the Mac-o-philes, because now it's always "elegant Unix underpinnings" etc. Though they still manage to stab at GUIs and such). Joel does it a lot, or at least enough that I notice it, but forgive him because he's frustrated with dealing with such an frustrating and useless operating system. :P
(PPS: I have written a lot of Windows code, even some professionally, so I am somewhat justified in saying that.)
  • evan

more ocaml, more to read

Tonight's project for better OCaml was Go Scoring. It's simplified by not needing to find dead stones, but it still was way too hard for me to solve. I saw how it would be solved pretty quickly, but I couldn't figure out the way to make it happen for a few hours. I wouldn't win any programming contests with that speed. :(

Collapse )

I realized I can search my history for "pdf" to find the URLs for stuff I've been reading.

Today: some basics on type inference. Doesn't look so bad so far! Constraint Generation, Type Inference, Implicit Polymorphism.
And a bit of Small-World Phenomenon (while I was bored in semantics class).
  • Current Mood
    way too many posts today