Evan Martin (evan) wrote in evan_tech,
Evan Martin

lj analyses

  • The LiveJournal interests graph is odd: each user is linked to a bunch of interests, and each interest is linked to a bunch of users (in this representation, all links are bidirectional). No users are directly linked and no interests are directly linked.

    Two subset interpretations:
    1. Make the nodes exclusively users and the links to other users based on shared interests. Now you can find a "what do [a] and [b] have in common?" based on shortest paths.

    2. Make the nodes exclusively interests and the links to other interests based on users who include both interests. (This graph may be more useful than the one above if the number of interests is below 20k or so, because then we can do full graph computations on a normal computer.) Make the link cost a function (inverse) of the number of users who include both interests. Now you can find the diameter (and the maximally-distant links) as the "most dissimilar" interests; users who intentionally pick dissimilar interests with the intention of messing up the graph will only add a high-cost link.

    Figuring out who in general has "interests in common" with a given user is a different problem. I recall some stuff from my stats class about relative information (a common interest like "music" is much less meaningful than a specific one like "debian") but I don't remember the details of how it works. :(

  • The shortest-path problem between two users or interests is much easier to solve if you're only computing for two specific users. A breadth-first search from both ends at the same time ought to be pretty quick; I imagine this is how LJ Connect actually works.

  • I'd really like to run some imaginary "flow" algorithm and figure out who is the "most popular" (high in-degree in friends links isn't as meaningful as "valuable" links-- think Google's pagerank), but I don't know anything about how those work.

  • dremel

    They published a paper on Dremel, my favorite previously-unpublished tool from the Google toolchest. Greg Linden discusses it: "[...] it is capable…

  • more on synonyms

    Before Chrome, this is what I spent much of my Google career working on.

  • chrome

    I guess our project is in the news today... I'm glad that I finally can tell people what I work on at Google, and I'm glad I get to work primarily…

  • Post a new comment


    default userpic
    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.