October 1st, 2004

  • evan

implementing social networks

Recently, I've been thinking:
These community-finding algorithms are trying to find a division of a graph that maximizes the "modularity", Q. The general idea is computing the fraction of edges that are within a community versus edges in general, and assign communities to maximize this number. (The probability of there being a connection between two arbitrary vertices in a given graph is proportional to their degrees, so you end up subtracting off this probability when considering their connection, so that more unlikely connections are weighted heavier.) Positive modularity indicates there is community structure beyond what you'd expect from a random graph.

The goal of these algorithms has been to find interesting subgroups within social networks. But what they're basically computing is a way to divvy up the graph so that you minimize edges between the pieces. This may be useful for implementing social networks, when considering which computing clusters to place users on; perhaps by maximizing their community you could minimize inter-cluster communication.

Unfortunately, this isn't fully accurate. For performance purposes you don't care how "strong" the connection (see the parenthetical remark in the first paragraph) between two users are so much as you care how often they're exercising their links (loading friends pages, browsing their orkut friends). Additionally, I'm not sure if minimizing inter-cluster communication is actually the goal: if network bandwidth is not the bottleneck, then actually spreading requests out across multiple clusters allows more parallelization for each request. (Any hints, Brad?)