evan_tech

I saw a great talk today by Rada Mihalcea on some of her research. The gist is interpreting text as a graph and applying PageRank*-like computations. What the graph represents determines the application: word sense disambiguation (nodes: possible senses (from WordNet) of words, edges: connections between senses), text summarization (nodes: sentences, edges: weighted by sentence similarity), and more.
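
To make the summarization case concrete, here's a rough sketch of the idea as I understood it, not her actual implementation. Sentences are nodes, edges are weighted by similarity, and a weighted PageRank-style iteration scores each sentence. The Jaccard word-overlap similarity, the function names, and the damping value of 0.85 are all my own assumptions for illustration.

```python
import re


def similarity(a, b):
    """Jaccard word overlap between two sentences (an assumed, simple measure)."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style power iteration."""
    n = len(sentences)
    # weights[i][j] is the (symmetric) edge weight between sentences i and j.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Total edge weight leaving each node, used to normalise its votes.
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to how strongly it is connected to i. Isolated nodes pass nothing.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

To summarize, you'd call `rank_sentences` on the sentences of a document and keep the few with the highest scores. A sentence that shares no words with any other gets only the baseline score of `(1 - damping) / n`.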

Her work outperforms the current best-performing algorithms, including supervised ones, even though her approach is unsupervised. Awesome.
[update 19-oct-05] A note for gawkers coming from the unofficial Google weblog: this last paragraph was merely restating her conclusions and I haven't vetted them myself, nor was I involved in the grant-giving process. After this post, I read a few of her papers and then mostly forgot about the subject.


Rada referenced an algorithm by Jon Kleinberg, one of the researchers I try to follow more closely. I ought to make a reading list for y'all; there are fewer than ten I've found who consistently produce interesting work.


Additionally, clevercs just updated a bunch of posts, and there's a bunch of promising-looking stuff going on in there. In particular they linked to Recovering Device Drivers, which just got Best Paper at OSDI '04. That work was done at the University of Washington! Hank Levy taught my operating systems class, and I think Michael Swift is the guy I had a decently long talk with at a poster session a year or so ago.
Regarding OSDI '04: The other Best Paper is on model checking, which I know little about but I think goes back to language theory. (And also, if you're looking to read any of these papers, check Scholar or mail me; I may be able to find a link for you.)


* (TM), heh. I didn't realize until after I started here that Larry named the algorithm after himself. It was a nice coincidence that it applies to web pages, too.