May 10th, 2005

  • evan

greasemonkey index

People (myself included) have requested a way for Greasemonkey to tell you whether it has scripts that apply to the page you're currently viewing.

One proposal was that it could send the URL you're at to a server that could look it up. That's consistent with the XmlHttpRequest-style of Greasemonkey but terribly broken:
  • for privacy reasons, you don't want to send every URL you visit to a central server;
  • a central server couldn't handle the load of a bunch of people hitting it with every page.

It seems the right solution is to let clients periodically download an index.

Greasemonkey scripts provide regular expressions that match the URL when the script could run. So how would you build an index that could quickly find which script out of many would match? The straightforward way is to stick every regex in an array and then iterate through it, but that's O(n) and n's increasing rapidly.

Random thought: I wonder if it'd be worth building a tree of the regexes. At the top, a huge OR of all script regexes, and then two branches, each containing half. That shrinks the O() but not to log(n) 'cause you don't know which branch to take at each level... but at least most of the processing is happening inside the regex engine and not in Javascript. Or maybe you could wrap each one in parenthesis in the huge OR and check which group number matched...
  • evan

some papers i've been reading recently

  • Dan Klein and Chris Manning, "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency"
  • Kristina Toutanova, Christopher D. Manning, and Andrew Y. Ng., "Learning Random Walk Models for Inducing Word Dependency Distributions."
  • Andy Cheadle, Tony Field, Simon Marlow, Simon Peyton Jones, Lyndon While, "Non-stop Haskell"
  • and looking up stuff I've read before in papers I've read by Mark Newman and Philip Wadler.
The Haskell one is the only really evan_tech-related one, and it's pretty neat.

It's about garbage collection. One known approach to incremental copying garbage collection is basically whenever you access a pointer you check whether it points into the old region and if so you move it to the new region. This turns out to work well for Haskell because accessing data always goes through a method call anyway, so the check+copy can sorta be inserted in recently-moved objects to wrap around the method call.

(Of course, it's more complicated than that.)