Evan Martin (evan) wrote in evan_tech,

parse as you type, 2

One of the reasons I've been thinking about "parse while you type" is my experience with a simpler but quite similar problem: "spell check while you type".

What makes that problem simpler is that all the parsing is local. The algorithm is simple: whenever you add or remove text, scan forwards and backwards to the ends of the affected words, feed those words to the spell checker, and then mark up the text as necessary.
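Here's a rough sketch of that local algorithm in Python. It assumes a "word" is just a run of ASCII letters and apostrophes, and it uses a plain set as a stand-in for the spell checker's dictionary; `affected_span` and `recheck` are hypothetical names, not any real spell checker's API.

```python
import re

# Assumption: a word is a run of ASCII letters or apostrophes. Real spell
# checkers (and languages without spaces) need much more than this.
WORD_RE = re.compile(r"[A-Za-z']+")


def is_word_char(ch: str) -> bool:
    return WORD_RE.fullmatch(ch) is not None


def affected_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Widen the edited region [start, end) outward to whole-word boundaries.

    For an insertion, [start, end) covers the new text; for a deletion,
    start == end is the point where text was removed.
    """
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return start, end


def recheck(text: str, start: int, end: int, dictionary: set[str]) -> list[tuple[int, str]]:
    """Return (offset, word) for every misspelled word touching the edit."""
    lo, hi = affected_span(text, start, end)
    errors = []
    for m in WORD_RE.finditer(text[lo:hi]):
        if m.group().lower() not in dictionary:
            errors.append((lo + m.start(), m.group()))
    return errors


words = {"the", "quick", "brown", "fox"}
# The user just typed the "c" at offset 8, producing "quikc".
print(recheck("the quikc brown fox", 8, 9, words))
# The user just deleted the space between "brown" and "fox".
print(recheck("the quick brownfox", 15, 15, words))
```

Only the words touching the edit are re-examined, which is exactly why this stays cheap: nothing outside the widened span can change its spelling status.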

But a good spell checker would (in theory; I don't know if any open source ones do) actually need context to spell check properly. For a language like Japanese or Thai, you don't have spaces to delimit words. (There are difficulties even with simpler languages.)
Worse, catching homophone errors requires(?) syntactic parsing, which is sorta like compiling but about a million times harder. (There's a subfield of linguistics called "discourse analysis", the analysis of text across sentence boundaries, and the potential complexity of that terrifies me.)
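To see why missing spaces break the local approach, here's a toy greedy longest-match segmenter, one simple (and known to be naive) way to split unspaced text using only a dictionary. English with its spaces stripped stands in for Japanese or Thai here, and `greedy_segment` and the tiny dictionary are hypothetical, purely for illustration.

```python
def greedy_segment(text: str, dictionary: set[str], max_len: int = 10) -> list[str]:
    """Split unspaced text by always taking the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking toward one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here; treat one character as unknown.
            words.append(text[i])
            i += 1
    return words


vocab = {"the", "them", "man", "an", "ran"}
print(greedy_segment("themanran", vocab))
```

Every piece it produces is a valid word ("them an ran"), yet the intended reading is "the man ran"; picking the right split means looking past the edited word, which is precisely the context a purely local checker doesn't have.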

I know Microsoft's Word does this to some extent because it underlines questionable structures with green. I can't, however, imagine how far they go or how they determined how far they could go.

(Now that I look at that Pango bug again, I really ought to fix it myself. That'd be a fun and worthwhile project, and Noah Levitt even provided a test case...
NO! Bad Evan! Finish your existing projects first.)
