Evan Martin (evan) wrote in evan_tech,

parse as you type, 2

One of the reasons I've been thinking about "parse while you type" is my experience with a simpler but quite similar problem: "spell check while you type".

The simplification here is that all parsing is local. The algorithm is simple: whenever you add or remove text, scan backwards and forwards to the ends of the affected words, feed those words to the spell checker, and then mark up the text as necessary.
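The local algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not code from any real editor or spell checker: the names `affected_span` and `recheck_after_edit` are hypothetical, a "word" is assumed to be a run of `\w` characters (real checkers treat apostrophes and the like specially), and the dictionary is assumed to be a plain set of lowercase words.

```python
import re

# Assumption: a "word" is a maximal run of \w characters.
WORD_CHAR = re.compile(r"\w")
WORD = re.compile(r"\w+")


def affected_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Widen the edited range [start, end) outward to the nearest word boundaries."""
    while start > 0 and WORD_CHAR.match(text[start - 1]):
        start -= 1
    while end < len(text) and WORD_CHAR.match(text[end]):
        end += 1
    return start, end


def recheck_after_edit(text: str, start: int, end: int,
                       dictionary: set[str]) -> list[tuple[int, str]]:
    """Return (offset, word) for each misspelled word touched by an edit.

    start/end delimit the inserted text in the *new* buffer; for a pure
    deletion, start == end at the point where text was removed.
    """
    lo, hi = affected_span(text, start, end)
    misspelled = []
    # Only the words overlapping the edit are rescanned; the rest of the
    # document keeps its existing markup.
    for m in WORD.finditer(text, lo, hi):
        if m.group().lower() not in dictionary:
            misspelled.append((m.start(), m.group()))
    return misspelled
```

For example, after typing the `w` at offset 12 of `"the quick brwn fox"`, `recheck_after_edit(text, 12, 13, {"the", "quick", "brown", "fox"})` rescans only `brwn` and reports `[(10, "brwn")]`. An editor would clear old underlines between the bounds returned by `affected_span` and apply the new ones, so the work per keystroke stays proportional to the edited words rather than to the document.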

But a good spell checker would actually need context to check spelling properly (in theory, anyway: I don't know whether any open-source ones do). In languages like Japanese or Thai, there are no spaces to delimit words. (There are difficulties even with simpler languages.)
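To see why finding word boundaries can itself need context, here is a sketch that enumerates every way to split unspaced text into dictionary words. It uses English with the spaces removed as a stand-in for Japanese or Thai, and a tiny hypothetical dictionary; the function name `segmentations` is made up for illustration, and practical segmenters rank candidates statistically rather than listing them all.

```python
from functools import lru_cache


def segmentations(text: str, dictionary: set[str]) -> list[tuple[str, ...]]:
    """Return every way to split `text` into a sequence of dictionary words.

    The number of results can grow exponentially with the length of the text;
    this is meant to show ambiguity, not to be an efficient segmenter.
    """

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of the suffix text[i:].
        if i == len(text):
            return ((),)  # one way to segment the empty suffix: no words
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in dictionary:
                for rest in from_pos(j):
                    results.append((word,) + rest)
        return tuple(results)

    return list(from_pos(0))
```

With the dictionary `{"god", "is", "no", "now", "here", "where", "nowhere"}`, `segmentations("godisnowhere", ...)` returns three candidates ("god is no where", "god is now here", "god is nowhere"), and nothing in the dictionary itself says which one the writer meant.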
Worse, catching homophone errors requires(?) syntactic parsing, which is sorta like compiling but about a million times harder. (There's a subfield of linguistics, "discourse analysis", that deals with analysis across sentence boundaries, and its potential complexity terrifies me.)
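A homophone error slips past a word-local checker because every word is in the dictionary. The sketch below swaps in a much cruder technique than syntactic parsing: for a few hypothetical confusion sets, it asks whether another member of the set fits the neighbouring words better according to bigram counts. The function names, confusion sets, and counts are all assumptions for illustration, not taken from any real checker.

```python
# Hypothetical confusion sets: words a word-local checker cannot tell apart.
CONFUSION_SETS = [{"their", "there", "they're"}, {"to", "too", "two"}]


def context_score(candidate: str, left: str, right: str,
                  bigram_counts: dict[tuple[str, str], int]) -> int:
    """How often `candidate` was seen next to its neighbours (0 if never)."""
    return (bigram_counts.get((left, candidate), 0)
            + bigram_counts.get((candidate, right), 0))


def homophone_suspects(words: list[str],
                       bigram_counts: dict[tuple[str, str], int]) -> list[tuple[int, str, str]]:
    """Return (index, word, suggestion) where a homophone fits the context better."""
    suspects = []
    for i, word in enumerate(words):
        # "<s>" and "</s>" mark the sentence start and end.
        left = words[i - 1] if i > 0 else "<s>"
        right = words[i + 1] if i + 1 < len(words) else "</s>"
        for cset in CONFUSION_SETS:
            if word not in cset:
                continue
            # sorted() makes tie-breaking deterministic.
            best = max(sorted(cset),
                       key=lambda c: context_score(c, left, right, bigram_counts))
            if (context_score(best, left, right, bigram_counts)
                    > context_score(word, left, right, bigram_counts)):
                suspects.append((i, word, best))
    return suspects
```

With made-up counts such as `{("over", "there"): 5, ("there", "</s>"): 3, ("their", "house"): 4}`, `homophone_suspects("i went over their".split(), counts)` flags `their` at index 3 and suggests `there`, while `"i saw their house"` passes. Even this toy needs to look past the word being checked, which is exactly what the purely local algorithm never does.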

I know Microsoft Word does this to some extent, because it underlines questionable constructions in green. I can't imagine, however, how far it goes or how its developers determined how far they could go.

(Now that I look at that Pango bug again, I really ought to fix it myself. That'd be a fun and worthwhile project, and Noah Levitt even provided a test case...
NO! Bad Evan! Finish your existing projects first.)
