evan_tech

11:02 am, 24 Aug 03

parse as you type, 2

One of the reasons I've been thinking about "parse while you type" is my experience with a simpler, but quite similar, problem: "spell check while you type".

The simplification here is that all parsing is local. The algorithm is simple: whenever you add or remove text, scan forwards and backwards to the ends of the affected words, feed them to the spell checker, and then mark up the text as necessary.
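Here's a rough sketch of that local rescan in Python, just to make it concrete. Everything in it is made up for illustration: `affected_span`, `misspelled_spans`, and the `is_known` callback (which stands in for whatever real spell checker you'd call) are hypothetical names, and treating a "word" as a run of `str.isalpha()` characters is an assumption that already breaks on apostrophes and hyphens.

```python
def affected_span(text, start, end):
    """Widen the edited range [start, end) to the ends of the words it touches.

    `text` is the buffer *after* the edit; for a pure deletion start == end.
    """
    # Scan backwards to the beginning of the word on the left.
    while start > 0 and text[start - 1].isalpha():
        start -= 1
    # Scan forwards to the end of the word on the right.
    while end < len(text) and text[end].isalpha():
        end += 1
    return start, end


def misspelled_spans(text, start, end, is_known):
    """Return (begin, end) offsets of unknown words touched by the edit.

    `is_known` is a hypothetical callback: word -> bool.
    """
    lo, hi = affected_span(text, start, end)
    spans = []
    i = lo
    while i < hi:
        if text[i].isalpha():
            # Collect one run of letters and check it as a word.
            j = i
            while j < hi and text[j].isalpha():
                j += 1
            if not is_known(text[i:j].lower()):
                spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


if __name__ == "__main__":
    words = {"the", "quick", "brown", "fox"}
    # The user just typed the "w" at offset 12.
    print(misspelled_spans("the quick brwn fox", 12, 13, words.__contains__))
```

The point is that only the words touching the edit get rechecked, so the cost per keystroke doesn't depend on the size of the document. The editor would then underline the returned spans and clear any old markup inside the widened range.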

But a good spell checker would (in theory: I don't know if any open source ones do) actually need context to spell check properly. For a language like Japanese or Thai, you don't have spaces to delimit words. (There are even difficulties with simpler languages.)
Worse, discovering homophone errors requires(?) syntactic parsing, which is sorta like compiling but about a million times harder. (There's a subfield of linguistics, "discourse analysis", that covers analysis across sentence boundaries, and the potential complexity of that terrifies me.)

I know Microsoft Word does this to some extent because it underlines questionable structures in green. I can't, however, imagine how far they go or how they determined how far they could go.

(Now that I look at that Pango bug again, I really ought to fix it myself. That'd be a fun and worthwhile project, and Noah Levitt even provided a test case...
NO! Bad Evan! Finish your existing projects first.)