09:55 am, 11 May 05

xss and conventions

Joel Spolsky writes about conventions and uses defending against XSS as an example. He's more verbose than usual, so here is the paraphrased summary: if you prefix every variable holding a user-provided string with 'us' (unsafe) and every checked string with 's' (safe), your errors become apparent, because code like Output(usFoo) looks wrong on sight and you never write it. In fact, you can prefix your function names with the sort of data they accept to make mistakes even more apparent.

I think it's well-intended but not a good example.
  1. As I've written before, many programming conventions come from limitations in your language's type system, and this one in particular is just screaming for a type-based fix. Here, you want GetStringFromUser() to return an "unsafe" string, and OutputString() to accept only "safe" strings. If these were separate datatypes (along with a no-op function ThisStringIsNowSafe() that converts unsafe to safe while documenting the programmer's knowledge), the type checker would reject these mistakes before you could make them.

    This sort of approach is even done by dynamically-typed languages: Perl's "taint" mode is a problem-specific hack for this. (Briefly, you add a flag to the interpreter(!), then all user input variables become "tainted", and applying a regular expression with grouping(!!) untaints the string.) Ruby has a similar but slightly more general approach. And the Amrita HTML templating system for Ruby escapes all inputs to templates, which the programmer can circumvent by calling SanitizedString on the strings.

    I imagine that SanitizedString wraps an object around the string and that the template expansion knows to look for the object. In fact, this sort of trick (making a new type that's the same as the old but has to be constructed explicitly) is common enough in Haskell that there's a keyword for it (newtype) that tells the compiler that the wrapping is only for typing and can be dropped when compiling (after it's checked to be typesafe)*.
  2. But this raises a broader point: real websites don't (or at least shouldn't) build web pages out of a series of print statements; they just (or should just) provide arguments to a template. This not only keeps the programmer sane about XSS (basically, everything's escaped, and if you want otherwise you have to think about it and write more code), it also keeps pages consistent, makes translation easier, and so on (all the typical advantages of separating code from data).
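
To make the "wrap an object around the string" guess concrete, here is a minimal sketch in Python of a template function that escapes every argument unless it has been explicitly wrapped. The class name SanitizedString echoes Amrita's, but this class, the render() function and the {name} placeholder syntax are all hypothetical; I have not checked how Amrita actually does it.

```python
from html import escape


class SanitizedString:
    """Hypothetical marker: text the programmer has vouched for."""

    def __init__(self, text: str) -> None:
        self.text = text


def render(template: str, **args: object) -> str:
    """Fill {name} placeholders, escaping every argument unless it is wrapped."""
    prepared = {
        name: value.text if isinstance(value, SanitizedString) else escape(str(value))
        for name, value in args.items()
    }
    return template.format(**prepared)


page = "<p>Hello, {name}!</p>{footer}"
html = render(page, name="<script>alert(1)</script>", footer=SanitizedString("<hr>"))
print(html)
# prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p><hr>
```

The safe path is the one that needs no extra code: forgetting the wrapper costs you some markup, not an XSS hole.
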
* I have no idea why the compiler can't just figure this out on its own. It's uncharacteristic of Haskell.
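
For a loose analogue of that wrapping-only-for-typing trick (Haskell's newtype) outside Haskell, Python's typing.NewType may serve as a sketch: the two string types below are distinct to a static checker, while at runtime the wrappers simply return the plain str. The function names follow the ones used above, translated to Python style; UnsafeStr, SafeStr and escape_html are my own hypothetical names. Note that Python itself enforces nothing here; the mistake is only caught if you run a separate type checker.

```python
from html import escape
from typing import NewType

# Distinct names for a static checker; plain str objects at runtime.
UnsafeStr = NewType("UnsafeStr", str)
SafeStr = NewType("SafeStr", str)


def get_string_from_user(raw: str) -> UnsafeStr:
    """Stand-in for reading input: everything from the user starts out unsafe."""
    return UnsafeStr(raw)


def this_string_is_now_safe(s: UnsafeStr) -> SafeStr:
    """No-op conversion that documents the programmer's knowledge."""
    return SafeStr(s)


def escape_html(s: UnsafeStr) -> SafeStr:
    """The usual route from unsafe to safe: HTML-escape the text."""
    return SafeStr(escape(s))


def output_string(s: SafeStr) -> str:
    """Stand-in for writing to the page; accepts only safe strings."""
    return s


comment = get_string_from_user("<b>hi</b>")
print(output_string(escape_html(comment)))
# output_string(comment)  # runs fine, but a checker such as mypy should flag it
```
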