September 26th, 2003

  • evan


So I was trying to figure out why my server was serving pages with a charset=iso-8859-1 HTTP header, and I eventually found it. Kinda interesting, really:

The problem is that the default charset is undefined, so on pages with an unspecified charset the browser has to guess. That opens up subtle cross-site-scripting bugs where the server thinks it has properly escaped the HTML but the client, decoding the same bytes differently, doesn’t agree. (Think UTF-16, where < is encoded as two bytes rather than the single byte UTF-8 uses.)
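To make that concrete, here is a minimal Python sketch of one way the mismatch could play out. The scenario is my own illustration, not something from the LJ post: it assumes a hypothetical `render` helper on a server that escapes text and encodes it as UTF-16, and a client that, with no charset declared, guesses ISO-8859-1.

```python
import html

# The same character has different byte representations per encoding.
lt_utf8 = "<".encode("utf-8")         # one byte
lt_utf16le = "<".encode("utf-16-le")  # two bytes

def render(user_text: str, server_charset: str) -> bytes:
    """Hypothetical server step: escape HTML metacharacters, then encode."""
    return html.escape(user_text).encode(server_charset)

# Four CJK code points chosen so that their UTF-16-BE bytes spell b"<script>".
hostile = b"<script>".decode("utf-16-be")

# The server sees no "<" or ">" in the text, so escaping changes nothing.
body = render(hostile, "utf-16-be")

# A client that assumes ISO-8859-1 decodes the very same bytes as markup.
seen_by_client = body.decode("iso-8859-1")
```

The escaping itself is correct for the text the server thinks it is sending. The bug only appears because the two sides disagree about what the bytes mean, which is exactly what a declared charset is supposed to prevent.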

So Debian’s Apache is configured by default to add that charset to the Content-Type header, which closes the hole for pages that don’t declare one. The side effect is that my (UTF-8, in theory) pages aren’t being served with the right charset.
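If you want to check what charset a response actually declares, something like the following sketch works. The `declared_charset` helper is a name I made up. It leans on the standard library's `email.message.Message` to parse the header value, and the header strings are just examples.

```python
from email.message import Message
from typing import Optional

def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when missing.
    return msg.get_content_charset()

apache_default = declared_charset("text/html; charset=iso-8859-1")
what_i_wanted = declared_charset("text/html; charset=UTF-8")
undeclared = declared_charset("text/html")
```

Feed it the Content-Type value from a real response (for example, copied from your browser's network inspector) to see which of the three cases your server falls into.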

(Credit: All of this is from this LJ post, which I found through Google.)