evan_tech

12:22 am, 11 May 06

sorry, specification!

Tonight I was talking with a coworker about search quality, and I said something like: "There are so many cases where there's the right way to do something, and then there's the way that works around all the bugs, and they always have to do the latter." Underneath, I guess, there's a philosophical issue (blah blah blah Postel's law), but I prefer to read it as descriptive rather than prescriptive. (Which doesn't mean I necessarily like it.)

One of my (least) favorites is page encoding. Since pretty much nobody gets their webserver's Content-Type headers right, and many people can't even get the encoding declared in the page itself right (see also: Windows-1252 in purportedly ISO-8859-1 XML feeds), browsers basically ignore those declarations and guess what encoding the page really uses. And since people test their pages in their browsers, Google has to match the browsers' detection behavior to get the best coverage. Sorry, HTTP and HTML encoding specifications!
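To make the "match the browsers" idea concrete, here's a rough sketch of the kind of encoding guess a crawler might make. It's a simplification under my own assumptions, not Google's or any browser's real algorithm: a byte-order mark wins, then the HTTP header charset, then a `<meta>` charset, then a strict UTF-8 check with a Windows-1252 fallback. The one piece that really does mirror browser behavior is treating an "ISO-8859-1" label as Windows-1252, which the WHATWG Encoding Standard now codifies. The helper names (`effective_encoding`, `guess_encoding`) are hypothetical.

```python
import codecs
import re

# Labels that browsers decode as windows-1252 anyway (a subset of the WHATWG list).
LATIN1_LABELS = {"iso-8859-1", "latin1", "latin-1", "us-ascii", "ascii"}

# Crude <meta ... charset=...> matcher; a real prescan is more careful.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.I)


def effective_encoding(label):
    """Map a declared charset label to the codec we'd actually decode with.

    Returns None for labels Python doesn't recognize, so the caller can
    fall through to the next source of evidence.
    """
    label = label.strip().lower()
    if label in LATIN1_LABELS:
        # Bytes 0x80-0x9F are C1 controls in ISO-8859-1 but curly quotes,
        # dashes, etc. in windows-1252, which is what the author really meant.
        return "cp1252"
    try:
        codecs.lookup(label)
    except LookupError:
        return None
    return label


def guess_encoding(body, header_charset=None):
    """Hypothetical browser-ish guess: BOM, header, meta, then sniff."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    match = META_CHARSET.search(body[:1024])
    meta_charset = match.group(1).decode("ascii") if match else None
    for declared in (header_charset, meta_charset):
        if declared:
            enc = effective_encoding(declared)
            if enc:
                return enc
    # No usable declaration: valid UTF-8 is a strong signal, else assume 1252.
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"
```

For example, a page claiming ISO-8859-1 but containing Windows-1252 "smart quotes" (bytes `0x93`/`0x94`) decodes to real quotation marks instead of invisible control characters.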

Another one is displaying XHTML in search results: it doesn't do much good to show a result that links directly to an XHTML document served as application/xhtml+xml if Internet Explorer is just gonna give you a download box. Sorry, XHTML!
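A tiny sketch of the check an indexer might run before linking a result, assuming all it has is the response's Content-Type header. Internet Explorer before version 9 did not support the `application/xhtml+xml` media type and offered a download instead, while the same markup served as `text/html` rendered fine. The function names are hypothetical, and a real system would weigh much more than this.

```python
XHTML_MEDIA_TYPE = "application/xhtml+xml"


def media_type(content_type_header):
    """Drop parameters like '; charset=utf-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def old_ie_would_download(content_type_header):
    """True if IE < 9 would show a download box instead of rendering the page."""
    return media_type(content_type_header) == XHTML_MEDIA_TYPE
```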

So it's nice to see that Matt and them have discovered(?) the Google bug that's been cropping up in my work lately, too. Briefly: it appears that when you keep a persistent connection open to one IP address but vary the Host: header across requests (say, for several virtual hosts on the same server), some servers get confused and serve you the wrong pages. I dunno what the fix is, but I'd guess they have to open separate connections for different hosts on the same IP. Sorry, persistent connections!
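Here's a minimal sketch of that guessed workaround, assuming a fetcher that already knows each site's IP: key the pool of persistent connections by (IP, port, Host) instead of just (IP, port), so two virtual hosts never share a connection. This is my illustration, not Google's actual fix; `PerHostConnectionPool` and its methods are hypothetical, built on Python's standard `http.client`.

```python
import http.client


class PerHostConnectionPool:
    """Hypothetical pool: one persistent connection per (IP, port, Host),
    so requests for different virtual hosts never reuse each other's socket."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self._conns = {}

    def connection_for(self, ip, host, port=80):
        key = (ip, port, host.lower())
        conn = self._conns.get(key)
        if conn is None:
            # HTTPConnection connects lazily, on the first request.
            conn = http.client.HTTPConnection(ip, port, timeout=self.timeout)
            self._conns[key] = conn
        return conn

    def fetch(self, ip, host, path="/", port=80):
        conn = self.connection_for(ip, host, port)
        host_header = host if port == 80 else f"{host}:{port}"
        # Supplying Host ourselves makes http.client skip its own (the IP).
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        body = resp.read()  # drain the body so the connection can be reused
        return resp.status, body

    def close(self):
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```

The cost is more open sockets per server and less benefit from keep-alive, which seems like a fair trade for not indexing one site's pages under another site's name.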