evan_tech

04:32 pm, 14 Jun 09

browser future

Since I think about browsers all day, here are some thoughts on the future. As always, I'm biased towards the Linux ecosystem, so in some sense this post could be considered a response to the question that came up recently in the Gnome world, which was roughly "What next? We've written all this GUI software but nobody really cares anymore."

Background: HTML as the new display protocol

X11 is a remote display protocol. It can do some pretty cool tricks but it is also terrible for high-latency (aka "anything but a short wire") connections, as nearly everything you do involves round trips. To fix this you want to be able to put more smarts into the low-latency piece of hardware you're locally interacting with. Many people have tried to improve this but they've mostly failed. (NX is pretty sweet but there's still not enough smarts client-side.)

In the meantime the world has converged, for better or for worse, on a remote display protocol called HTML. (There are many interesting reasons for that, but they are beyond the scope of this post.) From the X11 perspective, HTML involves a loadable DSL for interface description, another for sandboxed execution, a well-understood and cacheable networking protocol, and an install base larger than anything seen before. It's not the system anyone would have designed if we were starting over but it's what we've got, and it continues to evolve rapidly. Any future platform that wants its own API will need to support the web one anyway -- witness the annoyingness of developing for Android/iPhone versus the Palm phone as an example of embracing this.

Most of the software I use today runs in part on my machine and in part on other people's machines, and cloud computing hyperbole aside I expect this distinction to only get blurrier. I used to scoff at the concept of "web applications" but consider that even relatively mundane websites like LiveJournal really are software -- without LJ's servers doing it for me, the generation of my friends page would necessarily happen in code on my machine. One of the reasons I decided to work on Chrome was when I realized the only windowed apps I used anymore were terminals and browsers, and terminals more or less dead-ended evolutionarily with the original xterm (I have much more to say about that, actually, but it's for another time).

Software like GWT can be thought of as a toolkit (like GTK/Qt) in this world; existing apps are written to something analogous to xcb but stuff like Wave shows how things might work in the future. (I don't especially want to write Java, but conceptually GWT is in the right place.)

Thesis

So here's a thesis: Chrome comes with a few almost orthogonal goals behind it -- tabs as processes, some novel* UI bits -- which are a good step but can't go far enough due to the ecosystem it runs in.

People still think of browsers as applications, but their modern use is more as a runtime coupled with a random assortment of retrofitted system-level services. In part this is because there is no other route for third-party application developers on the Other Platforms to "upload" themselves into system-level services, which is where I think the free software world has an opportunity.

Processes

The elevation (or you could say "recognition") of web applications as things your operating system can help schedule and protect from one another is an old idea, and surely many browsers in the future will do as we've done, but even in today's browsers the web processes are still driven by a fragile "browser process" that is the single point of failure. For example, the network stack all lives in the privileged frontend process, so any bug in the HTTP stack sidesteps the sandboxing and takes out all tabs simultaneously.

Why do networking bugs affect the UI? Why do downloads stop when I close the last visible window? Ultimately it's because you have a hodgepodge of services (including high-privilege inputs like access to the mouse and keyboard mixed with untrusted network inputs) in a single place: the browser process. Researchers have even called it the "browser kernel", in fact. (One sad reason for the single monolithic process is so people don't complain more when they look at the process list in task manager. Another is that application vendors like Google aren't allowed to provide system services, while OS vendors like Microsoft aren't incented (or sometimes allowed) to integrate such services.)

HTTP, viewed as a system-wide remote display protocol, shouldn't be implemented at the application level. Why don't other tools like wget know how to use my cookies implicitly? Or even simpler: my proxy settings? Cookies are credentials much like my user id; Unix apps have a builtin for the latter (getuid()) but the web equivalent is considered an application-level problem (wget --load-cookies ~/.mozilla/firefox/mumble/etc).
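To make the wget complaint concrete, here is a minimal Python sketch of what every application has to do today. It assumes the cookies have been exported to a Netscape-format cookies.txt (the format wget --load-cookies reads); the function name cookie_header_for and the example.com URL in the usage note are made up for illustration.

```python
import http.cookiejar
import urllib.request


def cookie_header_for(url, cookie_file):
    """Return the Cookie header an app would send for url, or None.

    cookie_file is assumed to be a Netscape-format cookies.txt.
    """
    jar = http.cookiejar.MozillaCookieJar()
    # The application has to be told where the cookies live; there is
    # no getuid()-style call to ask the system for them.
    jar.load(cookie_file)
    request = urllib.request.Request(url)
    # Only builds the header from matching cookies; no network traffic.
    jar.add_cookie_header(request)
    return request.get_header("Cookie")
```

Calling cookie_header_for("http://www.example.com/friends", "cookies.txt") returns the header only if the caller already knows the path to the file, which is exactly the part I'd rather the system handled.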

MacOS and Windows have approached this problem by making HTTP fetching and proxy settings system libraries; however, because those libraries are closed-source and owned by the OS distributor, you are left at the mercy of their bugs and limitations. (We measured an improvement in page-load performance across our (opted-in-to-measurements) user base when we switched off the Windows HTTP stack.) And integration is still not easy enough; your favorite programming language of the month is still poorly reimplementing HTTP without integrating with my cookie store.

What I'd like: a network service that would offload HTTP interaction from applications. Applications could hand off jobs like "download this file" to the service, which would live beyond a given browser session and allow me to centrally control bandwidth, caching, proxies, limit/reuse TCP connections to a particular host, etc.
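As a rough illustration of the shape such a service might take, here is a toy in-process model in Python. Everything in it (FetchService, DownloadJob, max_per_host) is a hypothetical name invented for this sketch; it does no real networking and only models the bookkeeping of "hand off a job, let one central place enforce a per-host connection cap".

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(eq=False)  # compare jobs by identity, not by field values
class DownloadJob:
    url: str
    dest: str
    state: str = "queued"  # queued -> active -> done


class FetchService:
    """Toy model of a system-wide fetch service (hypothetical API)."""

    def __init__(self, max_per_host=2):
        self.max_per_host = max_per_host
        self.queue = deque()             # jobs waiting for a free slot
        self.active = defaultdict(list)  # host -> jobs currently running

    def submit(self, url, dest):
        """An application hands off a job and gets a handle back."""
        job = DownloadJob(url, dest)
        self.queue.append(job)
        self._schedule()
        return job

    def finish(self, job):
        """Called when a transfer completes; frees the host's slot."""
        host = urlsplit(job.url).hostname
        self.active[host].remove(job)
        job.state = "done"
        self._schedule()

    def _schedule(self):
        """Start queued jobs, in order, wherever the host cap allows."""
        still_waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            host = urlsplit(job.url).hostname
            if len(self.active[host]) < self.max_per_host:
                job.state = "active"
                self.active[host].append(job)
            else:
                still_waiting.append(job)
        self.queue = still_waiting
```

The point of the sketch is that the policy (here just the per-host cap) lives in one place that outlasts any single application, so a browser closing its last window would not take its downloads with it. A real service would sit behind some IPC mechanism and would also own caching, proxies, and bandwidth limits.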

Tabs

Why do we have tabs? Because window management has failed. As one commenter put it, on a modern machine we now have a list of running native apps in a strip of buttons across the bottom of the screen and a list of running web apps in a strip of buttons across the top of a special browser window. On my laptop, with a tiling window manager, I have a tab for each X11 app I'm running and then one tab containing the Chrome tab strip within it.

One spiffy Chrome feature is that when you tear a tab off a window you can drag it to hot zones on the screen that will relayout your browser windows side by side. But these sorts of behaviors don't belong as application behavior; they're only forced there because the window management the system provides isn't adequate. (I believe Windows 7 introduces some features like this.)

What I'd like: It'd be a pretty small patch to Chrome to change the call to "open in new tab": rather than opening a new tab, it'd open a new window along with some extra hints (in the window manager sense) about the relationship between this new window and the one that already exists. (Browsers have interesting logic related to whether new tabs appear rightmost or next to the current one, based on the interaction you've done.) My window manager is then responsible for what it's always done: managing windows, and I get to use the existing set of keys for swapping tabs, closing windows, etc.

Conclusions

I think the free software world is uniquely situated for these kinds of changes, because only there do we have enough control over our computers to be able to switch out fundamental pieces like these. (I tried and gave up on OS X due to frustrations over window management.)

On the other hand, it's wasted breath to post about this since I don't have time to implement any of it. And I've been sitting on this post for a month so I may as well publish it before I forget about it.

* Before anyone starts on it: very few ideas are "new"; selection, combination, and execution are relevant.