09:23 pm, 8 Oct 08
io
A nice post about the many hoops Chromium jumps through for performance -- not in the throughput sense, but latency.
From build.chromium.org, you can click the "perf" link in the upper left to see some of the metrics we track over time. This drill-down into startup shows around r2800 we regressed startup by about 3ms, and you'd be surprised to see how much effort has been put into getting those three milliseconds back. (Normally you'd just revert the commit, but that one in particular contained about six months of work on WebKit. Large jumps on other perf graphs are related to that as well.)
Linux apparently has no way to do non-blocking IO to the disk. (Many people hear that and begin with "What about the aio/myfavoritescheme functions?" but the answer as far as I can tell is that those don't work.) It's not too hard to put disk operations on yet another thread, but it's unclear to me which parameters of a thread pool would help -- there's a trade-off between thrashing between multiple spots on the disk vs getting more data in front of the disk scheduling algorithms, etc. Anyone have anything smart to say about it?
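For concreteness, here is a minimal sketch of the "put disk operations on yet another thread" idea, in Python rather than Chromium's C++. The helper names `read_file` and `submit_read` and the pool size of 2 are illustrative assumptions, not anything from Chromium; the pool size is exactly the parameter the paragraph above is unsure about.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read; meant to run on a pool thread, never on the UI thread."""
    with open(path, "rb") as f:
        return f.read()


def submit_read(pool, path):
    """Queue a blocking read and return immediately with a Future."""
    return pool.submit(read_file, path)


if __name__ == "__main__":
    # Hypothetical pool size. More workers put more requests in front of the
    # disk scheduler, but may also cause more seeking between files.
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="disk") as pool:
        future = submit_read(pool, os.__file__)
        # The caller is free to do other work here; result() blocks only
        # when the data is actually needed.
        print(len(future.result()), "bytes read off the main thread")
```

The calling thread only blocks if it asks for `result()` before the read has finished, so the thread that submits the work stays responsive in the meantime.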
But in response to the OP -- yeah, all the IO API implementations still suck. The only clever thing I've come up with is to mincore() and madvise(MADV_WILLNEED), but that doesn't necessarily even submit the IO -- you have to do an actual page touch in another thread (or process) to be sure the IO is submitted.
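A rough sketch of that hint-then-touch trick, in Python for brevity. The function names `touch_pages` and `prefetch_in_background` are made up for illustration. Python's standard library has no mincore() wrapper, so the residency check is skipped here, and `mmap.madvise` needs Python 3.8+ on a platform that provides it.

```python
import mmap
import os
import threading


def touch_pages(path):
    """Map a file, hint MADV_WILLNEED, then read one byte from every page.

    Returns the number of pages touched.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Advisory only: the kernel is free to ignore or defer this.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                mm.madvise(mmap.MADV_WILLNEED)
            # The actual page touch: a read that faults the page in if it
            # is not already resident.
            for offset in range(0, size, mmap.PAGESIZE):
                _ = mm[offset]
                touched += 1
    return touched


def prefetch_in_background(path):
    """Run touch_pages on a separate thread and return that thread."""
    thread = threading.Thread(target=touch_pages, args=(path,), daemon=True)
    thread.start()
    return thread
```

One caveat specific to this sketch: CPython holds the GIL during the page touch, so a fault that waits on the disk can stall other Python threads too. In Python the "or process" variant is probably the safer choice; in C the thread version does not have this problem.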
For many users (we don't collect stats, maybe we should) their start page is the new tab page, which is local, or about:blank. But even in the network-based-start-page case, moving the point at which the network stack is initialized shouldn't affect the end-to-end latency from start to page load finished. If you want to (for example) load a different page than your start page, I think it's better to have the entry box in front of you earlier (the latency of your typing will be the dominating factor there).
As for feeling sluggish due to lazy loading, the focus on doing no IO from the UI thread means that it should always be responsive under load unless it's CPU-starved or swapped out. (I haven't measured it, but I'd guess much of startup latency is due to waiting on the disk.)
Thus, on-demand loading must feel subjectively faster, since total wait time will be longer if data is lazily loaded. I'm curious if there is data to support that conjecture or if some of these perf tradeoffs are at least partly based on engineering wankery. :)
I also seem to recall IE and Mozilla at one point developing "loaders" to pre-load most of the app in an effort to make them feel more responsive. And certainly Windows (and to some extent OS X) has shown me how frustrating it is to get halfway through an action before being blocked waiting for resources to load. Some of that is undoubtedly mitigated by doing no I/O in the main loop, but no amount of threading can hide the loading of a resource that is needed to continue.
O_DIRECT
Do not, under any circumstances, use O_DIRECT under Linux. I've lost days of my life to it. Linus also does not approve:``
The whole notion of "direct IO" is totally braindamaged. Just say no.
This is your brain: O
This is your brain on O_DIRECT: .
Any questions?
''
takes all kinds
Not related in any practical sense, but I thought this was pretty funny.