November 16th, 2007

  • evan

content addressable

I recently read the Venti paper, which made me a little sad, as all Plan 9 stuff does, because it has so many interesting ideas that have since been forgotten.

Here's a thought: if links on the web pointed to SHA-1 hashes of content instead of hostnames and paths, you could (a) be assured that the content on the other side of the link was always exactly what you linked to, and (b) reliably handle mirroring for free (anyone could replicate the data behind a SHA-1, and you could verify it was a correct mirror by hashing what you got back). The only sticky part is resolving a SHA-1 to an IP address that would return the data you requested, but there's a lot of research (DHTs, even DNS) in that space. The major weirdness is that it'd be impossible to "update" a page without everyone adjusting their links, since any change to the content changes its hash.
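To make the mirroring point concrete, here is a minimal Python sketch of what a client might do. The names `sha1_hex` and `fetch_verified` are made up for illustration, and each "mirror" is just a hypothetical callable that takes a digest and returns bytes (or None); how you'd actually find mirrors is the unsolved resolution problem.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 of data as a 40-character lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


def fetch_verified(digest: str, mirrors):
    """Try each mirror in turn and return the first response whose
    SHA-1 matches the requested digest, or None if none match.

    `mirrors` is a hypothetical list of callables: digest -> bytes or None.
    """
    wanted = digest.lower()
    for fetch in mirrors:
        data = fetch(wanted)
        # An untrusted mirror can return anything; only accept an exact match.
        if data is not None and sha1_hex(data) == wanted:
            return data
    return None
```

The client never has to trust a mirror: a wrong or tampered response simply fails the hash check and the next mirror is tried.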

You could imagine creating links of the form sha1://[hex string], confident that in the future someone would come up with a way of resolving them. On the other hand, you can also be confident that SHA-1 collisions will be found, so maybe it's not so useful for archival at web scale.
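As a rough sketch of what handling such a link might look like, here is some Python that pulls the digest out of a sha1:// link. The scheme doesn't exist; `parse_sha1_link` is a hypothetical helper, and I'm assuming the hex string is the full 40-character SHA-1 with nothing after it.

```python
import re

# Assumed link shape: "sha1://" followed by exactly 40 hex characters.
_SHA1_LINK = re.compile(r"sha1://([0-9a-fA-F]{40})")


def parse_sha1_link(link: str) -> str:
    """Return the lowercase hex digest from a sha1:// link.

    Raises ValueError if the link does not have the assumed shape.
    """
    match = _SHA1_LINK.fullmatch(link)
    if match is None:
        raise ValueError(f"not a sha1:// link: {link!r}")
    return match.group(1).lower()
```

Everything past this point (turning the digest into bytes from somewhere) is the part left for the future.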