Suppose you have a large (on-disk, too large for memory) hash table that you'd like to pull n random entries out of. You don't know the total number of entries, just that there are more than n. Assume you have an API like "
for key, value in table.stream(): ..." How do you do it?
Update: made it more clear, and added an answer here, after the cut and in white-on-white. The comments discuss the solution, so don't read 'em unless you want spoilers.
(It is instructive to solve it for n=1, and the hint is to use induction on big-N, the length of the data.)
But the answer that rjpower gave me recently sorta made me feel dumb (or, perhaps, too clever for my own good -- I was busily implmenting something more complicated): If your hashing function is good (which is sort of the point of a hash table), the keys are already in "random" order; just grab the first n and you're done.
(I suppose you could argue this depends on the organization of your hash table and what it means to be "random"...)