Suppose you have a large (on-disk, too large for memory) hash table that you'd like to pull n random entries out of. You don't know the total number of entries, just that there are more than n. Assume you have an API like "
for key, value in table.stream(): ...
" How do you do it?Update: made it more clear, and added an answer here, after the cut and in white-on-white. The comments discuss the solution, so don't read 'em unless you want spoilers.
(It is instructive to solve it for n=1, and the hint is to use induction on big-N, the length of the data.)
But the answer that
(I suppose you could argue this depends on the organization of your hash table and what it means to be "random"...)