Suppose you have a large (on-disk, too large for memory) hash table that you'd like to pull

*n*random entries out of. You don't know the total number of entries, just that there are more than

*n*. Assume you have an API like "

`for key, value in table.stream(): ...`

" How do you do it?Update: made it more clear, and added an answer here, after the cut and in white-on-white. The comments discuss the solution, so don't read 'em unless you want spoilers.

This is related to a common interview question, and one that I think is sorta lame. You could maybe expect an interviewee to solve it with some probability and some induction when n=1, but it feels to me like it's a trick question.

(It is instructive to solve it for n=1, and the hint is to use induction on big-N, the length of the data.)

But the answer that

(I suppose you could argue this depends on the organization of your hash table and what it means to be "random"...)

(It is instructive to solve it for n=1, and the hint is to use induction on big-N, the length of the data.)

But the answer that

**rjpower**gave me recently sorta made me feel dumb (or, perhaps, too clever for my own good -- I was busily implmenting something more complicated): If your hashing function is good (which is sort of the point of a hash table), the keys are already in "random" order; just grab the first*n*and you're done.(I suppose you could argue this depends on the organization of your hash table and what it means to be "random"...)

## Error

You must follow the Privacy Policy and Google Terms of use.