So please forgive me for my intro-CS level of discussion, but I was curious about this the other day and enjoyed learning about it.
the algorithm
Here's an algorithm (credited to Hoare, not to be confused with
- Guess that the kth element is the kth smallest.
- Partition the array into smaller and greater halves than that guess.
- If the smaller half has k-1 elements, the guess was correct.
- Otherwise, the kth smallest element is in one of the two halves, so adjust k and recurse into the appropriate one.
observations
The simpler algorithm for this is, of course:
- Sort the array.
- Take the middle element.
Another way of interpreting this is like a depth-first build of a binary tree: the guess is your current node, the partition picks all the nodes that belong on the left and right branches, and the recursion continues building the tree on the branch.
implementation
This (or an equivalent function) is in C++, of course, as the function
nth_element
.Here's a Haskell implementation of it using lists. (Yeah, not quite the same, but it carries the idea.)
nth n l | n < ltlen = nth n lt | n > ltlen = nth (n-ltlen-1) gt | otherwise = pivot where (pivot,l') = select n l (lt, gt) = partition (< pivot) l' ltlen = length lt
partition
is a builtin; select
is a function that pulls out the nth element from a list.verifying
But how do I know it works? There's a neat library called "QuickCheck" that does randomized testing: you tell it what sorts of inputs your function expects and a way to test whether the output is correct, and it randomly generates inputs to verify proper functioning.
QuickCheck is one of those libraries that seems to work by magic: no code preprocessing or introspection, just (as always) fancy types.
For example, here's how I write the requirement for
nth
. I rely on the builtin sort
to give me the correct answer.prop_Nth n l = nth n l == sort l !! nThis is just a normal function definition, but you can read the first line as "the property Nth holds for inputs n and l, when ..."
(Here,
!!
is the nth-element-from-a-list operator. I assume it has a scary-looking name because it's O(n).)And then you check it with simply:
quickCheck prop_Nth
And QuickCheck can figure out (through type inference, again) that this particular property takes an integer and a list, and automatically generates test cases and runs them.
When I first ran this it found all sorts of places where it'd fail that would be easy to forget with a unit test: for example, when the input list is empty, or when n is not in the bounds of the list. So they provide all sorts of fancy higher-level operations that let you modify the random-input-generators or restrict the inputs, or even have it provide breakdowns of the sorts of inputs it gave ("32% three-element lists.") The manual is hokey-looking but quite impressive.
P.S
Here's an interesting thread about how laziness affects big-O.