12:16 pm, 5 Aug 06

### hoare's find algorithm, quickcheck

I generally find the study of algorithms at the low, which-sort()-is-the-fastest level to be uninteresting. I have no excuse, really: certain things appeal to certain people and this one just isn't for me. (Also, in the work I do, the real speed differences come from the larger rearrangements that the premature-optimization-haters always claim. Stuff like: which machines carry which data, getting rid of O(n^2), etc.)


So please forgive me for my intro-CS level of discussion, but I was curious about this the other day and enjoyed learning about it.

**the algorithm**

Here's an algorithm (credited to Hoare, not to be confused with **graydon**) for finding the median of an unsorted array. Actually, it more generally gives you the kth smallest element, and getting the median is just k = length / 2.

- Guess that the kth element is the kth smallest.
- Partition the array into the elements smaller and greater than that guess.
- If the smaller half has k-1 elements, the guess was correct.
- Otherwise, the kth smallest element is in one of the two halves, so adjust k and recurse into the appropriate one.
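A quick worked trace of the steps above, finding the median (k = 3, counting from one) of [5,1,4,2,3], with the expected answer checked against plain sorting:

```haskell
import Data.List (sort)

-- Trace for k = 3 on [5,1,4,2,3], worked by hand:
--   Guess the element at position 3: pivot = 4.
--   Partition the rest: smaller = [1,2,3], greater = [5].
--   The smaller half has 3 elements, not k-1 = 2, and 3 >= k,
--   so recurse into [1,2,3] with k still 3.
--   Guess the element at position 3: pivot = 3.
--   Partition: smaller = [1,2], greater = [].
--   The smaller half has exactly k-1 = 2 elements, so the answer is 3.
main :: IO ()
main = print (sort [5,1,4,2,3] !! (3 - 1))  -- sanity check: prints 3
```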

**observations**

The simpler algorithm for this is, of course:

- Sort the array.
- Take the middle element.

But this algorithm *is* quicksort, except that you only recurse into the subdivision that has the one value you care about. (I guess quicksort was invented by the same fellow, so maybe this isn't a surprise.)

Another way of interpreting this is as a depth-first build of a binary tree: the guess is your current node, the partition picks all the nodes that belong on the left and right branches, and the recursion continues building the tree down one branch.
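For comparison, here's the usual Haskell list quicksort. Hoare's find has exactly this shape; it just recurses into only one of the two halves and never rebuilds the whole list:

```haskell
import Data.List (partition)

-- The classic list quicksort: recurse into BOTH halves and concatenate.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort lt ++ [p] ++ qsort gt
  where (lt, gt) = partition (< p) xs

main :: IO ()
main = print (qsort [5,1,4,2,3])  -- prints [1,2,3,4,5]
```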

**implementation**

This (or an equivalent function) is in C++, of course, as the function `nth_element`. Here's a Haskell implementation of it using lists. (Yeah, not quite the same, but it carries the idea.)

```haskell
nth n l | n < ltlen = nth n lt
        | n > ltlen = nth (n - ltlen - 1) gt
        | otherwise = pivot
  where (pivot, l') = select n l
        (lt, gt)    = partition (< pivot) l'
        ltlen       = length lt
```
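The snippet above leaves `select` undefined. Here's a self-contained version with one possible definition of `select` (an assumption on my part — any function that removes and returns the element at a given index would do):

```haskell
import Data.List (partition)

-- Assumed definition: remove and return the element at index n.
select :: Int -> [a] -> (a, [a])
select n l = (l !! n, take n l ++ drop (n + 1) l)

nth :: Ord a => Int -> [a] -> a
nth n l | n < ltlen = nth n lt
        | n > ltlen = nth (n - ltlen - 1) gt
        | otherwise = pivot
  where (pivot, l') = select n l
        (lt, gt)    = partition (< pivot) l'
        ltlen       = length lt

main :: IO ()
main = print (nth 2 [5,1,4,2,3])  -- 0-based median: prints 3
```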

`partition` is a builtin; `select` is a function that pulls out the nth element from a list.

**verifying**

But how do I know it works? There's a neat library called "QuickCheck" that does randomized testing: you tell it what sorts of inputs your function expects and a way to test whether the output is correct, and it randomly generates inputs to verify proper functioning.

QuickCheck is one of those libraries that seems to work by magic: no code preprocessing or introspection, just (as always) fancy types.

For example, here's how I write the requirement for `nth`. I rely on the builtin `sort` to give me the correct answer.

```haskell
prop_Nth n l = nth n l == sort l !! n
```

This is just a normal function definition, but you can read it as "the property Nth holds for inputs n and l, when nth n l equals sort l !! n."

(Here, `!!` is the nth-element-from-a-list operator. I assume it has a scary-looking name because it's O(n).)

And then you check it with simply:

`quickCheck prop_Nth`

And QuickCheck can figure out (through type inference, again) that this particular property takes an integer and a list, and automatically generates test cases and runs them.

When I first ran this it found all sorts of places where it'd fail that would be easy to forget with a unit test: for example, when the input list is empty, or when n is not in the bounds of the list. So they provide all sorts of fancy higher-level operations that let you modify the random-input-generators or restrict the inputs, or even have it provide breakdowns of the sorts of inputs it gave ("32% three-element lists.") The manual is hokey-looking but quite impressive.
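For instance, the out-of-bounds failures can be excluded. QuickCheck's `==>` combinator restricts generated inputs; below is the same idea as a plain `Bool` property (out-of-range inputs pass trivially), with the `==>` form shown in a comment so this sketch stays free of the library dependency:

```haskell
import Data.List (partition, sort)

-- Assumed definition: remove and return the element at index n.
select :: Int -> [a] -> (a, [a])
select n l = (l !! n, take n l ++ drop (n + 1) l)

nth :: Ord a => Int -> [a] -> a
nth n l | n < ltlen = nth n lt
        | n > ltlen = nth (n - ltlen - 1) gt
        | otherwise = pivot
  where (pivot, l') = select n l
        (lt, gt)    = partition (< pivot) l'
        ltlen       = length lt

-- With QuickCheck you would instead write:
--   prop_Nth n l = (0 <= n && n < length l) ==> nth n l == sort l !! n
prop_NthSafe :: Int -> [Int] -> Bool
prop_NthSafe n l
  | n < 0 || n >= length l = True  -- out of bounds: pass trivially
  | otherwise              = nth n l == sort l !! n

main :: IO ()
main = print (and [prop_NthSafe n l | n <- [-1 .. 4], l <- [[], [1], [3,1,2], [2,2,1]]])
```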

**P.S.** Here's an interesting thread about how laziness affects big-O.

**edge_walker:** QuickCheck sounds like a Haskell-enmagicked version of LectroTest. Cool.

**gaal:** `##[ x <- Int, y <- Int ]##` is very much Hs list comprehension style, and he manages nicely to Perlify QuickCheck's `class Arbitrary`.

**edge_walker:** Heh, I only just noticed that he credited QuickCheck as the inspiration for LectroTest in the docs. I learned about LectroTest long before I knew of Haskell. Small world.