<< the question index distributing >>

not suprisingly it turns out doing a complete sort is a bit wasteful if we're only after the median.

all we want is the element that had the same number of elements on either side.

this type of operation, known as a partition, is actually the basis of that most famous sort of all, quicksort.

the basic method of the quicksort algorithm is to select an element, called a pivot and move it into it's correct sorted position by shuffling elements around it.

all elements less than the pivot are moved to the front of the list and the rest, corresponding to elements equal or greater, are left in place.

after this partitioning the pivot is in it's correct sorted position (though the rest of the list is still a shambles).

the rest can be sorted by recursively applying the same partitioning approach to both the elements
before the pivot and the elements after the pivot.

to find just the median though we can use the idea partitioning around the first pivot, without having to do the extra work of the complete sort.

first an informal definition: regarding order statistics

the n^{th} order statistic of a list is the element which is in the n^{th} postition when the list is sorted

eg the 1^{st} order statistic of a 9 element list is the minimum

eg the m^{th} order statistic of a m element list is the maximum

eg the m^{th} order statistic of a 2m element list is the median

let's work through an example

say we have the list

[4 3 1 1 5(we'll use an underline to denote the value in the mth order stat position for a 2m element list)92 6 5 3 5]

we are after the median, ie the 6th order stat

if we partitioned around the first list element, ie 4, the list would be

[3 1 1 2 3(we'll use red values to denote those less than a pivot, green values for those equal or greater than a pivot) since the 4 has been moved to position 6 we can say 4 is the 6th order statistic of this list.5 9 6 5 5]4

which, as luck would have it, is exactly what we were looking for!

so this pivot, 4, is the median.

this was much less work than quicksort since we didn't have to sort the sublists [3 1 1 2 3] or [5 9 6 5 5]

but it only worked because 4 was a good choice.

but what if after the parititioning the pivot isn't the 6th order stat?

consider a case then the pivot ends up "to the right" of the median postition

L1 = [6 3 1 1 5if we partitioned around the first element, 6, we have92 4 5 3 7]

L1 = [3 1 1 5 524 369 7]

this time we have calculated 6 to be the 9th order statistic.

since we were after the 6th order stat so know then that 6 is not the median.

furthermore since the 9th order stat is "to the right" of the 6th order stat we know then that the median must be less than 6.

therefore not only do we know 6 is not the median, it can't be 9 or 7 either.

this is very important since it means we can actually discard these values completely.

L2 = L1 - [6 9 7]

L2 = [3 1 1 5 524 3]

*L1's 6th order stat, it's median, is L2's 6th order stat*

(since we discarded values *after* the middle value we can continue by looking for the same order stat)

take note though that the 6th order stat of L2 is **not** L2's median.

L2's median would be it's 4th order stat.

consider a case then the pivot ends up "to the left" of the median postition

L1 = [2 3 1 1 5again we are after the median which is the 6th order stat93 6 5 3 5]

partitioning around the first element, 2, we have

L1 = [1 123 593 6 5 3 5]

in this case our pivot is the 3rd order stat, but we were after the 6th order stat

since the 3rd order stat is "to the left" of the 6th order stat we can additionally say the median isn't 1

discarding this time values less than the pivot

L2 = L1 - [1 1]

L2 = [2 3 593 6 5 3 5]

*L1's 6th order stat, it's median, is L2's 4th order stat*

(since we discarded values *before* the middle value we need to adjust what order stat we are now looking for)

if we are looking for the kth order stat and we choose a pivot that is it we've found the median.

otherwise we throw some values away, adjust what order stat we're looking for and continue.

what a surprise, there is a corner case....

say we have list

L1 = [2 3 1 1 5and we choose the first value, 2, as a pivot we discard some values (the 1's) and get93 6 5 3 5]

L2 = [2 3 5now if we continue and choose the first value, 2, as a pivot we end up discarding nothing more and get93 6 5 3 5]

L3 = [2 3 5and we've gotten nowhere...93 6 5 3 5]

so if the pivot ends up in the first place we have to rotate the list (ie move first element to end) and try again.

what a surprise, there's another corner case....

this algorithm doesn't solve that case though when all elements of the list are the same

L1 = [3 3 3 3 3]no amount of rotation will save us now.

so we need some special handling for this one;

it's simply; if list min = list max then the median is the first element

if we ever end up looking for the 1st order stat we can just scan for the minimum

if we ever end up looking for the kth order stat of a k element list we can just scan for the maximum

<< the question index distributing >>

nov 2008