<< the question index distributing >>
not suprisingly it turns out doing a complete sort is a bit wasteful if we're only after the median. all we want is the element that had the same number of elements on either side. this type of operation, known as a partition, is actually the basis of that most famous sort of all, quicksort.
the basic method of the quicksort algorithm is to select an element, called a pivot and move it into it's correct sorted position by shuffling elements around it. all elements less than the pivot are moved to the front of the list and the rest, corresponding to elements equal or greater, are left in place. after this partitioning the pivot is in it's correct sorted position (though the rest of the list is still a shambles). the rest can be sorted by recursively applying the same partitioning approach to both the elements before the pivot and the elements after the pivot.
to find just the median though we can use the idea partitioning around the first pivot, without having to do the extra work of the complete sort.
first an informal definition: regarding order statistics the nth order statistic of a list is the element which is in the nth postition when the list is sorted eg the 1st order statistic of a 9 element list is the minimum eg the mth order statistic of a m element list is the maximum eg the mth order statistic of a 2m element list is the median
let's work through an example say we have the list
[4 3 1 1 5 9 2 6 5 3 5](we'll use an underline to denote the value in the mth order stat position for a 2m element list)
we are after the median, ie the 6th order stat if we partitioned around the first list element, ie 4, the list would be
[3 1 1 2 3 4 5 9 6 5 5](we'll use red values to denote those less than a pivot, green values for those equal or greater than a pivot) since the 4 has been moved to position 6 we can say 4 is the 6th order statistic of this list. which, as luck would have it, is exactly what we were looking for! so this pivot, 4, is the median.
this was much less work than quicksort since we didn't have to sort the sublists [3 1 1 2 3] or [5 9 6 5 5] but it only worked because 4 was a good choice. but what if after the parititioning the pivot isn't the 6th order stat?
consider a case then the pivot ends up "to the right" of the median postition
L1 = [6 3 1 1 5 9 2 4 5 3 7]if we partitioned around the first element, 6, we have
L1 = [3 1 1 5 5 2 4 3 6 9 7]
this time we have calculated 6 to be the 9th order statistic. since we were after the 6th order stat so know then that 6 is not the median. furthermore since the 9th order stat is "to the right" of the 6th order stat we know then that the median must be less than 6. therefore not only do we know 6 is not the median, it can't be 9 or 7 either. this is very important since it means we can actually discard these values completely.
L2 = L1 - [6 9 7]
L2 = [3 1 1 5 5 2 4 3]
L1's 6th order stat, it's median, is L2's 6th order stat (since we discarded values after the middle value we can continue by looking for the same order stat)
take note though that the 6th order stat of L2 is not L2's median. L2's median would be it's 4th order stat.
consider a case then the pivot ends up "to the left" of the median postition
L1 = [2 3 1 1 5 9 3 6 5 3 5]again we are after the median which is the 6th order stat
partitioning around the first element, 2, we have
L1 = [1 1 2 3 5 9 3 6 5 3 5]
in this case our pivot is the 3rd order stat, but we were after the 6th order stat since the 3rd order stat is "to the left" of the 6th order stat we can additionally say the median isn't 1 discarding this time values less than the pivot
L2 = L1 - [1 1]
L2 = [2 3 5 9 3 6 5 3 5]
L1's 6th order stat, it's median, is L2's 4th order stat (since we discarded values before the middle value we need to adjust what order stat we are now looking for)
if we are looking for the kth order stat and we choose a pivot that is it we've found the median. otherwise we throw some values away, adjust what order stat we're looking for and continue.
what a surprise, there is a corner case....
say we have list
L1 = [2 3 1 1 5 9 3 6 5 3 5]and we choose the first value, 2, as a pivot we discard some values (the 1's) and get
L2 = [2 3 5 9 3 6 5 3 5]now if we continue and choose the first value, 2, as a pivot we end up discarding nothing more and get
L3 = [2 3 5 9 3 6 5 3 5]and we've gotten nowhere...
what a surprise, there's another corner case.... this algorithm doesn't solve that case though when all elements of the list are the same
L1 = [3 3 3 3 3]no amount of rotation will save us now. so we need some special handling for this one; it's simply; if list min = list max then the median is the first element
if we ever end up looking for the 1st order stat we can just scan for the minimum if we ever end up looking for the kth order stat of a k element list we can just scan for the maximum
<< the question index distributing >>
nov 2008