Put on Your Sorting Hat: Which Algorithm Makes Your Sort 500 Times Faster?
- Old Reliable: The Selection Sort
- The Merge-Sort Algorithm
- Why Is One Algorithm Faster Than Others?
- Other Algorithms
Most people who've done any programming have at least heard about sorting algorithms. You may have read that the selection-sort or bubble-sort algorithm is the easiest to write, but did you know that it's also the slowest performer? The quick-sort algorithm is usually much faster, especially for large arrays. But why?
Expert programmers know why one algorithm is faster than others, as well as how to quantify that difference. In the case of sorting, we're talking about large potential differences: For an array 10,000 elements long, a merge sort isn't just two or three times as fast as a selection sort. In this case, the better algorithm is a hundred times faster! If your database had to sort a million elements, the contrast would be far greater still.
See why it pays to understand algorithms?
Old Reliable: The Selection Sort
Suppose you were stranded on a desert island and wanted to sort a row of coconuts. You'd probably do the following:
- Scan all the coconuts and put the smallest one in the first (leftmost) position.
- Among the remaining N-1 coconuts, put the smallest in the second position.
- Among the remaining N-2 coconuts, put the smallest in the third position.
- Continue until the entire row has been sorted.
There is probably no better way to sort an array manually, because a more complex procedure might confuse a human worker. As you'll see, though, computers can do better.
Of course, the computer can't just glance at an unsorted array and find the smallest element. There's no way to carry out the instruction "Find the smallest element" without doing N-1 comparisons, where N is the size of the group. The computer has to follow this procedure:
- Assume that the leftmost element is the smallest. Let's record this position as iMin. In C/C++ terms, that means that the integer variable iMin is initialized to 0.
- Compare the element from step 1 to the next element in the array. If the next element is smaller than the current minimum (that is, the element at position iMin), change the value of iMin to index this new position.
- Repeat this process until you reach the end of the array.
- If iMin doesn't index the leftmost element, swap the leftmost element with the element at position iMin.
Bear in mind that this is the procedure for finding the smallest element during one pass of the algorithm. You'll need to make repeated passes to sort the entire array.
Let's put it all together. (I'll show the C++ code in the next section.) For an array of size N, the total number of comparisons is determined as follows:
- To find the smallest element, make N-1 comparisons.
- To find the second-smallest element, make N-2 comparisons.
- To find the third-smallest element, make N-3 comparisons.
So, for an array of size 6, the total number of comparisons required for sorting is as follows:
5 + 4 + 3 + 2 + 1 = 15
This may look familiar. It's an example of the "triangle number" calculation, which in this case can be expressed as follows:
( N (N—1) ) / 2
Computer scientists characterize this value as O(n2). You might object that this figure is really closer to O(n2/2), but in computer science, the factor of 2 is ignored as a measure of algorithm speed. Constant factors are ignored because what's of most interest here is how the work required increases with the number of elements.
Technically, this measurement is characterized by computer scientists with the term complexity—it doesn't measure time in any absolute sense, but rather measures how complex the work to be done is. Allowing for some constants we'll examine later, we can view O(n2) as a time-duration measurement.
A selection sort always requires O(n2) operations. Even if the array is already sorted, the algorithm has to make the same number of comparisons. Relatively speaking, O(n2) is expensive. What happens if we need to sort a million elements? Is it beginning to dawn on you that it's possible to do better?
Programming a Selection Sort
Although a selection sort is in many ways the worst algorithm, it's easy to program. The code takes each position in turn, starting with the first, and finds the lowest element in the range consisting of that position and everything to its right. In the C/C++ family of languages, of course, the first position is indexed as 0.
void sel_sort(int A[], int n) { int iMin; // Index of min. value within range int temp; // Temp. value used for swapping // Look at each position, starting w/ 0. for (int i = 0; i < n; i++) { iMin = i; // Assume i is position of minimum elem. // Note i is leftmost position in sub-range. // Compare min. value to each element to its RIGHT. // If smaller elem is found, it becomes the new min. for (int j = i; j < n; j++) { if (A[j] < A[iMin]) iMin = j; } // A[iMin] now contains min. value in sub-range. // If min. element is not i, then swap! if (i != iMin) { temp = A[i]; A[i] = A[iMin]; A[iMin] = temp; } } }