Little-Known Awesome Algorithms: O(N+K) Sorting — Counting Sort
Counting Sort, an O(N+K) sorting algorithm
Recently I came across a sorting method that I wasn’t taught back when I took my algorithms course in school. That algorithm is called ‘counting sort’. The idea behind counting sort is that if the number of distinct elements in our data-set is significantly smaller than the total number of elements in the data-set then we can sort in O(N+K), where N is the number of elements and K is the number of distinct elements. For example, If we count 10 occurrences of the number 0 in our data-set, we know we can simply replace the first 10 elements in our sorted array to be 0.
When would this be useful?
Let’s say you are sorting a list of ages, heights, or any metric with small set of possible values. The set of possible values for this type of data is pretty small, for ages it’s merely 0 – 120. This is an excellent use-case for bringing out counting sort (That is, if you need the speed improvement of O(n+k) over O(nlogn)).
How to implement?
This sorting algorithm is extremely simple to implement, a basic approach is as follows:
- Iterate over your data-set and count the frequency of each elements occurrence. (Just count the items)
- Iterate over your data-set from start to finish replacing values.
- For example if your frequency array contains a frequency of 10 for the element ’0′, then you overwrite the first ten elements in your sorted data-set with 0. Repeat for the frequency of 1, etc.
- Viola your data-set is sorted!
Code example
void basic_count_sort(int * arr, int size) {
// Find the largest element in our array
int max = *std::max_element(arr, arr+size);
// Make an array of that size, to store frequency of each element
int * count = new int[max+1];
//Initialize array to 0
std::fill(count, count+max+1, 0);
for(int i=0;i<size;++i) {
++count[arr[i]];
}
int currPos = 0;
for(int i=0;i<max+1;++i) {
if(count[i] != 0) {
// Fill array with i, count[i] times
// From array indices pos to pos+count[i]
std::fill(arr + currPos, arr + currPos + count[i], i);
currPos += count[i];
}
}
}
This is not a comparison sort
If you have a keen eye, you’ll notice that this algorithm does not contain any logic to check whether one element is ‘less than’ another element. This means that counting sort can’t be used on anything other than primitive types. We took advantage of the fact that each element in our data-set could be used as an index in our counting array, which created an implicitly sorted array when we iterated over it.
Why isn’t counting sort available in standard Java or C++ libraries?
Even though this sort seems like an amazing trick that could be used all the time, it turns out that it’s not as useful as it first sounds. The two main reasons why this isn’t part of java/c++ standards is first because of the fact that it doesn’t work on arbitrary objects, only primitive types. And second because it turns out this actual likelihood of someone really needing this performance gain is unlikely so it is simply left out of standard libraries. After all, if you happen to be in need of performance you should probably be rolling your own sorting methods instead of using the standard library (Don’t quote me on this, I’m sure there are tons of people who rely on std::sort for very performance intensive applications).
Notes
The code above only exists to demonstrate the idea, not to be as efficient as possible, nor comprehensive. For example, if we have negative values in our data-set we would have to implement an offset so we don’t access negative values in our count array. As well, if we only have 2 values in our data-set, 1 and 1000000, the above implementation would still create an array of 1000000, instead of a more involved method using a hash-map.
Also if you notice, this sort doesn’t require the data to be in a data-structure with random-access properties, this would work even with a linked list.
Here’s the above code being compiled and run with a small data-set:
Compile, run, and see that it works!
Thanks a lot for reading! Usually readers comment stating nitty details in my articles that are incorrect, if you find some let me know in the comments so I can correct them! Also, If you happen to find this useful or interesting, go ahead and thanks me by: like/upvote/submit it on your favorite social media site!
-
teferi
-
aes