Author Archives: Rodrigo Salazar

HipHop VM on EC2 – “could not allocate 1210089471 bytes for translation cache”

Facebook Engineering recently made a press release regarding the HipHop project, HHVM is finally faster than the hiphop compiler. They also made a post about how to get WordPress running on HipHop Virtual Machine! This is a pretty big deal, after all, it’s estimated that over 20% of the sites on the web today are actually WordPress installations. It has been reported that HHVM yields a 2.7x speedup over the regular php interpreter. I found this pretty interesting because I try to get the most out of my EC2 Micro instance before I have to upgrade to a m1.small (After all, I still have 3 months in my micro instance EC2 free tier!).

I went ahead and followed the instructions on the hiphop-php.com blog post to see how easy it was to get started with HHVM. I started up a fresh EC2 Micro instance, Ubuntu 12.04 Precise. The WordPress core needs only about 5 lines changed by hand and it took 3 hours to compile the HHVM source. I go ahead and try to start up the HHVM binary and nothing happens. I turn off the daemon flag to see if any errors are thrown and I get hit with the following: “could not allocate 1210089471 bytes for translation cache”.

This means that HipHopVM requires 1.12 Gigabytes of memory to start-up, this is beyond what an EC2 Micro instance can offer. To test the result I went ahead and upgraded the instance to a m1.small, HHVM started up without a problem and WordPress was working perfectly.

I went ahead and checked the docs and source to see if there was a way to lower the size of this ‘cache’, from the looks of the source where this error is printed, the code relies on the memory being that exact size, so it is far from an easy thing to change.

HHVM is not for pushing tiny servers to their limits.

Little-Known Awesome Algorithms: O(N+K) Sorting — Counting Sort

Counting Sort, an O(N+K) sorting algorithm

Recently I came across a sorting method that I wasn’t taught back when I took my algorithms course in school. That algorithm is called ‘counting sort’. The idea behind counting sort is that if the number of distinct elements in our data-set is significantly smaller than the total number of elements in the data-set then we can sort in O(N+K), where N is the number of elements and K is the number of distinct elements. For example, If we count 10 occurrences of the number 0 in our data-set, we know we can simply replace the first 10 elements in our sorted array to be 0.

 

When would this be useful?

Let’s say you are sorting a list of ages, heights, or any metric with small set of possible values. The set of possible values for this type of data is pretty small, for ages it’s merely 0 – 120. This is an excellent use-case for bringing out counting sort (That is, if you need the speed improvement of O(n+k) over O(nlogn)).

 

How to implement?

This sorting algorithm is extremely simple to implement, a basic approach is as follows:

  • Iterate over your data-set and count the frequency of each elements occurrence. (Just count the items)
  • Iterate over your data-set from start to finish replacing values.
    • For example if your frequency array contains a frequency of 10 for the element ’0′, then you overwrite the first ten elements in your sorted data-set with 0. Repeat for the frequency of 1, etc.
  • Viola your data-set is sorted!

 

Code example

void basic_count_sort(int * arr, int size) {
    // Find the largest element in our array
    int max = *std::max_element(arr, arr+size);

    // Make an array of that size, to store frequency of each element
    int * count = new int[max+1];

    //Initialize array to 0
    std::fill(count, count+max+1, 0);

    for(int i=0;i<size;++i) {
        ++count[arr[i]];
    }

    int currPos = 0;
    for(int i=0;i<max+1;++i) {
        if(count[i] != 0) {
            // Fill array with i, count[i] times
            // From array indices pos to pos+count[i]
            std::fill(arr + currPos, arr + currPos + count[i], i);
            currPos += count[i];
        }
    }
}

 

This is not a comparison sort

If you have a keen eye, you’ll notice that this algorithm does not contain any logic to check whether one element is ‘less than’ another element. This means that counting sort can’t be used on anything other than primitive types. We took advantage of the fact that each element in our data-set could be used as an index in our counting array, which created an implicitly sorted array when we iterated over it.

 

Why isn’t counting sort available in standard Java or C++ libraries?

Even though this sort seems like an amazing trick that could be used all the time, it turns out that it’s not as useful as it first sounds. The two main reasons why this isn’t part of java/c++ standards is first because of the fact that it doesn’t work on arbitrary objects, only primitive types. And second because it turns out this actual likelihood of someone really needing this performance gain is unlikely so it is simply left out of standard libraries. After all, if you happen to be in need of performance you should probably be rolling your own sorting methods instead of using the standard library (Don’t quote me on this, I’m sure there are tons of people who rely on std::sort for very performance intensive applications).

 

Notes

The code above only exists to demonstrate the idea, not to be as efficient as possible, nor comprehensive. For example, if we have negative values in our data-set we would have to implement an offset so we don’t access negative values in our count array. As well, if we only have 2 values in our data-set, 1 and 1000000, the above implementation would still create an array of 1000000, instead of  a more involved method using a hash-map.
 
Also if you notice, this sort doesn’t require the data to be in a data-structure with random-access properties, this would work even with a linked list.

Here’s the above code being compiled and run with a small data-set:
Compile, run, and see that it works!

Thanks a lot for reading! Usually readers comment stating nitty details in my articles that are incorrect, if you find some let me know in the comments so I can correct them! Also, If you happen to find this useful or interesting, go ahead and thanks me by: like/upvote/submit it on your favorite social media site!

How to program a Gaussian Blur without using 3rd party libraries

What is a Gaussian Blur?

Something I found fairly difficult to find online was a simple explanation on how to implement my own Gaussian Blur function. This article will explain how to implement one.

The basic idea behind a Gaussian Blur is that each pixel becomes the average of the pixels around it, sort of. Instead of simply taking the average of all the pixels around it, you take a weighted average. The weighting of each pixel is greater if it is closer to the pixel you are currently blurring. The Gaussian Blur technique simply describes how to weigh each neighboring pixel. Imagine the pixel you are currently blurring is located at the peak of the hump in the image below and the pixels around it are receiving less weight as they get farther away. You can consider the image below to be considering up to 5 pixels away, this means the Gaussian blur has a ‘window’ of size 10, also known as a kernel size.

This is where the Gaussian equation comes in, using it we can find out how much weight we want each pixel to receive and pixels receive less weight depending on its distance to the center pixel.

Let’s explain what everything in this equation means:

σ (lowercase sigma) – This is the blurring factor, the larger this number is, the smoother/blurrier the image becomes.

e - This is simply euler’s number, a constant, 2.71828182846

x – This is the distance from the origin — The horizontal distance to the center pixel.

y – This is the distance from the origin — The vertical distance to the center pixel.

This means that x and y in this equation will be zero for the center pixel (the current pixel we want to blur), and x^2 + y^2 increases as we get farther away from the center, causing lower weights for pixels farther away.

Calculating a Gaussian Matrix, also known as a Kernel

Let’s say we wanted to find out how we would weigh neighboring pixels if we wanted a ‘window’ or ‘kernel size’ of 3 for our Gaussian blur. Of course the center pixel (the pixel we are actually blurring) will receive the most weight. Lets choose a σ of 1.5 for how blurry we want our image.

 Here’s what our weight window would look like:

With each weighting evaluated it looks like this: (Notice that the weighting for the center pixel is greatest)

If you’re pretty observant you’ll notice that this matrix doesn’t add up 1. For this to represent a weights, all the weights when summed together will have to add up to 1. We can multiply each number by 1/sum to ensure this is true. The sum of this matrix is 0.4787147. This means we need to multiply the matrix by 1/0.4787147 so that all elements end up adding up to 1. We finally get the following matrix which represents how we will weight each pixel during a blur.

 

Applying this Gaussian Kernel Matrix to an image.

Lets say this is our image: (Each number can represent a pixel color from 0-255)

 

To blur this image we need to ‘apply’ our kernel matrix to each pixel, i.e. we need to blur each pixel.

Let’s say we want to blur pixel #25 (the pixel whose color is 25 in our image matrix). This means we get pixel 25 and replace 25 with the average of its neighbors. We weigh each of the neighbors (and 25 itself) with the kernel matrix we created earlier. So it would go as follows:

Now that we have weighed each neighbor appropriately, we need to add up all the values we have and replace 25 with this new value!

Loop this process with every single pixel and you will have a blurred image.

Caveats

Corner Pixels

When you need to apply this kernel matrix to a pixel in a corner, where you don’t have enough room to apply the matrix, a neat trick is to either ‘wrap around’ the image and use the pixels on the opposite side. This tends to work well if the image is intended to be tiled (typically not though) If you look at the image of the fish at the top of this article, you can tell that wrapping was used for the top row of the image since it is so dark. Another very simple solution is to just copy one of the nearest pixels into spots which are missing pixels so you can complete the process. The end result is definitely acceptable.

Time Complexity

It turns out that the simple procedure described above can be improved greatly. The time complexity of the above algorithm is O(rows*cols*kernelwidth*kernelheight). Gaussian blur has a special property called separability, the blur can be applied to each kernel row first in 1 pass, then each kernel column in another and you can achieve the same result. This means that you do not need to traverse the entire kernel matrix for each pixel. This lowers the time complexity to O(rows*cols*kernelheight + rows*cols*kernelwidth).

You can read more on the Separability property on the Gaussian Blur wikipedia page.

How a Scam Artist Can Use Divide and Conquer for Confidence Scamming

I heard about this interesting trick last week regarding how someone managed to scam hundreds of people out of money. (Not that I condone scamming, it’s just very clever!) Supposedly this happened in the 80′s with regular old snail mail. I can’t find a source for this so if someone can comment with one, that would be really great.

1) Mail a letter to 5000 people saying that the 49ers football team will win next week.
2) Mail a letter to 5000 other people saying that the 49ers football team will lose next week.
3) With the group that you guessed right on, divide them in two and mail them opposing predictions once again.
4) Repeat step 3 a couple more times.
5) The remaining people may believe that you have the ability to predict sports/finances/anything with high accuracy, now you ask them to invest/gamble money with you.

These days with email being virtually free to send, emailing a ‘letter’ to only 5000 people is fairly simple for con artists, sending millions of emails is even possible. I mentioned this procedure to someone that had already heard of this before and he said ‘Yea, that’s how to the financial industry works’.

P.S. I know ‘divide and conquer’ can’t possibly be the right term to describe this procedure since there’s no problem really being solved in each half, but it’s the best I could describe it as, having a CS background and all.

EDIT: Someone posted a pretty cool video in the comments, check it out:

 

Zeckendorf’s Theorem & Simple Code Implementation

Zeckendorf’s theorem says that every positive integer can be represented as the sum of non-consecutive Fibonacci integers and that such a representation is unique.

For example:
100 = 89 + 8 + 3
10,000 = 6765 + 2584 + 610 +34 + 5 + 2

Notice though that there are many ways to represent an integer as the sum of Fibonacci numbers but only 1 way to represent them using non-consecutive Fibonacci numbers.

Personally, I’ve always found theorems like this to be pretty neat, but I’ve also always felt that tediously reading through proofs to be less interesting than just discovering the end result. Through induction you can prove existence and uniqueness, if you’re so inclined to read through the proof, check out: Proof of Zeckendorf’s Theorem

It turns out that this type of representation is very easy to generate programmatically. Let’s say we want to break the integer N into fibonacci numbers, since we know every number can be broken into fibonacci numbers, if we substract some fibonacci number from N, the remaining difference can also be broken down into more fibonacci numbers. Through repeated subtraction of fibonacci numbers, we can build this representation. Though, to ensure that the fibonacci numbers that are being subtracted are part of the Zeckendorf representation (non-consecutive fibonacci numbers) we make sure to test the numbers from largest fib posible to smallest. This would be considered a ‘Greedy’ solution.

Here is a code implementation which reads in a number to break down and prints out the Zeckendorf fibonacci representation. We generate 35 fibonacci numbers, which is enough to break all numbers < Fib(35), which is around 9 million. Every time we find a fibonacci number which is less than or equal to our number, we substract it and decrement out current_fib counter an additional time so we do not choose consecutive fibonacci numbers.

We don’t ever make sure that out curr_fib counter goes below zero because we are ensured that every number has a Zeckendorf representation through the ‘existence’ property.

#include <iostream>
using namespace std;

int main()
{
  // Let's compute 35 fibonacci numbers, this is pretty large. 9,227,465
  int fib[35];
  fib[0] = 1;
  fib[1] = 1;
  for(int i=2;i<35;++i)
  {
    fib[i] = fib[i-1] + fib[i-2];
  }

  int number;
  cin >> number; // Read in a number to break down

  int curr_fib = 34; // Index of the largest fibonacci number in our fib array
  while(number > 0)
  {
    if(number >= fib[curr_fib])
    {
      number -= fib[curr_fib];
      cout << fib[curr_fib] << endl; // Print out
      --curr_fib; // Do not choose consecutive fibs, let's skip the next one
    }
    --curr_fib;
  }
}

Little-Known Awesome Algorithms: Adversary Discovery

Contrived Use-Case

Imagine you have a group of N people and you want all of them to know your secrets. The problem is that not everyone is trust-worthy, in fact, let’s estimate that K of them are not trust-worthy. How can you discover which of these people are the snitches or trouble-causing ones so you can exclude them from your list of people you can trust?

A trivial solution to this problem might be immediately clear to you! Just tell every single one a different secret and see which secrets get compromised. This means that you will have to reveal N secrets to finish your discovery. You don’t have that many secrets, I hope.

Problem: How can you find every adversary while attempting to minimize the number of secrets you divulge (or in a more general sense, the number of trials you run)?

 

Real Use-Case?

Before we continue on to a simple solution for this problem, lets consider a possible real-world use-case for this algorithm. Imagine we have N servers hooked together where the entire system fails if one of them fails. We estimate that there are K bad servers. If you let the system run for a whole day it will crash if it contains a bad server (adversary server). How can you find the servers which are bad? The trivial solution is to just run each server on its own for a day and check which ones crashed. (Lets assume we can only have one ‘system’ running at a time, so testing N servers individually would take N days). How would you find the bad servers in the minimum number of days?

Both problems described are analogous (except for one difference that I’ll describe later).

 

Simple Solution

Imagine that we only have 1 adversary in our set of N. How would you approach finding a single adversary? One thing to note is that in this problem you always have a way to ‘test’ if a given set contains an adversary. In the contrived example it was waiting to see if the secret gets out into the general public, and in the server example it was waiting to see if the system crashed. We can call this a confidence test, if a group passes a confidence test then it does not contain an adversary.

An easy way to approach this is with a recursive binary search approach. Split the N people in half and tell them each a different secret (apply the confidence test on both). Once one of the secrets gets out into the public, you now know which group the adversary belongs to (or in the server example, if one of the systems crashes). If you repeat this process you can detect the adversary in O(log(N)) time.

Now to extrapolate this solution to finding K adversaries we can split the N people into N/K buckets. Now you can test each bucket and split it in 2 only if you find that it fails your confidence test. Once this completes you will have discovered every adversary in a fairly speedy way! Let’s look at the time complexity of this.

Step by Step of our tree search

We split each bucket in two as long as it contains an adversary. Red dots represent adversaries and green check-marks mean we confirmed that there are no adversaries in the bucket.

In the worst case, we find ourselves with an adversary in each of our N/K buckets. This means that each one of this buckets will be split in 2 repeatedly until we single out the adversary (or adversaries). This splitting procedure takes Log(N/K) time and since there are K buckets we do this procedure K times. You can think of the time complexity as K*Log(N / K).

But how many secrets do we divulge (or how many days do we test the servers)? We initially divulge K secrets (one to each bucket) and then divulge two more every time we split a bucket. The maximum amount of secrets we divulge with this procedure is K + K*Log(N / K).

 

 

A More Involved Solution Exists

One huge optimization to this procedure is the fact that in certain cases of this problem, you can re-use secrets that you give to people in different branches/buckets (Note: The servers problem does not have this property). Taking this into account it is possible to reach a maximum of only log(N)/log(log(N)) secrets divulged.

 

If you found this article interesting/cool, go ahead and share or like it on your favorite social network!

Other article in this series: Fenwick Range Trees

 

Sources: Fighting Censorship with Algorithms

Thanks to ‘Paint Online’ for the awesome online paint I used to create the diagram.

Hotmail deletes old accounts — Allowing others to hijack your identity

One thing I don’t quite understand is why Microsoft Hotmail, now known as Outlook, decided it would be a good idea to delete old inactive email accounts. About 5 years ago, about the time when Gmail really got huge, I abandoned my Hotmail account which happened to be tied to quite a few websites. Since Hotmail has since deleted the email address (Because of inactivity), anyone can just go ahead and recreate the email address as their own. This means they can hijack any one of my old accounts by just using the recover password through email feature!

This is pretty terrible. You can pretty much just crawl the web for @hotmail.com accounts, check if they have been deleted by Hotmail and hijack plenty of old forum identities, and much more. This can be pretty disastrous for some people who have a lot invested in their online identity.

The only advantages I can come up with for deleting old email addresses is maybe so you don’t have to spend money storing my 2mbs of old emails or to make more email addresses available for new users (But most email addresses are fairly unique anyway).

C++11 Standard — What are Smart Pointers good for?

What is a shared_ptr?

A common problem in larger C++ projects is memory management. When a pointer is being passed from function to function, who has the responsibility of deleting the data at the end? I suppose you can come up with a few rules that can be followed to help manage this sort of delegation, but to solve this problem there’s the idea of shared pointers, otherwise known smart pointers. The shared_ptr is a way of giving new pointers ‘ownership’ over a piece of memory, so long as there is someone claiming ‘ownership’, the memory will not be deallocated.

 

Is this really a good idea?

“Use shared_ptr as a last resort”
-Bjarne Stroustrup

I definitely agree with Bjarne on this one, but I would also argue that it strongly depends on your coding style. Having this sort of system in place and making it common practice in your code-base could encourage an excess of shared ownership. Code with less shared ownership will definitely lead to an easier to comprehend code-base and far less bugs, this idea goes well with the idea of OOP de-coupling principles. Of course, shared ownership most of the time is something we just can not avoid; but it’s something we should actively try to avoid. Consider the following alternative — C++11 also supports scoped_ptr which automatically deallocates memory for you, similar to a shared_ptr, except that copying the memory address off the pointer yields a compile error!

 

The basic shared_ptr explanation, for the uninformed

C++ TR1 introduced the shared_ptr (from the boost library). You can think of this as a ‘reference counted pointer‘. Every time that someone takes ‘ownership‘ of this pointer, a counter is incremented in the background. Every time this pointer goes out of scope or is explicitly ‘reset‘ to null, the counter is decremented. If this counter ever reaches 0, the memory that it references is automatically deleted. This means that if we diligently use shared_ptr to hold all of our pointers, we can easily avoid memory leaks which are caused by letting pointers that go out of scope or are overwritten (of course, smart pointers will ensure that the proper virtual functions are called but they can’t protect you if destructors are not properly implemented!)

Here’s a minimal example of a shared_ptr:

#include <memory>

using namespace std;

void my_function() {
    shared_ptr<int> my_ptr(new int(9001));
    shared_ptr<int> also_ptr = my_ptr;
    // Internal counter is 2, the integer will not be deleted from memory
    // Until both of the shared_ptrs have been .reset() or go out of scope.

    my_ptr.reset();
    also_ptr.reset();
    // Note: even if we did not call reset, both of the shared_ptrs going
    // out of scope (end of function), would have caused an auto-deletion!
}

There are many nuances to using shared_ptrs, the most common one is circular references. In the case of having objects which contain pointers which eventually point back to themselves (back-reference), a shared_ptr can’t handle this because deletion, it just doesn’t know what to do. You fix this by making ‘back-references’ weak_ptrs, this means that when you assign the back-reference, it is not reference counted.

 

Sources:

http://programmers.stackexchange.com/questions/133302/stdshared-ptr-as-a-last-resort

http://www.codeproject.com/Articles/8394/Smart-Pointers-to-boost-your-code

http://www.boost.org/doc/libs/1_51_0/libs/smart_ptr/shared_ptr.htm

http://www.boost.org/doc/libs/1_51_0/libs/smart_ptr/weak_ptr.htm

Difficult Amazon interview question — Find all phrases you can build, given a set of letters.

Recently I had an interview at Amazon where we only had 20 minutes left to solve one more problem. The interviewer brought up the following problem:

“Are you familiar with anagrams? Take your name, “Rodrigo Salazar” and rearrange the letters in your name to make a new phrase, print out every possible phrase.”

For example, you can rearrange the letters in my name, Rodrigo Salazar, to form the phrase “A Lizard Roars Go”. We’re also given a dictionary of words that are valid. You don’t necessarily have to use all the letters.

A Lizard Roars Go

Possible Anagram

The obvious naive approach is to simply iterate over every permutation of the letters in “Rodrigo Salazar”, arbitrarily insert spaces, and then verify that each word in the phrase is contained in the dictionary. This solution is pretty terrible because of how many permutations there are, but of course, I explained this solution immediately to get the idea across during the interview and build off of it.

This was particularly difficult because of the pressure to find the solution in such a short amount of time. The basis of my final solution was the following:

We can work backwards off the dictionary to find words which are buildable using the letters that we have available in my name. Once we find a word that is buildable we recursively call this same procedure on the remaining letters and concatenate the previous word with the next word, continue till no more words are buildable using the remaining letters. The code could look something like the following:

 

// Can we build string 'word' using the letters we have?
bool word_buildable(string word, vector<int> letter_freq)
{
    for(int i=0;i<word.length();++i) {
        if(letter_freq[word[i]-'A']-- == 0)
          return false;
    }
    return true;
}

void findAllPhrases(
string partial_phrase, // Initially called with empty string ""
vector<int> letter_freq, // Freq of letters avail. letter_freq[0] is # of A's we have
vector<string> & output_phrases, //output vector, to record all the phrases we make.
vector<string> & dictionary) {

    for(int i=0;i<dictionary.size();++i) {
        if(word_buildable(dictionary[i], letter_freq)) {
            string next_phrase = partial_phrase;
            if(next_phrase != "")
              next_phrase.append(" ");
            next_phrase.append(dictionary[i]);
            output_phrases.push_back(next_phrase);

            vector<int> new_letter_freq = letter_freq;
            for(int j=0;j<dictionary[i].length();++j)
              new_letter_freq[dictionary[i][j] - 'A']--;

            findAllPhrases(next_phrase, new_letter_freq, output_phrases, dictionary);
        }
    }
}

Obviously there’s lots of things you can optimize in this code snippit. But it gets the idea across. Here’s the same code but running with a sample dictionary and driver int main(), along with output. Sample code with output

Now that’s the simple solution, during the interview I had a more complex answer. We can do some preprocessing on the array. Since currently we have to iterate over the entire dictionary to find out which words fit, instead what we can do is squash all words which use the same letters into independent lists and cross off the entire list of those words at a time if we determine we can’t make words with those letters. For example: TOPS, STOP, POTS all can’t be built using the letters in “RODRIGO SALAZAR”, instead of checking each of these against my name, we can just check one of them and eliminate the other automatically. This is done by sorting the letters and using it as a key in a hashmap. This hashmap would map these keys to all words that can be made with this key. The preprocessing of the dictionary to hashmap would look like this (Note, c++ map is actually a treemap, but the idea holds still):

 

map<string, vector<string> > preprocess(vector<string> dictionary)
{
    map<string, vector<string> > table;
    for(int i=0;i<dictionary.size();++i)
    {
        string sorted_word = dictionary[i];
        sort(sorted_word.begin(), sorted_word.end());
        if(table.count(sorted_word))
          table[sorted_word].push_back(dictionary[i]);
        else
          table[sorted_word] = vector<string>(1, dictionary[i]);
    }
    return table;
}

Now instead of iterating over the entire dictionary, we iterate over the keys of this map (using iterators of course), check if we have the letters we need, if so then we use all the words in the vector<string> value that it has and add it to our partial phrase!

So if we call our dictionary size D and the number of letters in our name N, what is the time complexity of this algorithm? I’m sure it’s terrible but what is it exactly?

If you thought this article was useful or interesting, go ahead and share/like it on your favorite social network!

Little-Known Awesome Algorithms: Fenwick Trees – Rapidly Find Cumulative Frequency Sums

Contrived Use Case

Imagine that we are creating an analytics tool for a financials brokerage, where orders are coming in all the time. Let’s say we’re interested in knowing how many orders fall between a certain value (a range query), in real-time. Lets say that orders can fall between $0.01 and $5M, and we want to let our clients query at any given time how many orders have gone through so far that fall between any $x and $y.

Order price frequency range

 

A fairly simple implementation of this would just mark down the price of each incoming order in a price array / container, and when a range query comes through we would do a simple iteration from $x to $y and sum the range together. If we want to find the range of orders that fall between $9.49 and $250 Thousand, we have to iterate from 949 pennies to 25 Million pennies (Since our data precision is to the penny), this can become a problem for some even more troublesome range queries. A solution to this is a Fenwick Tree, also known as a Binary Indexed Tree. Using this very clever data structure, we can sum the order price frequency between 949¢ and 25 million ¢ in Log time, which for a range of size 25 Million is almost nothing. Also note: Since new data is coming in all the time, a pre-calculation is not possible (unless you have a method of updating it, which this data structure happens to have as well).

The Algorithm

A Fenwick Tree, also known as a Binary Indexed Tree, allows you to calculate the sum of elements in an array for any given index range in O(Log(N)) time, updates on the array are slowed from O(1) to O(Log(N)). If we chose to go with the standard ‘naive’ implementation we would have O(N) range reads and O(1) writes/updates, calculating the sum for a given index range would involve iterating over every single element in the range and adding them up.

You may be familiar with the fact that any number can be broken down into its constituent powers of two. We want to take advantage of this fact and use it to be able to represent a range by summing several ‘subfrequencies’.

We will be representing our tree in the form of an array, similar to how a heap is implemented but with a completely different indexing scheme. We want to assign each element of this array a range that it is responsible for. Lets say we are interested in finding out the cumulative frequency sum for the range 1..7, there is a very clever way of assigning range responsibilities such that we can use the position of every ’1′ in the binary representation of 7 to find out which parts of the tree are responsible for the ranges we are interested in.

Range responsibilities are assigned as follows:

  • Let ctz() represent the position of the last 1 bit in a given number, also known as ‘count trailing zeroes’. ex. ctz(12) = 2, since 12 in binary is 1100b and the last ‘set’ bit is in position 2 (from the right, 0-indexed)
  • Index i in our tree/array is responsible for the range (i – 2^ctz(i) , i]
  • Our tree array is 1-based, tree[0] does not represent anything.

Examples:

  • 7 in binary is 111b, so it is responsible for the range (110b, 111b], or (6,7]  … only itself, 7.
  • 6 in binary is 110b, so it is responsible for the range (100b, 110b], or (4, 6] … the numbers 5,6
  • 4 in binary is 100b, so it is responsible for the range (000b, 100b] or (0, 4] … the numbers 1,2,3,4
  • Notice that tree[7] + tree[6] + tree[4] = cumulative frequency sum of 1..7, and that we acquired this by continuously getting rid of the last set bit in our binary representation.

Such a responsibility tree ends up looking like this (With how the range 1..7 is interpreted):

 

Tree of responsibilities

Tree of responsibilities

 

Given this sort of tree structure here’s how we would implement finding the cumulative frequency sum between x and y. Notice we can only find ranges starting at 1, so we take the difference between 2 ranges. To find the range between 10000 and 20000 with the following code: read(20000) – read(10000).

int read(int idx) {
  int sum = 0;
  while (idx > 0){
    sum += tree[idx];
    idx -= (1 << ctz(idx));
  }
  return sum;
}

Note: (1 << ctz(idx)) can be greatly optimized, what we’re really doing in that line is removing the least sig figure in idx.

Updating The Tree

Remember that we still have to be able to increment certain price frequencies as they come in, so we need the ability to update this tree. Lets say an order comes in for $25.50, we want to increment the frequency of 2550 in our tree by 1. This means we need to visit every single index whose responsibility range contains 2550 and increment its value by 1. (Or to put it in terms of the graph above, every index whose responsibility bar covers 2550.) Every index whose responsibility range contains 2550 is greater than 2550 (and 2550 itself).

We want to find the very next index greater than 2550 which whose responsibility range also contains 2550, increment its value by 1 and then repeat. Since the number 2550 is just too large to explain an example with, lets assume we’re working with the number 1. If you inspect the table below, you’ll notice that range size of a number is dependent specifically on it’s ctz() value. Also, notice the path that we follow to update the value for 1, we continue to higher values who have higher ctz, we find the next value by adding (1 << ctz(i)).

Responsibility Tree - CTZ Columns

Responsibility Tree – CTZ Columns

The code to update the tree is very similar to reading the tree:
Note: We stop going up the tree once we go beyond our maximum, which in our contrived example would be 5 Million dollars worth of pennies.

void update(int idx ,int val){
  while (idx <= MaxVal){
    tree[idx] += val;
    idx += (1 << ctz(idx));
  }
}

So that’s it, now you can read and update a Fenwick tree. Surprisingly the total lines of code is ~15.

If you found this article interesting/cool, go ahead and share or like it on your favorite social network!

Other article in this series: Adversary Discovery

Sources:
Original Research Paper
Topcoder