Imagine you have a group of N people and you want all of them to know your secrets. The problem is that not everyone is trust-worthy, in fact, let’s estimate that K of them are not trust-worthy. How can you discover which of these people are the snitches or trouble-causing ones so you can exclude them from your list of people you can trust?
A trivial solution to this problem might be immediately clear to you! Just tell every single one a different secret and see which secrets get compromised. This means that you will have to reveal N secrets to finish your discovery. You don’t have that many secrets, I hope.
Problem: How can you find every adversary while attempting to minimize the number of secrets you divulge (or in a more general sense, the number of trials you run)?
Before we continue on to a simple solution for this problem, lets consider a possible real-world use-case for this algorithm. Imagine we have N servers hooked together where the entire system fails if one of them fails. We estimate that there are K bad servers. If you let the system run for a whole day it will crash if it contains a bad server (adversary server). How can you find the servers which are bad? The trivial solution is to just run each server on its own for a day and check which ones crashed. (Lets assume we can only have one ‘system’ running at a time, so testing N servers individually would take N days). How would you find the bad servers in the minimum number of days?
Both problems described are analogous (except for one difference that I’ll describe later).
Imagine that we only have 1 adversary in our set of N. How would you approach finding a single adversary? One thing to note is that in this problem you always have a way to ‘test’ if a given set contains an adversary. In the contrived example it was waiting to see if the secret gets out into the general public, and in the server example it was waiting to see if the system crashed. We can call this a confidence test, if a group passes a confidence test then it does not contain an adversary.
An easy way to approach this is with a recursive binary search approach. Split the N people in half and tell them each a different secret (apply the confidence test on both). Once one of the secrets gets out into the public, you now know which group the adversary belongs to (or in the server example, if one of the systems crashes). If you repeat this process you can detect the adversary in O(log(N)) time.
Now to extrapolate this solution to finding K adversaries we can split the N people into N/K buckets. Now you can test each bucket and split it in 2 only if you find that it fails your confidence test. Once this completes you will have discovered every adversary in a fairly speedy way! Let’s look at the time complexity of this.
In the worst case, we find ourselves with an adversary in each of our N/K buckets. This means that each one of this buckets will be split in 2 repeatedly until we single out the adversary (or adversaries). This splitting procedure takes Log(N/K) time and since there are K buckets we do this procedure K times. You can think of the time complexity as K*Log(N / K).
But how many secrets do we divulge (or how many days do we test the servers)? We initially divulge K secrets (one to each bucket) and then divulge two more every time we split a bucket. The maximum amount of secrets we divulge with this procedure is K + K*Log(N / K).
A More Involved Solution Exists
One huge optimization to this procedure is the fact that in certain cases of this problem, you can re-use secrets that you give to people in different branches/buckets (Note: The servers problem does not have this property). Taking this into account it is possible to reach a maximum of only log(N)/log(log(N)) secrets divulged.
If you found this article interesting/cool, go ahead and share or like it on your favorite social network!
Other article in this series: Fenwick Range Trees
Sources: Fighting Censorship with Algorithms
Thanks to ‘Paint Online’ for the awesome online paint I used to create the diagram.