Each day, you must make a binary decision: to buy or sell a stock, whether to bring an umbrella to work, whether to take surface streets or the freeway on your commute, etc. Each time you make the wrong decision, you are penalized. Fortunately you have help in the form of $$n$$ “expert” advisors each of whom suggest a choice for you. However, some experts are more reliable than others, and you do not initially know which is the most reliable. How can you find the most reliable expert while incurring the fewest penalties? How few penalties can you incur compared to the best expert?

To make matters simple, first assume that the best expert never gives the wrong advice. How quickly can we find that expert?

Here is a simple procedure that allows us to find the best expert. Choose an expert $$a$$ arbitrarily on the first day, and follow their advice. Continue following $$a$$’s advice each day until they make a mistake. After your chosen expert errs, choose a different expert and follow their advice until they err. Continue this process indefinitely. Since we’ve assumed that the best expert never errs, you will eventually follow their advice.

Observe that in the above procedure, each day we either get the correct advice, or we eliminate one of the errant experts. Thus, the total number of errors we make before finding the best expert is $$n – 1$$. Is it possible to find the best expert with fewer penalties?

Consider the following “majority procedure.” Each day, look at all $$m$$ experts’ advice and take the majority opinion (choosing arbitrarily if there is a tie). At the end of the day, fire all incorrect experts, and continue the next day with the remaining experts. To analyze this procedure, observe that each day, either the majority answer is correct (and you are not penalized) or you fire at least half of the experts. Thus, the number of penalties you incur is at most $$\log n$$ before you find the best expert. That is pretty good!

Question 1 Is it possible to find the best expert while incurring fewer than $$\log n$$ penalties?

To this point, things have been relatively easy because we assumed that the best expert never errs. Thus, as soon as an expert makes an incorrect prediction, we know they are not the best expert. In practice, however, it is unlikely that the best expert is infallable. What if we are only guaranteed that the best expert errs at most $$m$$ times?

It is not hard to see that the majority procedure can be generalized to account for an expert that errs at most $$m$$ times. The idea is to follow the majority opinion, but to allow an expert to err $$m$$ times before firing them. We call the resulting procedure the “$$m + 1$$ strikes majority procedure.” Each day, follow the majority opinion (again breaking ties arbitrarily). Every time an expert is wrong, they receive a strike. If an expert ever exceeds $$m$$ strikes, they are immediately fired.

To analyze the $$m + 1$$ strikes procedure, observe that if the majority opinion is ever wrong, at least half of the remaining experts receive strikes. Since the total number of strikes of all remaining experts cannot exceed $$m n$$, after $$2 m + 1$$ majority errors, we know that at least half of the experts have been fired. Thus, we find the best expert with penalty at most $$(2 m + 1) \log n$$. (I suspect that a more clever analysis would give something like $$(m + 1) \log n$$, thus matching the majority procedure for $$m = 0$$.) Can we do better than the $$m + 1$$ strikes majority procedure?

One objection that one might have with the $$m + 1$$ strikes procedure is that an expert must err $$m + 1$$ times before any action is taken against them. Another reasonable approach would be to have a “confidence” parameter for each expert. Each time the expert errs, their confidence is decreased. When taking a majority decision, each expert’s prediction is then weighted by our confidence in that expert. If we adopt this strategy, we must specify how the confidence is decreased with each error.

Littlestone and Warmuth proposed the following “weighted majority algorithm” to solve this problem. Initially, each expert $$i$$ is assigned a weight $$w_i = 1$$, where a larger weight indicates a greater confidence in expert $$i$$. Each time expert $$i$$ errs, $$w_i$$ is multiplied by $$1/2$$. Each day, we take as our prediction the weighted majority–each expert $$i$$’s prediction carries $$w_i$$ votes, and we choose the option receiving the greatest sum of votes.

By way of analysis, consider what happens each time the majority answer is wrong. Let $$W_t$$ denote the total weight of all experts after $$t$$ penalties have been incurred. Observe that initially we have $$W_0 = n$$. When we incur the $$(t + 1)$$-st penalty, at least half of the $$W_t$$ total votes were cast in favor of the wrong answer. Since the total errant weight is cut in half, we find that $$W_{t + 1} \leq (3/4) W_t$$. Thus, $$W_t \leq (3/4)^t n$$. On the other hand, we know that some expert $$i$$ errs at most $$m$$ times. Therefore, $$2^{-m} \leq w_i \leq W_t$$ for all $$t$$. Combining this with the previous expression gives
$2^{-m} \leq W_t \leq (3/4)^t n.$
Taking logs of both sides of this expression, we find that
$t \leq \frac{(m + \log n)}{\log(4/3)}.$
It seems incredible to me that this simple and elegant algorithm gives such a powerful result. Even if we have initially no idea as to which expert is the best, we can still perform nearly as well as the best expert as long as $$m >> \log n$$. (Note that $$1 / \log(4/3) \approx 2.4$$.) Further, we do not even need to know what $$m$$ is in advance to apply this procedure!

Question 2 Is it possible to improve upon the bound of the weighted majority algorithm?

• Pietro Kreitlon Carolino

No clue about Q2, but some thoughts on Q1: great question! A naive hope would be that log(n) (base 2) would be optimal because of some “you need log(n) bits to distinguish between n experts”-type explanation. I’m thinking of m=0 here.

But what are the bits? On the one hand you get n bits on every round (the predictions of each expert), but on the other hand you only really get information when experts’ predictions are *different* — if everyone predicts the same, you get no info on who to discard! And also no penalty — so there is some tricky relationship here between penalties and bits of information.

Maybe you can do it by strong induction. For n=2 experts, you certainly can’t find the best expert with less than log(2) = 1 penalty. Now assume that for all k < n, you can't find the best expert with less than log(k) penalty, and consider the case of n experts.

Let x1,…,xn be the experts' (binary) predictions, and f(x1,…,xn) your "policy", ie what you do given the predictions (also binary). We have two cases:

(i) Sometimes your policy goes against the majority; or
(ii) You always go with the majority.

In case (i), consider an instance where you go against the majority. Then we can construct an adversarial scenario where the best expert was in the majority, and the majority was right. So you incur a penalty and eliminate at most n/2 experts. By induction we are done.

In case (ii), split the experts into 2 groups of size as close as possible. Have the best expert be in the slightly smaller group, or the group you don't pick in case of tie. Have the non-majority, non-picked group be right. Then you incur a penalty, and eliminate exactly, or almost exactly, n/2 experts. Finish by induction.

I think modulo some technicalities this proves you can't do m=0 in fewer than log(n) penalties. There may be a more elegant argument.