TLDR:

If you play n games, and find k keys, and assume that you don't believe Blizzard's claims about drop rates, the Bayesian estimate of the drop rate is:

(1+k)/(2+n)

Beyond that, I also show how to calculate the probability that the drop rate is or is not within a particular range (e.g. what's the chance the rate actually is between 0.2 and 0.4, i.e. near the 0.3 that Blizzard claims for mp 3, given what you have observed).

Estimating key drop rates is a textbook example of Bayesian statistics in what is called the "Beta-Binomial" model.

---------------

In this post, I discuss the statistics of estimating key drop rates. This seems like a subject of interest to the players of Diablo 3, so hopefully this post can help to clarify the discussion. I am a graduate student in computer science/statistics.

We will assume a player sets mp 3 (for example), then collects 5 stacks of NV, kills the key warden, and records whether a key dropped or not. We will repeat this process over multiple games. First, note that the outcome of each game is independent of every other game (what happens in one game doesn't depend on what happened in any other game).

Let us define some random variables and parameters:

n = the number of games played (a known constant)

k = the number of keys found out of n games (a random variable, which we observe through the experiment)

p = the probability of a key dropping (an unknown parameter that we are trying to estimate)

An appropriate statistical model for this experiment is to say that

k ~ Binomial(n, p)

In words, the distribution of k is binomial with parameters n and p. For more info on a Binomial distribution, check out Wikipedia.
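To make the binomial model a bit more tangible, here is a minimal Python sketch of its probability formula. The function name binomial_pmf and the inputs (6 games, drop rate 0.3) are only illustrative choices, not anything Blizzard or the game exposes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of finding exactly k keys in n independent games
    when each game drops a key with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative inputs: chance of finding 0 keys in 6 games
# if the drop rate really were 0.3.
prob_no_keys = binomial_pmf(0, 6, 0.3)
print(prob_no_keys)  # roughly 0.118
```

So even at a true rate of 0.3, a run of 6 games with no keys would happen a bit more than one time in ten.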

A simple way of estimating p which most people will find intuitive is to use a maximum likelihood estimator. That means we will predict the value p that gives the highest probability to the data we observe. The maximum likelihood estimator for a binomial distribution is

p_MLE = k / n

It's pretty simple: count the number of keys found and divide by the number of games. That is the frequentist/maximum likelihood estimate of the key drop rate.
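If you want to convince yourself that k / n really is the value of p that makes the data most probable, a brute-force check is easy to write. This is only a sketch: the counts (3 keys in 10 games) are made up for illustration, and the grid search stands in for the calculus you would normally use.

```python
from math import comb

def likelihood(p, k, n):
    """Binomial probability of seeing k keys in n games at drop rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mle_by_grid(k, n, steps=1000):
    """Try p = 0, 1/steps, 2/steps, ..., 1 and keep the p with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: likelihood(p, k, n))

k, n = 3, 10            # made-up counts, for illustration only
p_grid = mle_by_grid(k, n)
p_formula = k / n
print(p_grid, p_formula)  # the grid search lands on the same value as k / n
```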

Maximum likelihood is not the only way to estimate parameters. Another approach is to use Bayesian statistics. The advantages are that it (1) copes better with estimates based on a small number of games, (2) lets us incorporate prior knowledge or belief about the key drop rate, and (3) gives a range of probable values rather than a single most probable value for the drop rate. So without further ado, let's get into the Bayesian analysis of key drop rates.

In Bayesian statistics, we typically associate a prior distribution with unknown parameters that represents our belief before running the experiment. In this case, an appropriate prior is:

p ~ Uniform(0,1)

This prior represents the belief that any value of p in the range (0,1) is equally likely before running the experiment (in other words, we don't believe Blizzard's claim that p = mp/10). If we wanted to do an analysis where we instead assume Blizzard is probably telling the truth, we might assume instead that

p ~ Normal(mean=mp/10, stddev=0.1)

This kind of prior roughly expresses the belief that the true drop rate is within +/- 0.1 of what they claim (that is one standard deviation, so about 68% of the prior probability falls in that range).

Generally speaking, it makes our calculations simpler if we assume that

p ~ Beta(a,b)

and note that when a = b = 1, the Beta(a,b) distribution is the same as the Uniform(0,1) distribution. So we will assume p~Beta(a=1,b=1).
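As an aside, if you wanted a Blizzard-friendly prior but still in Beta form, one option is to pick the Beta(a,b) whose mean and standard deviation match the Normal prior mentioned above. To be clear, this moment-matching step is my substitution for illustration, not part of the analysis in this post, and the helper name beta_from_mean_sd is hypothetical.

```python
def beta_from_mean_sd(mean, sd):
    """Return (a, b) such that Beta(a, b) has the given mean and standard deviation.

    Uses the standard Beta moments:
        mean = a / (a + b)
        var  = mean * (1 - mean) / (a + b + 1)
    """
    var = sd ** 2
    if not 0 < mean < 1 or var >= mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and standard deviation")
    total = mean * (1 - mean) / var - 1   # this is a + b
    return mean * total, (1 - mean) * total

# Assumed inputs: Blizzard's mp 3 claim (p = 0.3) with a spread of 0.1.
a, b = beta_from_mean_sd(0.3, 0.1)
print(a, b)  # approximately 6 and 14
```

The rest of this post sticks with the uniform prior, a = b = 1.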

OK, so let's take a step back: again, we are going to play n games and we will observe that we collected k keys. Now we will see what Bayesian statistics has to say about the key drop rate p.

According to Bayes' theorem and some well-known calculations (see "Beta-Binomial Conjugate Prior"), the posterior distribution of p given k is:

p | k ~ Beta(1 + k, 1 + n - k)
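If you would rather check this result numerically than follow the algebra, here is a rough sketch. It multiplies the binomial likelihood by the uniform prior on a fine grid, normalizes so the area is 1, and compares the result against the Beta(1 + k, 1 + n - k) density. The counts (3 keys in 10 games) and the function names are made up for illustration.

```python
from math import gamma

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * p**(a - 1) * (1 - p)**(b - 1)

def grid_posterior(k, n, steps=10000):
    """Posterior density on a grid of midpoints: likelihood times uniform prior, normalized."""
    width = 1 / steps
    mids = [(i + 0.5) * width for i in range(steps)]
    # The Uniform(0,1) prior contributes a constant factor of 1,
    # and the binomial coefficient cancels when we normalize.
    unnorm = [p**k * (1 - p)**(n - k) for p in mids]
    area = sum(unnorm) * width
    return mids, [u / area for u in unnorm]

k, n = 3, 10  # made-up counts, for illustration only
mids, dens = grid_posterior(k, n)
max_gap = max(abs(d - beta_pdf(p, 1 + k, 1 + n - k)) for p, d in zip(mids, dens))
print(max_gap)  # tiny: the two curves agree up to numerical error
```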

To make this a bit more concrete, let's say I play 6 games and find 0 keys. Then k = 0, n = 6, and p | k ~ Beta(1, 7). Take a look at some analysis of this distribution through Wolfram Alpha: http://www.wolframalpha.com/input/?i=beta+distribution%2C+alpha%3D1%2C+beta%3D7

If you look at the "Plot of PDF", the x-axis shows a hypothetical key drop rate and the y-axis shows the probability density at that rate. (Note that the rate is a continuous parameter, so the probability of any particular value is 0; it is only meaningful to talk about the probability that it falls within a particular range, which is equal to the area under the plot over that range. That is why it is OK to have a number greater than 1 on the y-axis of this graph.)

There is more relevant information on this page. For example, the mean is listed at 1/8 = 0.125. The mean refers to the mean of the posterior distribution, which is a beta distribution. For any beta distribution Beta(a, b), the mean is a/(a+b), therefore:

Suppose you play n games and find k keys. Then the Bayesian posterior estimate of the key drop rate (assuming a uniform prior) is:

p_posterior_mean = (1+k)/(2+n) (TLDR: this is the estimated key drop rate in a Bayesian analysis assuming a uniform prior)

Let's run an example:

k = 0, n = 6 ----> p_posterior_mean = 1/8

Notice that is the same result we got on Wolfram Alpha. It is also worth comparing our estimate of the key rate using Bayesian statistics to the simpler maximum likelihood estimator. Recall

p_MLE = k/n

Notice that for large values of k and n, the +1 and +2 terms in p_posterior_mean are dominated by k and n. This means if you run enough repetitions, you will end up with very similar results in either analysis. Also note that if n = 0 and k = 0 (i.e. we haven't played a game yet), then p_posterior_mean = 1/2. This is a consequence of our assumed prior p~Uniform(0,1) - it is like saying that before we see any results, we believe the drop rate is 50-50.
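Here is a small Python sketch of that comparison. The first line uses my 0-for-6 example; the counts in the loop are made up (a fixed 30% hit rate at growing sample sizes) just to show the gap between the two estimates shrinking.

```python
from fractions import Fraction

def p_mle(k, n):
    """Maximum likelihood estimate k / n (undefined for n = 0)."""
    return Fraction(k, n)

def p_posterior_mean(k, n):
    """Posterior mean under a Uniform(0,1) prior: (1 + k) / (2 + n)."""
    return Fraction(1 + k, 2 + n)

# The example from this post: 0 keys in 6 games.
print(p_mle(0, 6), p_posterior_mean(0, 6))  # prints: 0 1/8

# Made-up counts with k/n fixed at 0.3, to show the two estimates converging.
gaps = []
for n in (10, 100, 1000):
    k = 3 * n // 10
    gap = p_posterior_mean(k, n) - p_mle(k, n)
    gaps.append(gap)
    print(n, float(p_mle(k, n)), float(p_posterior_mean(k, n)))
```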

Wrapping up, let's do one more analysis: what is the probability that Blizzard is stating the wrong values for key drop rates? Go back to the Wolfram Alpha page and look at the "Probability density function" panel. For our example here, it says:

7(1-x)^6 if 0<x<1

Note that x is actually p. Suppose we want to know the probability that the true key drop rate is between 0.2 and 0.4, given the observation of 0 keys in 6 games. You can compute this probability with Wolfram Alpha by entering

integral(7(1-p)^6, 0.2, 0.4)

The result is 0.181722. Thus given these observations, the posterior probability is 0.818278 that the key drop rate is NOT between 0.2 and 0.4.
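If you want to run this with your own numbers without going through Wolfram Alpha, here is a minimal Python sketch. It relies on a standard identity that holds when a and b are whole numbers (which they always are with the uniform prior): the Beta(a, b) CDF at x equals the probability that a Binomial(a + b - 1, x) variable is at least a. The function names are my own.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integer a and b only."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a, m + 1))

def prob_in_range(lo, hi, k, n):
    """Posterior probability that lo < p < hi after k keys in n games (uniform prior)."""
    a, b = 1 + k, 1 + n - k
    return beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b)

inside = prob_in_range(0.2, 0.4, k=0, n=6)
print(inside, 1 - inside)  # about 0.1817 inside the range, about 0.8183 outside
```

For k = 0 this reduces to (1 - 0.2)^7 - (1 - 0.4)^7, which matches the integral above.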

Note that n = 6, k = 0, at mp3 (on act 3, not that it should matter) is my observation over the last couple of days. This is a very small sample and certainly not enough to draw a strong conclusion. However, I hope this analysis will be helpful to others on this forum as we continue to assess key drop rates. I invite others to check my analysis and run through the calculations with their own numbers. If I have made a statistical mistake, feel free to point it out. Also, if anyone is feeling really savvy, it would be reasonable to do this calculation with a prior that is more generous to Blizzard than uniform.

Edited by Afar#1572 on 12/10/2012 7:15 PM PST