[Latexpage]
Estimating binomial using normal: This makes use of the de Moivre-Laplace theorem. Essentially, the theorem says that under certain conditions (see below) a binomial probability distribution with parameters $n$ and $p$ (written $\theta$ in the formulas here) is very close to a normal probability density with parameters $\mu = n\theta$, $\sigma = \sqrt{n\theta(1-\theta)}$. In the past this was used to estimate binomial probabilities for computational purposes; that use has become less important now that computers perform such computations. The importance of the theorem for us is that it will be used in the estimation techniques when we do inferential statistics (coming right up!)
The conditions required are that n should be large and p should not be too close to either 0 or 1. A common rule of thumb is to require that np and n(1-p) both be at least 5, but 10 is a better choice for the cutoff – see the computation below.
————–
Here is the example that I worked in class on estimating a binomial probability by a normal one. Obviously we do not need this estimate here, since we can easily compute the binomial probability itself. The example merely illustrates a slightly tricky part of the method, and shows that the method gives excellent estimates when we meet the conditions of the rule of thumb given above, especially if we stay away from the lower bound 5 for the mean and “failure mean”.
Suppose we flip a coin 20 times. This is an unbalanced coin, so that the probability of heads showing is 40%. What is the probability that we will get heads exactly 6 times?
This is binomial: $P(X = 6) = b(6; 20, 0.4) = \binom{20}{6}(0.4)^{6}(0.6)^{14} \approx 0.1244$
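As a quick check of this value in R, here is a minimal sketch. The variable names are illustrative; choose() and dbinom() are the standard R functions for the binomial coefficient and the binomial probability.

```r
# Exact probability of exactly 6 heads in 20 flips when P(heads) = 0.4.
n <- 20
theta <- 0.4
k <- 6

# Straight from the formula: choose(n, k) is the binomial coefficient.
by_formula <- choose(n, k) * theta^k * (1 - theta)^(n - k)

# The same probability from R's built-in binomial probability function.
by_dbinom <- dbinom(k, size = n, prob = theta)

print(round(by_formula, 4))
print(round(by_dbinom, 4))
```

Both lines should print the same value, which rounds to 0.1244.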
Check that it is OK to approximate it by a normal probability:
Is $\mu = n\theta \ge 5$? $n\theta = 20(0.4) = 8$, yes.
Is $n(1-\theta) \ge 5$? $n(1-\theta) = 20(0.6) = 12$, yes.
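The two checks can be wrapped in a small R helper. This is only a sketch: the function name normal_approx_ok and its cutoff argument are made up for illustration and are not part of R.

```r
# Hypothetical helper: TRUE when both the mean n*theta and the
# "failure mean" n*(1 - theta) reach the chosen cutoff.
normal_approx_ok <- function(n, theta, cutoff = 5) {
  mean_successes <- n * theta
  mean_failures <- n * (1 - theta)
  mean_successes >= cutoff && mean_failures >= cutoff
}

print(normal_approx_ok(20, 0.4))               # rule of thumb with cutoff 5
print(normal_approx_ok(20, 0.4, cutoff = 10))  # stricter cutoff of 10
```

For the coin example the first call passes (8 and 12 are both at least 5), while the stricter cutoff of 10 fails because the mean is only 8.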
Now to estimate it, we have to be careful, because we are using a continuous density to estimate a discrete probability distribution! If you think about estimating the area of the bar of the probability distribution histogram by the area under a normal curve, you can see that we need to find the normal probability that X is between 5.5 and 6.5 (the edges of the bar). We estimate by a normal probability with mean $\mu = 8$ and standard deviation $\sigma = \sqrt{20(0.4)(0.6)} = \sqrt{4.8}$.
Therefore $P(X=6) \approx n(5.5<X<6.5; 8, \sqrt{4.8})$
$\approx 0.1199$ by using the normal cdf function. (A table lookup with the z-scores rounded to two decimal places gives 0.1212 instead.)
In R, we compute this by using pnorm(6.5, 8, sqrt(4.8)) - pnorm(5.5, 8, sqrt(4.8))
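Here is a slightly fuller sketch of the same computation, set next to the exact binomial value so the two can be compared. The variable names are illustrative; pnorm() and dbinom() are standard R functions.

```r
# Coin example: n = 20 flips, P(heads) = 0.4, exactly k = 6 heads.
n <- 20
theta <- 0.4
k <- 6

mu <- n * theta                          # mean, 8
sigma <- sqrt(n * theta * (1 - theta))   # standard deviation, sqrt(4.8)

# Exact binomial probability.
exact <- dbinom(k, size = n, prob = theta)

# Normal estimate: area under the normal curve over the bar from
# k - 0.5 to k + 0.5 (the edges of the histogram bar).
normal_est <- pnorm(k + 0.5, mean = mu, sd = sigma) -
  pnorm(k - 0.5, mean = mu, sd = sigma)

print(round(c(exact = exact, normal_est = normal_est,
              difference = exact - normal_est), 4))
```

With full-precision pnorm() values the estimate comes out close to 0.1199, against an exact value close to 0.1244.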
In your calculator, locate the normal cdf functions. Instructions for some calculators are linked on this page.
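If you work from a table instead, standardize the two edges of the bar first. In the sketch below, $\Phi$ stands for the standard normal cdf; that symbol is an addition here and is not used elsewhere in this post.

```latex
\begin{align*}
z_1 &= \frac{5.5 - 8}{\sqrt{4.8}} \approx -1.1411 \\
z_2 &= \frac{6.5 - 8}{\sqrt{4.8}} \approx -0.6847 \\
P(5.5 < X < 6.5) &= \Phi(z_2) - \Phi(z_1) \approx 0.2468 - 0.1269 = 0.1199
\end{align*}
```

If the z-scores are first rounded to $-1.14$ and $-0.68$, as they would be for a printed table, the same steps give $0.2483 - 0.1271 = 0.1212$, so the fourth decimal place depends on how the cdf is evaluated.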
So the normal approximation gives us two decimal places which are correct, and the third decimal place is off.
This is good enough for many purposes, but not always good enough. Because the mean 8 is close to the cutoff 5, we are seeing the limits of this kind of estimate. The reason for using the higher cutoff of 10 for $np$ and $n(1-p)$ is to get more accuracy (a third decimal place).
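To get a feel for how the accuracy changes as the mean moves away from the cutoff, one can compare the exact and estimated probabilities for every possible count. This is a rough sketch, not a proof: the helper name max_abs_error and the sample sizes are chosen only for illustration.

```r
# Hypothetical helper: largest absolute difference, over k = 0, 1, ..., n,
# between the exact binomial probability and the normal estimate taken
# over the bar from k - 0.5 to k + 0.5.
max_abs_error <- function(n, theta) {
  k <- 0:n
  mu <- n * theta
  sigma <- sqrt(n * theta * (1 - theta))
  exact <- dbinom(k, size = n, prob = theta)
  estimate <- pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)
  max(abs(exact - estimate))
}

theta <- 0.4
for (n in c(13, 20, 25, 50, 100)) {
  cat(sprintf("n = %3d   n*theta = %5.1f   max error = %.4f\n",
              n, n * theta, max_abs_error(n, theta)))
}
```

Running this for a few values of n and $\theta$ shows roughly where the estimate starts to agree with the exact probability to a third decimal place.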
Some more examples, with pictures of the graphs, are worked out here.