1.6. Binomial PMF and CDF#
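In this chapter we explored how to simulate a binomial variable ($k$ hits out of $n$ trials, where the probability of a hit on each trial is $p$).
There is also an analytical solution for calculating how likely each number of hits is if $k$ follows a binomial distribution with parameters $n$ and $p$.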
In practice, you don’t need to use the actual equations yourself, as there are built-in functions in scipy.stats
that do it for you. It is still desirable to understand conceptually where the equations ‘come from’ (how they are derived), as covered in the lecture.
You may, however, wish to use the PMF (in Python) to get the probability of a certain value of $k$.
1.6.1. Analytical vs numerical solutions#
The PMF and CDF are worked out from an equation rather than by random sampling.
Therefore the probability values (e.g. the probability of a particular value of $k$) given by stats.binom.pmf() and stats.binom.cdf() never change for given values of $n$, $p$ and $k$.
In contrast, the values given by our simulations (how many of the random samples had a particular value of $k$) varied slightly each time we ran the simulation.
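To make the contrast concrete, here is a minimal sketch. The seeds 1 and 2 and the variable names are arbitrary choices for illustration and are not part of the worksheet. It evaluates the PMF twice, then runs two simulations of 10,000 reps, each with its own random seed.

```python
import numpy as np
import scipy.stats as stats

n, p, K = 10, 0.5, 8

# Analytical: evaluating the equation twice gives the identical answer
exact_1 = stats.binom.pmf(K, n, p)
exact_2 = stats.binom.pmf(K, n, p)

# Numerical: two simulations of 10,000 reps, each with a different seed
# (the seeds are arbitrary, chosen only for this illustration)
sim_1 = np.mean(np.random.default_rng(1).binomial(n, p, size=10000) == K)
sim_2 = np.mean(np.random.default_rng(2).binomial(n, p, size=10000) == K)

print(exact_1, exact_2)  # the same value both times
print(sim_1, sim_2)      # near the exact value, but subject to random noise
```

The two simulated proportions should both land close to the exact value, but there is no reason for them to agree with each other to many decimal places.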
1.6.2. Set up Python libraries#
As usual, run the code cell below to import the relevant Python libraries
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
1.6.3. Binomial PMF#
The probability mass function (PMF) tells us the probability of obtaining $k$ hits out of $n$ trials.
The equation for the binomial PMF is as follows:
$$ p(k=K) = \binom{n}{K} \, p^{K} (1-p)^{n-K} $$
where
$$ \binom{n}{K} = \frac{n!}{K! \, (n-K)!} $$
… is an expression that accounts for the fact that an even balance of hits and misses is more likely (as discussed in the lecture).
Note: unsurprisingly, many people find the notation $p(k=K)$ (which is standard in statistical theory) confusing. In this case $k$ is a variable (the number of hits) and $K$ is a specific value of it. So in a coin-tossing example, where we are interested in the chance of getting 8 heads out of 10 tosses, $k$ = number of heads and $K$ = 8; we read $p(k=K)$ as “the probability the number of heads is 8”.
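As a worked example of the binomial PMF, here is the arithmetic for the coin-tossing case with $n=10$, $p=0.5$ and $K=8$, writing $\binom{n}{K}$ ("$n$ choose $K$") for the number of ways of arranging $K$ hits among $n$ trials:

$$
\binom{10}{8} = \frac{10!}{8! \, 2!} = \frac{10 \times 9}{2} = 45
$$

$$
p(k=8) = 45 \times 0.5^{8} \times 0.5^{2} = \frac{45}{1024} \approx 0.0439
$$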
PMF (home-baked)#
Let’s implement the equation for the PMF to get the exact probability that k==8, and compare the result to the proportion of 10,000 reps in which k==8 (as on the previous worksheet).
n=10 # values of n, p, k as in previous exercise
k=8
p=0.5
import math # np.math is not available in recent NumPy versions, so use the standard library
n_choose_k = math.factorial(n) / (math.factorial(k) * math.factorial(n-k))
prob_k = ((p)**(k)) * ((1-p)**(n-k)) * n_choose_k
print(prob_k)
0.0439453125
Hopefully this is roughly the same as the proportion of our 10,000 reps of 10 coin tosses in which k=8 (it won’t match exactly, as the simulation is subject to random noise). Let’s check!
k = np.random.binomial(10, 0.5, size=10000)
np.mean(k==8)
0.0445
yep, not a bad match!
PMF (built-in function)#
We can also use a built-in function to give the PMF:
stats.binom.pmf(8,10,0.5)
0.04394531250000004
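The built-in result differs from the home-baked value 0.0439453125 only in the last digits, which looks like ordinary floating-point rounding rather than a real disagreement. As a quick sanity check, the sketch below compares the two approaches for every possible number of heads. The helper name binomial_pmf is made up for this illustration, and math.comb (available from Python 3.8) stands in for the factorial expression.

```python
import math
import numpy as np
import scipy.stats as stats

def binomial_pmf(K, n, p):
    """Home-baked binomial PMF: probability of exactly K hits in n trials."""
    return math.comb(n, K) * p**K * (1 - p)**(n - K)

n, p = 10, 0.5
ks = np.arange(n + 1)  # every possible number of heads, 0 to 10

home_baked = np.array([binomial_pmf(K, n, p) for K in ks])
built_in = stats.binom.pmf(ks, n, p)

# True if the two agree to within floating-point tolerance
print(np.allclose(home_baked, built_in))
```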
Comprehension questions
Can you work out how to change this to get the probability of 7 heads out of 12 coin tosses?
# your code here!
Can you change the code so that the probability of heads is 0.75?
# your code here!
1.6.4. Binomial CDF#
The CDF or cumulative distribution function tells us the probability of obtaining less than or equal to $k$ hits in $n$ trials.
As we have seen, we often want to know this cumulative value. For example, if we want to know whether a coin is fair, and have observed 8 heads out of 10 coin tosses, we would ask how likely it is, with a fair coin, to get a value as extreme as 8 heads or more. The code below plots the PMF and CDF for 10 tosses of a fair coin.
pmf = stats.binom.pmf(range(11),10,0.5)
cdf = stats.binom.cdf(range(11),10,0.5)
plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
plt.plot(range(11), pmf, 'k.-')
plt.xlabel('k')
plt.ylabel('probability of $k$')
plt.subplot(1,2,2)
plt.plot(range(11), cdf, 'k.-')
plt.xlabel('k')
plt.ylabel('cumulative probability')
plt.tight_layout()
plt.show()

So we have:
PMF(8) is the probability of obtaining exactly 8 heads, ie $p(k=8)$
CDF(8) is the probability of obtaining 8 or fewer heads, ie $p(k \le 8)$
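One way to see how the two functions relate is that the CDF is a running total of the PMF. The sketch below (variable names are illustrative) checks this for 10 tosses of a fair coin using np.cumsum.

```python
import numpy as np
import scipy.stats as stats

n, p = 10, 0.5
ks = np.arange(n + 1)  # possible numbers of heads, 0 to 10

pmf = stats.binom.pmf(ks, n, p)
cdf = stats.binom.cdf(ks, n, p)

# Adding up the PMF from k=0 upwards should reproduce the CDF
running_total = np.cumsum(pmf)
print(np.allclose(running_total, cdf))

# PMF(8) = 45/1024 (about 0.0439), CDF(8) = 1013/1024 (about 0.9893)
print(pmf[8], cdf[8])
```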
Area under the PMF sums to 1#
Noting that the probabilities of all the possible outcomes must sum to 1, we can also say:
CDF(10) = 1 because we always get 10 or fewer heads in 10 coin tosses
The function CDF always gives us the area under the curve to the left of a given value, for example CDF(7) gives us $p(k \le 7)$
If we want to know the probability of getting more than (say) 7 heads, we use the fact that the area under the curve sums to 1, so
$p(k > 7) = p(k \ge 8) = 1 - p(k \le 7)$ = 1-cdf(7)
Careful here about >= vs > etc. Since the number of heads is a whole number, $p(k > 7) = p(k \ge 8)$ and $p(k \le 7) = p(k < 8)$. The function stats.binom.cdf(x) gives us the probability $p(k \le x)$.
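These relationships can be checked numerically. The sketch below, for 10 tosses of a fair coin, confirms that the PMF sums to 1 and then computes the probability of more than 7 heads in three ways. Note that stats.binom.sf() (the "survival function", equal to 1 minus the CDF) is not used elsewhere in this worksheet; it is included only as a cross-check.

```python
import numpy as np
import scipy.stats as stats

n, p = 10, 0.5
pmf = stats.binom.pmf(np.arange(n + 1), n, p)

# All the possible outcomes together have probability 1
total = pmf.sum()
print(total)

# p(k > 7), i.e. 8, 9 or 10 heads, computed three ways
via_complement = 1 - stats.binom.cdf(7, n, p)  # 1 - p(k <= 7)
via_direct_sum = pmf[8:].sum()                 # p(k=8) + p(k=9) + p(k=10)
via_sf = stats.binom.sf(7, n, p)               # survival function, 1 - cdf(7)

print(via_complement, via_direct_sum, via_sf)  # each is 56/1024 = 0.0546875, up to rounding
```

Using cdf(8) here instead of cdf(7) would give the probability of more than 8 heads, which is the kind of off-by-one slip the warning above is about.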
Comprehension questions
You will need to choose the correct function (stats.binom.pmf() or stats.binom.cdf()) and the right values of n, p, k to match the numerical answer given in the comments of each cell.
# Find the probability of exactly 9 heads out of 12 coin tosses,
# when the probability of heads is 0.75
# answer = 0.2581
# Find the probability of exactly 5 tails out of 6 coin tosses,
# when the probability of heads is 0.6
# answer = 0.0369
# Find the probability of at least 7 heads out of 20 coin tosses,
# when the probability of heads is 0.55
# answer = 0.9785
# Find the probability of fewer than 10 heads out of 20 coin tosses,
# when the probability of heads is 0.4
# answer = 0.7553
# Find the probability of more than 10 tails out of 15 coin tosses,
# when the probability of heads is 0.3
# answer = 0.515