2.6. Changing $n$ and $p$
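The binomial has two parameters, $n$ and $p$, which together determine the probability of obtaining $k$ hits:

$$ k \sim \mathcal{B}(n,p) $$

What happens to the frequency of each value of $k$ if we change the probability of a hit, $p$, or the number of trials, $n$?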
2.6.1. Set up Python libraries
As usual, run the code cell below to import the relevant Python libraries:
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings
warnings.simplefilter('ignore', category=FutureWarning)
2.6.2. $p$, probability of a hit
Think back to our home-baked code to generate a random outcome with a probability $p$ of being a hit:
# generate a random number between 0 and 1
# check if it is less than p - this should happen on a proportion of trials equal to p
x = np.random.uniform(0,1)
p=0.5
if x<p:
    hit = 1
else:
    hit = 0
print(x)
print('is it a hit?: ' + str(hit))
0.9444967182525487
is it a hit?: 0
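If we change the value of $p$, this changes the proportion of our random values x for which x<p is true.
The figure below shows outputs of np.random.uniform(), with those that match the criterion x<p highlighted in red.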

Can you see why we used x<p as a ‘hit’, rather than x>p?
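As a rough check on this idea, the sketch below draws many uniform random numbers and counts the proportion that fall below $p$ for a few values of $p$. The seed, the number of draws and the values of $p$ tried are arbitrary choices for illustration, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, so the numbers are reproducible
n_draws = 100_000               # illustrative number of draws
x = rng.uniform(0, 1, size=n_draws)

proportions = {}
for p in [0.1, 0.5, 0.7]:
    # (x < p) is an array of True/False; its mean is the proportion of hits
    proportions[p] = np.mean(x < p)
    print(f'p = {p}: proportion of draws with x<p = {proportions[p]:.3f}')
```

Each proportion should come out close to $p$ itself, which is why x<p (rather than x>p) gives hits with probability $p$.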
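Distribution of $k$ depends on $p$
But how does changing $p$ affect the values of $k$ we see, when $k$ is the number of hits out of $n$ trials?
Here is the code for the simulation again, now with $n$ and $p$ set as variables: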
n=10
p=0.5
nReps = 10000
k = np.random.binomial(n,p, size=nReps)
sns.countplot(x=k)
plt.show()

What if we change $p$ to 0.7?
n=10
p=0.7
nReps = 10000
k = np.random.binomial(n,p, size=nReps)
sns.countplot(x=k, order=range(n+1))
# the argument 'order' is doing a similar job to 'bins' in a histogram
# here I am forcing sns to plot all the possible values of k from 0 to 10,
# even though some of them didn't occur in the simulation
plt.show()

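You should notice that after modifying the simulation so that $p=0.7$, the most common value of $k$ is 7, i.e. 7/10 hits.
The distribution also becomes skewed, as we can’t have more than 10/10 hits.
Try some other values of $p$ to get a sense of how changing $p$ affects the frequency distribution of $k$.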
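One way to try several values of $p$ at once, without plotting, is to loop over them and report the most common value of $k$ in each simulation. This is only a sketch: the seed and the list of $p$ values are arbitrary, and `most_common` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, so the numbers are reproducible
n = 10
nReps = 10000

most_common = {}
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    k = rng.binomial(n, p, size=nReps)
    # count how often each possible value of k (0 to n) occurred
    counts = np.bincount(k, minlength=n+1)
    most_common[p] = int(counts.argmax())
    print(f'p = {p}: most common k = {most_common[p]}')
```

The most common value of $k$ should move from near 0 towards near $n$ as $p$ increases.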
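2.6.3. Mean of $k$
The expected value of $k$, i.e. the mean of $k$ over many repetitions, is given by

$$ E(k) = np $$

In the following code block, we generate 10000 random samples from the binomial distribution $k \sim \mathcal{B}(n,p)$ and find their mean, then compare it to the value $np$ from the equation.
Hopefully it should match!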
n=10
p=0.7
nReps = 10000
k = np.random.binomial(n,p, size=nReps)
print('mean(k) = ' + str(k.mean()))
print('np = ' + str(n*p))
mean(k) = 6.9943
np = 7.0
2.6.4. $n$, number of trials
If we increase the number of trials to 100, what happens to the frequency distribution of $k$?
Here we modify the simulation so that $n=100$:
n=100
p=0.5
nReps = 10000
k = np.random.binomial(n,p, size=nReps)
sns.countplot(x=k, order=range(n+1))
# the argument 'order' is doing a similar job to 'bins' in a histogram
# here I am forcing sns to plot all the possible values of k from 0 to 100,
# even though some of them didn't occur in the simulation
plt.xlabel('k')
plt.xticks(range(0, n+1, 10));

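We can see that the peak of the histogram is where we expect (the most common values of $k$ are close to 50, i.e. $np$), and that the spread of the histogram is narrow relative to the range of possible values of $k$ (0 to $n$).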
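To put a number on how much of the possible range the simulated values actually cover, the sketch below compares $n=10$ with $n=100$. The seed is arbitrary and `range_fraction` is a name introduced here for illustration; the smallest and largest simulated values are only a crude measure of spread.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, so the numbers are reproducible
p = 0.5
nReps = 10000

range_fraction = {}
for n in [10, 100]:
    k = rng.binomial(n, p, size=nReps)
    # width of the observed values of k, as a fraction of the possible range 0 to n
    range_fraction[n] = (k.max() - k.min()) / n
    print(f'n = {n}: k ranged from {k.min()} to {k.max()} '
          f'({range_fraction[n]:.2f} of the possible range)')
```

With $n=100$ the simulated values should occupy a noticeably smaller fraction of the possible range than with $n=10$.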
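2.6.5. Standard deviation of $k$
The standard deviation of $k$ is given by

$$ \mathrm{std}(k) = \sqrt{npq} $$

… where $q = 1-p$.
In the following code block, we generate 10000 random samples from the binomial distribution $k \sim \mathcal{B}(n,p)$ and find their standard deviation, then compare it to the value $\sqrt{npq}$ from the equation.
Hopefully it should match!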
n=100
p=0.5
nReps = 10000
k = np.random.binomial(n,p, size=nReps)
print('std(k) = ' + str(k.std()))
print('sqrt(npq) = ' + str((n*p*(1-p))**0.5))
std(k) = 5.042392352842051
sqrt(npq) = 5.0
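The simulated value differs slightly from 5.0 because it is based on a finite number of random samples. As a cross-check that does not involve random sampling, the sketch below asks scipy.stats for the theoretical mean and standard deviation of the binomial distribution, assuming scipy.stats is imported as stats as in the set-up cell. The names `exact_mean` and `exact_std` are introduced here for illustration.

```python
import scipy.stats as stats

n = 100
p = 0.5

# theoretical mean and standard deviation of the binomial distribution B(n, p)
exact_mean = stats.binom.mean(n, p)
exact_std = stats.binom.std(n, p)

print('binom.mean = ' + str(exact_mean) + ', np = ' + str(n*p))
print('binom.std = ' + str(exact_std) + ', sqrt(npq) = ' + str((n*p*(1-p))**0.5))
```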
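2.6.6. Standard deviation of $k/n$
We noted above that the spread of the distribution of $k$, relative to the range of possible values (0 to $n$), is narrower when $n$ is larger.
The proportion of hits is $k/n$, and its standard deviation is given by

$$ \mathrm{std}(k/n) = \frac{\sqrt{npq}}{n} = \frac{\sqrt{pq}}{\sqrt{n}} $$

where $q = 1-p$.
This has the interesting consequence that for a given value of $p$, the standard deviation of the proportion of hits is proportional to $\frac{1}{\sqrt{n}}$.
That is, as $n$ increases, the proportion of hits $k/n$ observed in a single set of $n$ trials tends to lie closer to the true value of $p$.
In other words, my estimate of $p$ gets better as $n$ increases, but only in proportion to $\sqrt{n}$.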
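A minimal sketch of this relationship, using the same kind of simulation as above: for several values of $n$ we simulate the proportion of hits $k/n$ and compare its standard deviation with $\sqrt{pq/n}$. The seed and the values of $n$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, so the numbers are reproducible
p = 0.5
nReps = 10000

results = {}
for n in [10, 100, 1000, 10000]:
    k = rng.binomial(n, p, size=nReps)
    simulated = (k / n).std()              # std of the proportion of hits
    predicted = (p * (1 - p) / n) ** 0.5   # sqrt(pq/n) from the equation
    results[n] = (simulated, predicted)
    print(f'n = {n}: std(k/n) = {simulated:.4f}, sqrt(pq/n) = {predicted:.4f}')
```

Each hundred-fold increase in $n$ should shrink the standard deviation of $k/n$ by a factor of about ten, in line with the $\frac{1}{\sqrt{n}}$ relationship.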