5.3. Simulated coin toss#
To get a feel for how likely different outcomes are, we are going to simulate the data generating process
Here is an example of how we could simulate the data generating process in real life:
To work out how likely we are to get 5 heads out of 10 coin tosses, we could…
- get a real coin (but who has cash on them these days?)
- assume it is fair (p = 0.5)
- toss it 10 times (because n = 10)
- count the number of heads (k)
Then we could repeat that whole process many times (say, 100 times) and count how often we get exactly 5 heads.
Or…. we could get the computer to do that.
Yes, let’s get the computer to do it. That will be less hassle.
Set up Python libraries#
As usual, run the code cell below to import the relevant Python libraries
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas
import seaborn as sns
sns.set_theme() # use pretty defaults
Simulate a single coin toss#
The computer doesn’t really toss a coin.
It does someting mathematically equivalent, namely generates a random number called x and applies a test to it that will give a “hit” a certain proportion of the time, defined by p.
If the outcome is a hit, the value of the variable hit is set to 1, otherwise hit is set to zero
Try running the code block below several times and see if you understand what it does.
# generate a random number between 0 and 1
x = np.random.uniform(0,1)
print('value of random number: ' + str(x))
value of random number: 0.5162312743689347
What happened?
We used numpy’s random number generator (np.random) to get a number (with decimal places) between 0 and 1. The numbers are drawn from a uniform distribution, which means that any number between 0 and 1 is equally likely.
Re run the code block above a few times - you should get a different random number every time.
How do we convert this to a virtual ‘coin toss’? We need to randomly generate “hits” and “misses” rather than decimal numbers.
To do this we simply add a piece of code that checks whether our random number is greater or less than some number - in this case 0.5, as we should get equal frequencies of random numbers greater than 0.5 and less than 0.5, thus simulating a fair coin.
# check if it is less than p - this should happen on a proportion of trials equal to p
p=0.5
if x>p:
hit = 1
else:
hit = 0
print('is it a hit?: ' + str(hit))
is it a hit?: 1
Simulate 10 coin tosses#
In our coin tossing example, we need to toss the coin 10 times (n = 10) and count how many hits we get (k = ?)
To do this we will create a loop to repeat the coin toss 10 times
for i in np.arange(10):
# generate a random number between 0 and 1
x = np.random.uniform(0,1)
print('value of random number: ' + str(x))
# check if it is less than p - this should happen on a proportion of trials equal to p
p=0.7
if x>p:
hit = 1
else:
hit = 0
print('is it a hit?: ' + str(hit))
value of random number: 0.4210884318657173
is it a hit?: 0
value of random number: 0.3482462517804953
is it a hit?: 0
value of random number: 0.5762467361839384
is it a hit?: 0
value of random number: 0.3865405524783163
is it a hit?: 0
value of random number: 0.0152430257489089
is it a hit?: 0
value of random number: 0.8740325040404152
is it a hit?: 1
value of random number: 0.09672483029956014
is it a hit?: 0
value of random number: 0.535443621287985
is it a hit?: 0
value of random number: 0.36706894290708625
is it a hit?: 0
value of random number: 0.48530473549159414
is it a hit?: 0
OK, well the output of that code block was not really user friendly.
Use an array to store the outcomes#
Now that we know how the virtual coin toss works, we can dispense with printing out the actual value of the random number x and just give the 10 binary outcomes (1/0 for hit/miss)
for i in np.arange(10):
# generate a random number between 0 and 1
x = np.random.uniform(0,1)
# check if it is less than p - this should happen on a proportion of trials equal to p
p=0.7
if x>p:
hit = 1
else:
hit = 0
print(hit)
1
0
1
0
0
1
0
0
0
0
… but we also want to count the number of hits, so we need to store the outcomes (0/1) in an array
outcomes = np.empty(10) # create an empty array to store the outcomes
for i in np.arange(10):
# generate a random number between 0 and 1
x = np.random.uniform(0,1)
# check if it is less than p - this should happen on a proportion of trials equal to p
p=0.7
if x>p:
hit = 1
else:
hit = 0
outcomes[i] = hit # store the valuee of 'hit' on this trial in place 'i' in the array 'outcomes'
print('outcomes = ' + str(outcomes))
outcomes = [0. 0. 1. 0. 0. 1. 1. 1. 0. 0.]
… and then we need to count the hits:
k = np.sum(outcomes)
print('k = ' + str(k))
k = 4.0
Try running the code above a few times and check you understand what is happening.
Note that the outcome values 0. and 1. have dots after them just because they are floating point numbers (numbers that could potentially have something after the decimal place rather than being rounded to the neareest whole number)
This looks a little awkward but it is just the Python way of writing 0.0 and 1.0 rather than 0 and 1
Use a built in function#
Simulating outcomes is actually something coders do a lot so there is a package for it in numpy, called numpy.random
numpy.random draws a random sample from a probability distriution (you have to tell it which distribution to use though - binomial, normal or many more). In this case, the number $k$ of heads in $n$ coin tosses follows the binomial distribution as introduced in the lecture:
$$ k \sim \mathcal{B}(n,p) $$
… where $n=10$ and $p=0.5$, ie
$$ k \sim \mathcal{B}(10,0.5) $$
We therefore use numpy.random.binomial
Let’s try - it makes the code a lot more compact
k = np.random.binomial(10,0.5) # generate 10 samples with a 0.5 chance of a hit, and return the number of hits
print('k = ' + str(k))
k = 3
The single line of code above does everything that our previous code block did (it doesn’t return the outcomes themselves, but we don’t actually need them, do we?)