2.7. Tutorial Exercises 1: Probability Jargon in Python

In this section we will revise the terms for combinations of events (joint probability, union and conditional probability) and how they relate to frequencies, that is, counts of rows, in a pandas dataframe.

You should be able to answer the following questions with two tools. The first is the pandas dataframe method query(), which finds the rows matching some criterion. The second is the built-in Python function len(), which returns the number of rows in the dataframe passed to it.
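Here is a minimal sketch of how the two combine. It uses a small hypothetical toy dataframe, not the course dataset. query() filters the rows, and len() counts both the matching rows and all rows:

```python
import pandas as pd

# Hypothetical toy data for illustration only (not WellbeingSample.csv)
toy = pd.DataFrame({
    'College': ['Beaufort', 'Beaufort', 'Lonsdale', 'Lonsdale', 'Lonsdale'],
    'Subject': ['PPE', 'history', 'PPE', 'Biology', 'PPE'],
})

# query() returns a new dataframe containing only the rows that match the condition
ppe_rows = toy.query("Subject == 'PPE'")

# len() of a dataframe is its number of rows
n_ppe = len(ppe_rows)
n_total = len(toy)
print(n_ppe, n_total)  # 3 5
```

A count of matching rows divided by the total count gives a proportion. That is the pattern used throughout the exercises below.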

2.7.1. Set up Python libraries

As usual, run the code cell below to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf

2.7.2. Event combinations

Let’s work with the (made-up) data on students from Beaufort and Lonsdale colleges.

wb = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/WellbeingSample.csv')
wb
     ID_code   College      Subject  Score_preVac  Score_postVac
0     247610  Lonsdale          PPE            60             35
1     448590  Lonsdale          PPE            43             44
2     491100  Lonsdale  engineering            79             69
3     316150  Lonsdale          PPE            55             61
4     251870  Lonsdale  engineering            62             65
...      ...       ...          ...           ...            ...
296   440570  Beaufort      history            75             70
297   826030  Beaufort        maths            52             49
298   856260  Beaufort      Biology            83             84
299   947060  Beaufort  engineering            62             65
300   165780  Beaufort          PPE            48             56

301 rows × 5 columns
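The Subject labels do not share one capitalisation. The rows shown include "Biology" alongside "engineering" and "history". String comparisons inside query() are case-sensitive, so it can help to list the exact labels before writing a query. Below is a minimal sketch on hypothetical toy data that mimics this mixed capitalisation. On the real data the same call would be applied to wb['Subject'].

```python
import pandas as pd

# Hypothetical toy data mimicking the mixed capitalisation seen above
toy = pd.DataFrame({
    'College': ['Lonsdale', 'Beaufort', 'Beaufort', 'Lonsdale'],
    'Subject': ['Biology', 'history', 'Biology', 'engineering'],
})

# value_counts() lists every distinct label together with how often it occurs
print(toy['Subject'].value_counts())

# Matching is exact: 'biology' (lower case) finds no rows, 'Biology' finds two
n_lower = len(toy.query("Subject == 'biology'"))
n_exact = len(toy.query("Subject == 'Biology'"))
print(n_lower, n_exact)  # 0 2
```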

a) Plot the data

First of all, plot the number of students taking each subject at each college, using sns.countplot().
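One possible shape for such a plot is sketched below, assuming seaborn's `data`/`x`/`hue` keyword interface. Subjects go along the x-axis, and colleges are distinguished by bar colour via `hue`. The dataframe is hypothetical toy data standing in for wb.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the wb dataframe
toy = pd.DataFrame({
    'College': ['Lonsdale', 'Lonsdale', 'Beaufort', 'Beaufort', 'Beaufort'],
    'Subject': ['PPE', 'engineering', 'PPE', 'history', 'PPE'],
})

# One group of bars per subject; hue splits each group by college
ax = sns.countplot(data=toy, x='Subject', hue='College')
ax.set_ylabel('number of students')
plt.show()
```

Swapping `x` and `hue` would instead group the bars by college. Either layout shows the same counts.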

# Your code here

b) Probability of college membership

Let’s start by working out the probability that a student picked at random from this sample is at each college.

  • Let \(B\) be the event that a randomly chosen student is a member of Beaufort College

  • Let \(L\) be the event that a randomly chosen student is a member of Lonsdale College

What are the values of \(p(B)\) and \(p(L)\)?
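With every student equally likely to be picked, \(p(B)\) is the number of Beaufort students divided by the total number of students. The sketch below applies this to a hypothetical toy dataframe of 8 students, not to wb:

```python
import pandas as pd

# Hypothetical toy data (8 students) for illustration only
toy = pd.DataFrame({
    'College': ['Beaufort'] * 5 + ['Lonsdale'] * 3,
})

n_total = len(toy)

# p(B): proportion of rows whose College is Beaufort
p_B = len(toy.query("College == 'Beaufort'")) / n_total
# p(L): proportion of rows whose College is Lonsdale
p_L = len(toy.query("College == 'Lonsdale'")) / n_total

print(p_B, p_L)  # 0.625 0.375
```

In this toy example every student belongs to exactly one of the two colleges, so \(p(B) + p(L) = 1\).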

#p(B)
# Your code here
#p(L)
# Your code here

c) Joint probability

  • Let \(PPE\) be the event that a randomly chosen student is studying PPE

Find \(p(B \cap PPE)\)
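For an intersection, a row counts only if both conditions hold. Inside query() this corresponds to joining the two conditions with `and`. Here is a minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'College': ['Beaufort', 'Beaufort', 'Beaufort', 'Lonsdale', 'Lonsdale'],
    'Subject': ['PPE', 'history', 'PPE', 'PPE', 'Biology'],
})

# B ∩ PPE: both conditions must hold, so combine them with 'and'
both = toy.query("College == 'Beaufort' and Subject == 'PPE'")
p_B_and_PPE = len(both) / len(toy)
print(p_B_and_PPE)  # 0.4
```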

# Your code here

d) Union

  • Let \(Psy\) be the event that a randomly chosen student is studying psychology

  • Let \(Bio\) be the event that a randomly chosen student is studying biology

Find \(p(Psy \cup Bio)\)
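For a union, a row counts if either condition holds, so the conditions are joined with `or`. The result can be cross-checked against the addition rule \(p(A \cup B) = p(A) + p(B) - p(A \cap B)\). The sketch below uses hypothetical toy data. The label 'psychology' is an assumption for illustration, so check the actual spelling in wb before querying.

```python
import pandas as pd

# Hypothetical toy data; the label 'psychology' is assumed, not taken from wb
toy = pd.DataFrame({
    'Subject': ['psychology', 'Biology', 'PPE', 'psychology', 'history',
                'Biology', 'maths', 'engineering', 'PPE', 'psychology'],
})
n = len(toy)

# Psy ∪ Bio: a row qualifies if EITHER condition holds, so combine with 'or'
either = toy.query("Subject == 'psychology' or Subject == 'Biology'")
p_union = len(either) / n

# Cross-check with the addition rule; each row has a single Subject,
# so no row can satisfy both conditions and p(Psy ∩ Bio) is 0 here
p_psy = len(toy.query("Subject == 'psychology'")) / n
p_bio = len(toy.query("Subject == 'Biology'")) / n
p_both = len(toy.query("Subject == 'psychology' and Subject == 'Biology'")) / n
print(p_union, p_psy + p_bio - p_both)  # 0.5 0.5
```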

# Your code here

e) Conditional probability

  • As in part d), let \(Bio\) be the event that a randomly chosen student is studying biology

  • Let \(Hist\) be the event that a randomly chosen student is studying history

What is \(p(L|Bio)\)?
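Conditioning on \(Bio\) means restricting attention to the biology students only. \(p(L|Bio)\) is then the proportion of those students who are at Lonsdale, which matches the definition \(p(L|Bio) = p(L \cap Bio)/p(Bio)\). The sketch below computes it both ways on hypothetical toy data. The same pattern, with 'history' in place of 'Biology', would give \(p(L|Hist)\).

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    'College': ['Lonsdale', 'Beaufort', 'Lonsdale', 'Beaufort', 'Lonsdale', 'Beaufort'],
    'Subject': ['Biology', 'Biology', 'Biology', 'history', 'history', 'PPE'],
})

# Conditioning on Bio: keep only the biology students
bio = toy.query("Subject == 'Biology'")

# p(L|Bio): of those biology students, the proportion who are at Lonsdale
p_L_given_Bio = len(bio.query("College == 'Lonsdale'")) / len(bio)

# Same result via the definition p(L|Bio) = p(L ∩ Bio) / p(Bio)
p_L_and_Bio = len(toy.query("College == 'Lonsdale' and Subject == 'Biology'")) / len(toy)
p_Bio = len(bio) / len(toy)
print(p_L_given_Bio, p_L_and_Bio / p_Bio)  # both equal 2/3 in this toy example
```

The denominator is the number of biology students, not the total number of students. That difference is what separates \(p(L|Bio)\) from the joint probability \(p(L \cap Bio)\).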

# p(L|Bio)
# Your code here

What is \(p(L|Hist)\)?

# p(L|Hist)
# Your code here