1.9. Python Exercises
In this section we will revise the terms for combinations of events and how they relate to frequencies in a pandas dataframe.
You should be able to answer the following questions with the help of the pandas method query() (to find the rows matching some criterion) and the built-in function len(), which returns the number of rows in the dataframe given within the parentheses.
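As a minimal sketch of how query() and len() combine to give a relative frequency, the example below uses a small made-up dataframe. The dataframe toy and its columns Town and Pet are hypothetical and are not part of the course data.

```python
import pandas as pd

# Hypothetical toy data (NOT the course dataset): one row per household
toy = pd.DataFrame({
    'Town': ['Ayton', 'Ayton', 'Ayton', 'Beeford',
             'Beeford', 'Beeford', 'Beeford', 'Beeford'],
    'Pet':  ['cat', 'dog', 'cat', 'cat',
             'fish', 'dog', 'dog', 'fish'],
})

# query() keeps only the rows matching the criterion
cat_rows = toy.query('Pet == "cat"')

# len() counts rows, so this ratio is the proportion of rows with a cat
p_cat = len(cat_rows) / len(toy)
print(p_cat)  # 0.375 (3 of the 8 rows)
```

The same pattern, a count of matching rows divided by a count of all relevant rows, underlies each of the exercises in this section.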
1.9.1. Set up Python libraries
As usual, run the code cell below to import the relevant Python libraries:
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
1.9.2. Event combinations
Let’s work with the (made-up) data on students from Beaufort and Lonsdale colleges.
wb = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/WellbeingSample.csv')
wb
 | ID_code | College | Subject | Score_preVac | Score_postVac |
---|---|---|---|---|---|
0 | 247610 | Lonsdale | PPE | 60 | 35 |
1 | 448590 | Lonsdale | PPE | 43 | 44 |
2 | 491100 | Lonsdale | engineering | 79 | 69 |
3 | 316150 | Lonsdale | PPE | 55 | 61 |
4 | 251870 | Lonsdale | engineering | 62 | 65 |
... | ... | ... | ... | ... | ... |
296 | 440570 | Beaufort | history | 75 | 70 |
297 | 826030 | Beaufort | maths | 52 | 49 |
298 | 856260 | Beaufort | Biology | 83 | 84 |
299 | 947060 | Beaufort | engineering | 62 | 65 |
300 | 165780 | Beaufort | PPE | 48 | 56 |
301 rows × 5 columns
a) Plot the data
First of all, plot the number of students taking each subject at each college, using sns.countplot()
# Your code here
b) Probability of college membership
Let’s start by working out the probability that a student picked at random from this sample is at each college.
Let \(B\) be the event that a randomly chosen student is a member of Beaufort College
Let \(L\) be the event that a randomly chosen student is a member of Lonsdale College
What are the values of \(p(B)\) and \(p(L)\)?
# p(B)
# Your code here
# p(L)
# Your code here
c) Joint probability
Let \(PPE\) be the event that a randomly chosen student is studying PPE
Find \(p(B \cap PPE)\)
# Your code here
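One possible way to count a joint event (an intersection) is to put both conditions in a single query joined by `and`. The sketch below does this on a hypothetical toy dataframe; the names toy, Town and Pet are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical toy data (NOT the course dataset)
toy = pd.DataFrame({
    'Town': ['Ayton', 'Ayton', 'Ayton', 'Beeford',
             'Beeford', 'Beeford', 'Beeford', 'Beeford'],
    'Pet':  ['cat', 'dog', 'cat', 'cat',
             'fish', 'dog', 'dog', 'fish'],
})

# Intersection: rows where BOTH conditions hold
ayton_and_cat = toy.query('Town == "Ayton" and Pet == "cat"')

# Joint probability: matching rows as a fraction of ALL rows
p_ayton_and_cat = len(ayton_and_cat) / len(toy)
print(p_ayton_and_cat)  # 0.25 (2 of the 8 rows)
```

Note that the denominator is the whole sample, because a joint probability is not conditioned on either event.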
d) Union
Let \(Psy\) be the event that a randomly chosen student is studying psychology
Let \(Bio\) be the event that a randomly chosen student is studying biology
Find \(p(Psy \cup Bio)\)
# Your code here
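A union can be counted in a similar way by joining the conditions with `or`. The following is a sketch on a hypothetical toy dataframe (toy, Town and Pet are made-up names, not the course data). Since each row here has exactly one pet, the two events are mutually exclusive, so the union probability also equals the sum of the two separate probabilities.

```python
import pandas as pd

# Hypothetical toy data (NOT the course dataset)
toy = pd.DataFrame({
    'Town': ['Ayton', 'Ayton', 'Ayton', 'Beeford',
             'Beeford', 'Beeford', 'Beeford', 'Beeford'],
    'Pet':  ['cat', 'dog', 'cat', 'cat',
             'fish', 'dog', 'dog', 'fish'],
})

# Union: rows where AT LEAST ONE of the conditions holds
cat_or_dog = toy.query('Pet == "cat" or Pet == "dog"')
p_cat_or_dog = len(cat_or_dog) / len(toy)
print(p_cat_or_dog)  # 0.75 (6 of the 8 rows)

# Check against the separate probabilities (valid here only because
# no row can be both a cat row and a dog row)
p_cat = len(toy.query('Pet == "cat"')) / len(toy)
p_dog = len(toy.query('Pet == "dog"')) / len(toy)
print(p_cat + p_dog)  # 0.75
```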
e) Conditional probability
Let \(Bio\) be the event that a randomly chosen student is studying Biology
Let \(Hist\) be the event that a randomly chosen student is studying history
What is \(p(L|Bio)\)?
# p(L|Bio)
# Your code here
What is \(p(L|Hist)\)?
# p(L|Hist)
# Your code here
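For a conditional probability, the key change is the denominator: we divide by the number of rows in the conditioning event rather than by the whole sample. The sketch below illustrates this on a hypothetical toy dataframe (toy, Town and Pet are made-up names, not the course data), using \(p(A|B) = p(A \cap B) / p(B)\).

```python
import pandas as pd

# Hypothetical toy data (NOT the course dataset)
toy = pd.DataFrame({
    'Town': ['Ayton', 'Ayton', 'Ayton', 'Beeford',
             'Beeford', 'Beeford', 'Beeford', 'Beeford'],
    'Pet':  ['cat', 'dog', 'cat', 'cat',
             'fish', 'dog', 'dog', 'fish'],
})

# Conditioning event: restrict attention to the cat rows only
cats = toy.query('Pet == "cat"')

# Within those rows, count the ones in Ayton
ayton_given_cat = cats.query('Town == "Ayton"')

# p(Ayton | cat) = count(Ayton and cat) / count(cat)
p_ayton_given_cat = len(ayton_given_cat) / len(cats)
print(p_ayton_given_cat)  # 2 of the 3 cat rows are in Ayton
```

Comparing this value with the unconditional proportion of Ayton rows (3 of 8) shows how conditioning on an event can change a probability.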