Tutorial exercises

1.11. Tutorial exercises

We again use the wellbeing dataset to practice running permutation tests.

1.11.1. Set up Python libraries

As usual, run the code cell below to import the relevant Python libraries:

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf

1.11.2. Import and view the data

wb = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/WellbeingSample.csv')
wb

	ID_code	College	Subject	Score_preVac	Score_postVac
0	247610	Lonsdale	PPE	60	35
1	448590	Lonsdale	PPE	43	44
2	491100	Lonsdale	engineering	79	69
3	316150	Lonsdale	PPE	55	61
4	251870	Lonsdale	engineering	62	65
...	...	...	...	...	...
296	440570	Beaufort	history	75	70
297	826030	Beaufort	maths	52	49
298	856260	Beaufort	Biology	83	84
299	947060	Beaufort	engineering	62	65
300	165780	Beaufort	PPE	48	56

301 rows × 5 columns

1.11.3. Questions

Test the following hypotheses:

1. Wellbeing scores pre- and post-vac are correlated in engineering students
2. There is a difference in the wellbeing scores of PPE students between Beaufort and Lonsdale (before the vacation)
3. Wellbeing over all students increases across the vacation

Slightly harder one:

4. Wellbeing increases more across the vacation for Beaufort students than Lonsdale students

Detailed Instructions

In each of cases 1-4, you will need to decide what to do, carry it out and write it up:

a. Hypotheses

What is our null hypothesis?
What is our alternative hypothesis?

Is it a paired or unpaired test for difference of means, or a correlation test?

Therefore, which permutation_type is needed: samples, pairings or independent?

Is it a one- or two-tailed test?

Therefore, which alternative is needed: two-sided, greater or less?

What $α$ value will you use?

What value must $p$ be smaller than to reject the null hypothesis?
This is the experimenter’s choice, but usually 0.05 is used (sometimes 0.01 or 0.001).
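As a rough guide to how these choices map onto `scipy.stats.permutation_test`, the sketch below makes one call per kind of test. It uses synthetic data, not the wellbeing dataset: the arrays `pre`, `post`, `group_a`, `group_b` and the three statistic functions are made-up names for illustration only, and the `alternative` in each call is just an example of matching the direction of a hypothesis.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only (NOT the wellbeing data)
rng = np.random.default_rng(0)
pre = rng.normal(50, 10, size=30)            # first measurement per person
post = pre + rng.normal(2, 5, size=30)       # second measurement, same people
group_a = rng.normal(52, 8, size=25)         # two unrelated groups,
group_b = rng.normal(50, 8, size=35)         # which may differ in size

def mean_diff(a, b):
    # unpaired test: difference of the two group means
    return np.mean(a) - np.mean(b)

def mean_paired_diff(a, b):
    # paired test: mean of the within-pair differences
    return np.mean(a - b)

def correlation(a, b):
    # correlation test: Pearson's r between paired values
    return np.corrcoef(a, b)[0, 1]

# Unpaired difference of means: shuffle which group each value belongs to
res_unpaired = stats.permutation_test(
    (group_a, group_b), mean_diff, permutation_type='independent',
    alternative='two-sided', n_resamples=2000, random_state=0)

# Paired difference of means: swap the two values within each pair;
# 'less' asks whether mean(pre - post) is below zero, i.e. post is higher
res_paired = stats.permutation_test(
    (pre, post), mean_paired_diff, permutation_type='samples',
    alternative='less', n_resamples=2000, random_state=0)

# Correlation: shuffle which 'pre' value is paired with which 'post' value;
# 'greater' asks whether r is above zero
res_corr = stats.permutation_test(
    (pre, post), correlation, permutation_type='pairings',
    alternative='greater', n_resamples=2000, random_state=0)

alpha = 0.05  # chosen before looking at the results
for name, res in [('unpaired', res_unpaired), ('paired', res_paired),
                  ('correlation', res_corr)]:
    print(f'{name}: statistic = {float(res.statistic):.3f}, '
          f'p = {res.pvalue:.4f}, reject null: {res.pvalue < alpha}')
```

The only things that change between the three calls are the statistic function, `permutation_type` and `alternative`, which is why the questions above are worth answering before writing any code.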

b. Test statistic and descriptive statistics

What is your test statistic?

Report appropriate descriptive statistics and plot the data (you should choose an appropriate plot type)
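One possible way to get descriptive statistics for a subset of the data is sketched below. The dataframe `toy` is a tiny stand-in with the same column names as the wellbeing data but made-up values, and the choice of subset, grouping column and test statistic is only an example; which ones are appropriate depends on the hypothesis being tested.

```python
import pandas as pd

# Tiny stand-in dataframe: same column names as wb, made-up values
toy = pd.DataFrame({
    'College': ['Lonsdale', 'Lonsdale', 'Lonsdale',
                'Beaufort', 'Beaufort', 'Beaufort'],
    'Subject': ['PPE', 'PPE', 'engineering', 'PPE', 'PPE', 'maths'],
    'Score_preVac': [60, 43, 79, 52, 48, 70],
    'Score_postVac': [35, 44, 69, 49, 56, 72],
})

# Keep only the rows relevant to the question (here: one subject)
ppe = toy.query('Subject == "PPE"')

# Descriptive statistics for each group
summary = ppe.groupby('College')['Score_preVac'].agg(['count', 'mean', 'std'])
print(summary)

# An example test statistic: the difference of the two group means
obs_diff = summary.loc['Beaufort', 'mean'] - summary.loc['Lonsdale', 'mean']
print(f'difference of means (Beaufort - Lonsdale) = {obs_diff:.2f}')
```

With the real data, the same pattern would be applied to `wb` instead of `toy`.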

c. Carry out the permutation test

Carry out the test. Plot the null distribution. Report the $p$-value.
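A minimal sketch of this step, again on synthetic data rather than the wellbeing dataset, is given below. The arrays `scores_a` and `scores_b` and the function `mean_diff` are hypothetical names; the result object returned by `scipy.stats.permutation_test` holds the observed statistic, the $p$-value and the null distribution, which can be passed straight to a histogram.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic scores for two unrelated groups (illustration only)
rng = np.random.default_rng(1)
scores_a = rng.normal(52, 8, size=25)
scores_b = rng.normal(50, 8, size=25)

def mean_diff(a, b):
    # test statistic: difference of the two group means
    return np.mean(a) - np.mean(b)

res = stats.permutation_test(
    (scores_a, scores_b), mean_diff, permutation_type='independent',
    alternative='two-sided', n_resamples=5000, random_state=1)

# Null distribution: the statistic recomputed for each shuffle of the data
fig, ax = plt.subplots()
ax.hist(res.null_distribution, bins=30)
ax.axvline(float(res.statistic), color='r', linestyle='--',
           label='observed statistic')
ax.set_xlabel('difference of means under the null')
ax.set_ylabel('number of shuffles')
ax.legend()

print(f'observed difference = {float(res.statistic):.2f}, '
      f'p = {res.pvalue:.3f}')
```

The dashed line shows where the observed statistic falls relative to the shuffled values; the $p$-value is the proportion of the null distribution at least as extreme as that line, in the direction(s) set by `alternative`.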

d. Report your conclusion

Will you reject the null hypothesis, or fail to reject it? What is your conclusion in plain English?

e. Finally, write it up

In each case, include a final cell in which you write the test up as if for a journal article.