10.8. Tutorial exercises#
We again use the wellbeing dataset to practice running permutation tests.
Set up Python libraries#
As usual, run the code cell below to import the relevant Python libraries.
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas
import seaborn as sns
Colab users#
You need a more recent version of scipy.stats than the Colab default. To get it, run the following code block and, once it has finished, go to the menus at the top of Colab and click Runtime --> Restart Runtime.
# Set-up Python libraries - you need to run this but you don't need to change it
!pip install scipy==1.10.0
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas
import seaborn as sns
Import and view the data#
wb = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/WellbeingSample.csv')
wb
| | ID_code | College | Subject | Score_preVac | Score_postVac |
|---|---|---|---|---|---|
| 0 | 247610 | Lonsdale | PPE | 60 | 35 |
| 1 | 448590 | Lonsdale | PPE | 43 | 44 |
| 2 | 491100 | Lonsdale | engineering | 79 | 69 |
| 3 | 316150 | Lonsdale | PPE | 55 | 61 |
| 4 | 251870 | Lonsdale | engineering | 62 | 65 |
| ... | ... | ... | ... | ... | ... |
| 296 | 440570 | Beaufort | history | 75 | 70 |
| 297 | 826030 | Beaufort | maths | 52 | 49 |
| 298 | 856260 | Beaufort | Biology | 83 | 84 |
| 299 | 947060 | Beaufort | engineering | 62 | 65 |
| 300 | 165780 | Beaufort | PPE | 48 | 56 |
301 rows × 5 columns
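Each hypothesis below concerns only some of the students, so it helps to pull out the relevant rows and columns first. The sketch below shows one way to do that, using five rows copied from the preview above in place of the full file. The variable names (toy, eng, ppe_beaufort, eng_pre) are illustrative, and the use of query is an assumption about style, not something the exercise requires.

```python
import pandas

# Five rows copied from the preview above (rows 0, 2, 296, 299 and 300),
# standing in for the full wellbeing dataframe
toy = pandas.DataFrame({
    'College': ['Lonsdale', 'Lonsdale', 'Beaufort', 'Beaufort', 'Beaufort'],
    'Subject': ['PPE', 'engineering', 'history', 'engineering', 'PPE'],
    'Score_preVac': [60, 79, 75, 62, 48],
    'Score_postVac': [35, 69, 70, 65, 56],
})

# Rows for a single subject
eng = toy.query('Subject == "engineering"')

# Rows matching two conditions at once
ppe_beaufort = toy.query('Subject == "PPE" and College == "Beaufort"')

# One column from the selected rows, as a pandas Series
eng_pre = eng['Score_preVac']
print(eng_pre.to_list())
```

The same query strings should work unchanged on the full dataframe wb, since it has the same column names.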
Questions#
In each case, you will need to decide:
What is our null hypothesis?
What is our alternative hypothesis?
Is it a paired or unpaired test for difference of means, or a correlation test?
  Therefore, which permutation_type is needed: 'samples', 'pairings' or 'independent'?
Is it a one- or two-tailed test?
  Therefore, which alternative hypothesis type is needed: 'two-sided', 'greater' or 'less'?
What $\alpha$ value will you use?
  What value must $p$ be smaller than to reject the null hypothesis?
  This is the experimenter’s choice, but usually 0.05 is used (sometimes 0.01 or 0.001).
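The decisions above map onto two arguments of scipy.stats.permutation_test: permutation_type and alternative. As a reminder of the call pattern, the sketch below runs one test of each kind on made-up scores, not on the wellbeing data. The helper functions (mean_diff, mean_paired_diff, correlation), the variable names, the choice of 5000 resamples and the choice of alternative in each call are all illustrative assumptions.

```python
import numpy as np
import scipy.stats as stats

# Made-up scores for illustration only (not the wellbeing data)
rng = np.random.default_rng(0)
before = rng.normal(60, 10, size=30)
after = before + rng.normal(2, 5, size=30)   # same 30 people, measured again
other_group = rng.normal(65, 10, size=25)    # a separate group of 25 people

def mean_diff(x, y):
    # Difference of group means, for an unpaired comparison
    return np.mean(x) - np.mean(y)

def mean_paired_diff(x, y):
    # Mean of the within-pair differences, for a paired comparison
    return np.mean(x - y)

def correlation(x, y):
    # Pearson correlation between two paired measurements
    return np.corrcoef(x, y)[0, 1]

# Unpaired difference of means: shuffle group membership
res_unpaired = stats.permutation_test(
    (before, other_group), mean_diff, permutation_type='independent',
    alternative='two-sided', n_resamples=5000, vectorized=False)

# Paired difference of means: swap labels within each pair
res_paired = stats.permutation_test(
    (after, before), mean_paired_diff, permutation_type='samples',
    alternative='greater', n_resamples=5000, vectorized=False)

# Correlation: shuffle which x goes with which y
res_corr = stats.permutation_test(
    (before, after), correlation, permutation_type='pairings',
    alternative='two-sided', n_resamples=5000, vectorized=False)

alpha = 0.05  # chosen before looking at the results
for name, res in [('unpaired', res_unpaired),
                  ('paired', res_paired),
                  ('correlation', res_corr)]:
    print(name, res.statistic, res.pvalue, 'reject null:', res.pvalue < alpha)
```

The order of the two samples matters for one-tailed tests: with alternative='greater' and mean_paired_diff(after, before), the alternative hypothesis is that scores are higher after than before.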
Test the following hypotheses:#
Wellbeing scores pre- and post-vac are correlated in engineering students
There is a difference in the wellbeing scores of PPE students between Beaufort and Lonsdale (before the vacation)
Wellbeing over all students increases across the vacation
Slightly harder one:#
Wellbeing increases more across the vacation for Beaufort students than Lonsdale students
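One possible way to frame this last hypothesis is to compute each student's change in score first, and then compare those changes between the two colleges. The sketch below illustrates only the mechanics, on randomly generated scores and not on the wellbeing data. The column name change, the helper mean_diff and the group size of 40 are assumptions made for the example; you should still decide for yourself whether this framing matches your null and alternative hypotheses.

```python
import numpy as np
import pandas
import scipy.stats as stats

# Randomly generated stand-in data with the same column names
# as the wellbeing dataframe (values are NOT real)
rng = np.random.default_rng(1)
n = 40
toy = pandas.DataFrame({
    'College': ['Beaufort'] * n + ['Lonsdale'] * n,
    'Score_preVac': rng.normal(60, 10, size=2 * n).round(),
})
toy['Score_postVac'] = toy['Score_preVac'] + rng.normal(1, 6, size=2 * n).round()

# Per-student change across the vacation (hypothetical column name)
toy['change'] = toy['Score_postVac'] - toy['Score_preVac']

# Changes for each college, as plain numpy arrays
beaufort = toy.query('College == "Beaufort"')['change'].to_numpy()
lonsdale = toy.query('College == "Lonsdale"')['change'].to_numpy()

def mean_diff(x, y):
    # Difference between the mean changes of the two groups
    return np.mean(x) - np.mean(y)

# The two colleges contain different students, so the groups are unpaired
res = stats.permutation_test(
    (beaufort, lonsdale), mean_diff, permutation_type='independent',
    alternative='greater', n_resamples=5000, vectorized=False)

print(res.statistic, res.pvalue)
```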