Tutorial Exercises: t-test and non-parametric equivalents

3.8. Tutorial Exercises: $t$-test and non-parametric equivalents

Here are some more exercises on comparing means using the t-test and its non-parametric equivalents.

These exercises are very similar to what you did in the t-test and Mann-Whitney/Wilcoxon examples, so in most cases you will be able to copy and adapt code and text from the examples.

3.8.1. Set up Python libraries

As usual, run the code cell below to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings 
warnings.simplefilter('ignore', category=FutureWarning)

3.8.2. 1. Whose peaches are heavier?

There should be a picture of some peaches here

As last week:

Mr Robinson’s juice factory buys peaches from farmers by the tray. Each tray contains 50 peaches. Farmer MacDonald claims that this is unfair as his peaches are juicier and therefore weigh more than the peaches of his rival, Mr McGregor.

Mr Robinson weighs eight trays of Farmer MacDonald’s peaches and eight trays of Mr McGregor’s peaches. The weights, in kilograms, are given in the file peaches.csv.

Investigate whether MacDonald’s claim is justified by testing for a difference in weight between MacDonald’s and McGregor’s peaches. Use both a parametric and a non-parametric test.

a) Load the data into a Pandas dataframe

peaches = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2025/main/data/peaches.csv')
peaches

	McGregor	MacDonald
0	7.867	8.289
1	7.637	7.972
2	7.652	8.237
3	7.772	7.789
4	7.510	7.345
5	7.743	7.861
6	7.356	7.779
7	7.944	7.974

b) Plot the data and comment on whether they are normally distributed.

A KDE plot (to show the distribution) and rug plot (to show individual data points) would be a good choice here. You should comment on whether the data appear to be normally distributed and hence on the suitability of the t-test.

# your code here to plot the data

d) We can assume (based on the Central Limit Theorem) that these data points are normally distributed. Explain why.

Your text here explaining why the data should be Normal according to the CLT
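
The idea behind part d) can be illustrated with a small simulation: each data point is the weight of a whole tray, that is, the sum of 50 individual peach weights. The sketch below assumes, purely for illustration, that single peach weights follow a skewed gamma distribution. The shape and scale values are invented and are not estimated from peaches.csv.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: single peach weights (kg) are gamma distributed (clearly skewed).
# The parameters below are made up for illustration only.
n_trays, peaches_per_tray = 10_000, 50
single_peaches = rng.gamma(shape=2.0, scale=0.077, size=(n_trays, peaches_per_tray))

# Each simulated tray weight is the sum of its 50 peaches
tray_weights = single_peaches.sum(axis=1)

# Skewness near 0 indicates a symmetric, more Normal-looking distribution
skew_single = stats.skew(single_peaches.ravel())
skew_trays = stats.skew(tray_weights)
print('skewness of single peach weights:', skew_single)
print('skewness of tray weights:        ', skew_trays)
```

Even though the simulated single peaches are strongly skewed, the tray totals are much closer to symmetric, which is the Central Limit Theorem at work.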

e) Conduct a t-test to test Farmer MacDonald’s claim

State your hypotheses
State relevant descriptive statistics
Carry out the test using the built-in function from scipy.stats with appropriate option choices
State your conclusions

Your answer here! You will need to add additional cells
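
A minimal sketch of the test step in part e) is given below, assuming the two sets of trays are independent samples and that the hypothesis is directional (MacDonald's trays are heavier). The data are typed in from the table in part a). The default equal-variance setting of ttest_ind is kept here as an assumption; you may decide a different option is more appropriate.

```python
import pandas as pd
import scipy.stats as stats

# Tray weights (kg), copied from the dataframe shown in part a)
peaches = pd.DataFrame({
    'McGregor':  [7.867, 7.637, 7.652, 7.772, 7.510, 7.743, 7.356, 7.944],
    'MacDonald': [8.289, 7.972, 8.237, 7.789, 7.345, 7.861, 7.779, 7.974],
})

# Descriptive statistics for each farmer
print(peaches.agg(['mean', 'std', 'count']))

# H0: mean tray weight is the same for both farmers
# H1: MacDonald's mean tray weight is greater than McGregor's (one-tailed)
peach_ttest = stats.ttest_ind(peaches['MacDonald'], peaches['McGregor'],
                              alternative='greater')
print('t =', peach_ttest.statistic, ' p =', peach_ttest.pvalue)
```

The order of the arguments matters for a one-tailed test: alternative='greater' asks whether the first sample has the larger mean.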

f) Look back at the rank-based and permutation tests we carried out on the same data in the previous section. How do the results of the three tests differ? Which test was the best choice, and why?
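
To help with the comparison in part f), the sketch below runs a rank-based test next to the t-test on the same numbers. It assumes the rank-based test from the previous section was the Mann-Whitney U test for independent samples. The permutation test is not reproduced here.

```python
import scipy.stats as stats

# Tray weights (kg), copied from the dataframe shown in part a)
mcgregor = [7.867, 7.637, 7.652, 7.772, 7.510, 7.743, 7.356, 7.944]
macdonald = [8.289, 7.972, 8.237, 7.789, 7.345, 7.861, 7.779, 7.974]

# Parametric: independent-samples t-test, one-tailed
t_res = stats.ttest_ind(macdonald, mcgregor, alternative='greater')

# Non-parametric: Mann-Whitney U test (uses ranks only), one-tailed
u_res = stats.mannwhitneyu(macdonald, mcgregor, alternative='greater')

print('t-test:       t =', t_res.statistic, ' p =', t_res.pvalue)
print('Mann-Whitney: U =', u_res.statistic, ' p =', u_res.pvalue)
```

When the normality assumption is reasonable, the t-test generally has somewhat more power than its rank-based equivalent, which is one thing to weigh up in your answer.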

3.8.3. 2. IQ and vitamins

There should be a picture of some vitamin pills here

The VitalVit company claims that taking their VitalVit supplement increases IQ.

They run a trial in which 22 participants complete a baseline IQ test, then take VitalVit for six weeks, then complete another IQ test.

a) Load the data into a Pandas dataframe

vitamin = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2025/main/data/vitalVit.csv')
vitamin

	ID_code	before	after
0	688870	82.596	83.437
1	723650	117.200	119.810
2	445960	85.861	83.976
3	708780	125.640	127.680
4	109960	96.751	99.103
5	968530	105.680	106.890
6	164930	142.410	145.550
7	744410	109.650	109.320
8	499380	128.210	125.110
9	290560	84.773	87.249
10	780690	110.470	112.650
11	660820	100.870	99.074
12	758780	94.117	95.951
13	363320	96.952	96.801
14	638840	86.280	87.669
15	483930	89.413	94.379
16	102800	85.283	88.316
17	581620	94.477	96.300
18	754980	90.649	94.158
19	268960	103.190	104.300
20	314040	92.880	94.556
21	324960	97.843	97.969

b) The requirement for a paired t-test is that the pairwise differences in scores are normally distributed. Plot the data in such a way as to check this assumption. Comment on your plot.

A KDE plot of the pairwise differences (after - before) would be a good choice here. A scatterplot of after against before is also useful, as these are paired data.

# Your code here

In real IQ tests, IQ scores are normally distributed by design (the tests are designed to yield a normal distribution of scores). Therefore we should be able to use a t-test to compare the scores from before and after taking VitalVit.

e) Conduct a t-test to test VitalVit’s claim

State your hypotheses
State relevant descriptive statistics
Carry out the test using the built-in function from scipy.stats with appropriate option choices
State your conclusions

Your answer here.
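
A minimal sketch of the test step in part e) follows, assuming a paired design (each participant is measured twice) and a directional hypothesis (IQ is higher after taking VitalVit). The scores are typed in from the table in part a).

```python
import pandas as pd
import scipy.stats as stats

# IQ scores copied from the dataframe shown in part a) (ID_code omitted)
vitamin = pd.DataFrame({
    'before': [82.596, 117.200, 85.861, 125.640, 96.751, 105.680, 142.410, 109.650,
               128.210, 84.773, 110.470, 100.870, 94.117, 96.952, 86.280, 89.413,
               85.283, 94.477, 90.649, 103.190, 92.880, 97.843],
    'after':  [83.437, 119.810, 83.976, 127.680, 99.103, 106.890, 145.550, 109.320,
               125.110, 87.249, 112.650, 99.074, 95.951, 96.801, 87.669, 94.379,
               88.316, 96.300, 94.158, 104.300, 94.556, 97.969],
})

# Descriptive statistics, including the mean pairwise change
print(vitamin.agg(['mean', 'std', 'count']))
mean_change = (vitamin['after'] - vitamin['before']).mean()
print('mean change (after - before):', mean_change)

# H0: mean pairwise difference (after - before) is zero
# H1: mean pairwise difference is greater than zero (one-tailed)
vit_ttest = stats.ttest_rel(vitamin['after'], vitamin['before'], alternative='greater')
print('t =', vit_ttest.statistic, ' p =', vit_ttest.pvalue)
```

A paired test is used because it works on each participant's own change, which removes the large between-person variation in baseline IQ.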

f) Look back to the rank-based and permutation tests on the same data, which you carried out last week. How do the results differ? Which test was the best choice, and why?

Your answer here.
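
For the comparison in part f), the sketch below runs a rank-based test for paired data alongside the paired t-test. It assumes the rank-based test used last week was the Wilcoxon signed-rank test. The permutation test is not reproduced here.

```python
import scipy.stats as stats

# IQ scores copied from the dataframe shown in part a)
before = [82.596, 117.200, 85.861, 125.640, 96.751, 105.680, 142.410, 109.650,
          128.210, 84.773, 110.470, 100.870, 94.117, 96.952, 86.280, 89.413,
          85.283, 94.477, 90.649, 103.190, 92.880, 97.843]
after = [83.437, 119.810, 83.976, 127.680, 99.103, 106.890, 145.550, 109.320,
         125.110, 87.249, 112.650, 99.074, 95.951, 96.801, 87.669, 94.379,
         88.316, 96.300, 94.158, 104.300, 94.556, 97.969]

# Parametric: paired t-test, one-tailed (after > before)
paired_t = stats.ttest_rel(after, before, alternative='greater')

# Non-parametric: Wilcoxon signed-rank test on the same pairs, one-tailed
signed_rank = stats.wilcoxon(after, before, alternative='greater')

print('paired t-test: t =', paired_t.statistic, ' p =', paired_t.pvalue)
print('Wilcoxon:      W =', signed_rank.statistic, ' p =', signed_rank.pvalue)
```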

3.8.4. 3. Who has the tallest students?

A student from Lonsdale college claims that Lonsdale students are taller than students from Beaufort college.

Heights of 30 randomly selected male undergraduates from each college are found in the file heightsCollege.csv.

Test the student’s hypothesis using a t-test (this is justified as heights are generally normally distributed) and write up your report as if for a scientific publication. Your report should include the following elements:

A plot of the data to show the data distribution
The relevant descriptive statistics
The results of the t-test
A conclusion

You can use the write-up sections of the t-test example notebooks as a model.

# Load the data
heights = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2025/main/data/heightsCollege.csv')
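
As a starting point for the analysis, the helper below sketches the descriptive statistics and the one-tailed independent-samples t-test. The layout of heightsCollege.csv is not shown on this page, so the column names college and height and the helper name compare_colleges are hypothetical. Check heights.columns and adapt them before use.

```python
import pandas as pd
import scipy.stats as stats

def compare_colleges(df, group_col='college', value_col='height',
                     taller='Lonsdale', shorter='Beaufort'):
    """Summarise heights by college and test whether `taller` exceeds `shorter`.

    ASSUMPTION: the data are in long format with one row per student;
    `group_col` and `value_col` are hypothetical column names.
    """
    # Descriptive statistics for each college
    summary = df.groupby(group_col)[value_col].agg(['mean', 'std', 'count'])

    a = df.loc[df[group_col] == taller, value_col]
    b = df.loc[df[group_col] == shorter, value_col]

    # H0: mean heights are equal; H1: mean height in `taller` is greater
    result = stats.ttest_ind(a, b, alternative='greater')
    return summary, result
```

If the file turns out to use these column names, the call would be summary, result = compare_colleges(heights). The values in summary and result can then be quoted in the write-up.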