Tutorial Exercises

1.10. Tutorial Exercises#

This week’s tutorial exercises focus on indexing and obtaining descriptive statistics.

1.10.1. Set up Python Libraries#

As usual, you will need to run this code block to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf

1.10.2. Import a dataset to work with#

You can download the file OxfordWeather.csv from Canvas to your computer and import it from there, or read it directly from the course GitHub repository as in the code block below.

weather = pd.read_csv("https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/OxfordWeather.csv")
display(weather)

	YYYY	Month	MM	DD	DD365	Tmax	Tmin	Tmean	Trange	Rainfall_mm
0	1827	Jan	1	1	1	8.3	5.6	7.0	2.7	0.0
1	1827	Jan	1	2	2	2.2	0.0	1.1	2.2	0.0
2	1827	Jan	1	3	3	-2.2	-8.3	-5.3	6.1	9.7
3	1827	Jan	1	4	4	-1.7	-7.8	-4.8	6.1	0.0
4	1827	Jan	1	5	5	0.0	-10.6	-5.3	10.6	0.0
...	...	...	...	...	...	...	...	...	...	...
71338	2022	Apr	4	26	116	15.2	4.1	9.7	11.1	0.0
71339	2022	Apr	4	27	117	10.7	2.6	6.7	8.1	0.0
71340	2022	Apr	4	28	118	12.7	3.9	8.3	8.8	0.0
71341	2022	Apr	4	29	119	11.7	6.7	9.2	5.0	0.0
71342	2022	Apr	4	30	120	17.6	1.0	9.3	16.6	0.0

71343 rows × 10 columns

1.10.3. Exercises#

In the following questions, we use descriptive statistics and indexing to answer some questions about the weather and climate in Oxford.

Where you are asked to calculate a value (such as the mean) rather than output a table, you should report your answer in words in the text box below the code block.

Where the question asks you to “comment”, you are simply being asked to engage with the data and explain what you notice in plain English. Please discuss with your fellow students and your tutor, as this is a really important skill for data analysis.

Part 1: Heat#

a. What was the hottest temperature on record?#

Note that the dataset ends in April 2022 and therefore does not include the record heatwave of summer 2022.

# Your code here

Your text here

b. On what date did the hottest temperature occur?#

Hint: you could use df.query() to help you here
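
As a reminder of how df.query() works, here is a minimal sketch on made-up data. The dataframe `toy` and its columns `city` and `score` are hypothetical and are not part of the weather dataset; the same pattern should carry over to other dataframes.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "score": [3.5, 9.1, 7.2, 9.1],
})

# Step 1: find the largest value in the column
top_score = toy["score"].max()

# Step 2: keep only the rows where the column equals that value.
# The @ prefix lets the query string refer to a Python variable.
top_rows = toy.query("score == @top_score")
print(top_rows)
```

Note that more than one row can share the maximum value, so the result may contain several rows.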

# Your code here

Your text here

c. Display the 10 hottest days on record and comment#

Hint: you can use df.sort_values() and df.head() or df.tail() to help you here
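
The sketch below shows one way to combine df.sort_values() with df.head() or df.tail() on made-up data. The dataframe `toy` and its columns `name` and `height` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "height": [150, 182, 171, 165, 190],
})

# Sort from largest to smallest, then keep the first 3 rows
tallest = toy.sort_values(by="height", ascending=False).head(3)
print(tallest)

# Alternatively, sort from smallest to largest and keep the last 3 rows
tallest_alt = toy.sort_values(by="height").tail(3)
print(tallest_alt)
```

Both versions select the same three rows; they differ only in the order in which the rows are displayed.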

# Your code here

Your comment here

d. Find the mean of maximum daily temperature (Tmax) for each month and comment#

Hint: you can use df.groupby() to help you here
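
As a minimal sketch of the df.groupby() pattern, the example below calculates a mean for each group in made-up data. The dataframe `toy` and its columns `group` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy = pd.DataFrame({
    "group": ["x", "y", "x", "y", "x"],
    "value": [1.0, 10.0, 2.0, 20.0, 3.0],
})

# Split the rows by 'group', then take the mean of 'value' within each group
group_means = toy.groupby("group")["value"].mean()
print(group_means)
```

By default, df.groupby() sorts the group labels, so text labels come out in alphabetical order while numeric labels come out in numerical order. This is worth bearing in mind when choosing which column to group by.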

# Your code here

Your comment here

e. Make a table displaying the mean and standard deviation of Tmax in each month#

Hint: A combination of df.agg() and df.groupby() will help you here
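
One way to combine df.groupby() and df.agg() is sketched below on made-up data: passing a list of statistic names to agg() gives one column per statistic. The dataframe `toy` and its columns `group` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy = pd.DataFrame({
    "group": ["x", "y", "x", "y", "x"],
    "value": [1.0, 10.0, 2.0, 20.0, 3.0],
})

# For each group, calculate both the mean and the standard deviation of 'value'
summary = toy.groupby("group")["value"].agg(["mean", "std"])
print(summary)
```

The result is a table with one row per group and one column per statistic.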

# Your code here

f. Make a table displaying the mean of Tmax and Tmin in each month#

Hint: A combination of df.agg() and df.groupby() will help you here
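
To summarise more than one column at once, one option is to select a list of columns (note the double square brackets) before calling agg(). This is a sketch on made-up data; the dataframe `toy2` and its columns `group`, `low` and `high` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy2 = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "low": [1.0, 2.0, 3.0, 4.0],
    "high": [10.0, 20.0, 30.0, 40.0],
})

# Select two columns with a list, then take the mean of each within each group
two_col_means = toy2.groupby("group")[["low", "high"]].agg("mean")
print(two_col_means)
```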

# Your code here

Part 2: Rain#

a. Run this code block to add a column called `wet` containing a `True` for days on which it rained and `False` otherwise#

We will practice adding columns in a later session

# Add a Boolean column 'wet': True if any rain fell that day, False otherwise
weather['wet'] = weather['Rainfall_mm'] > 0
weather

	YYYY	Month	MM	DD	DD365	Tmax	Tmin	Tmean	Trange	Rainfall_mm	wet
0	1827	Jan	1	1	1	8.3	5.6	7.0	2.7	0.0	False
1	1827	Jan	1	2	2	2.2	0.0	1.1	2.2	0.0	False
2	1827	Jan	1	3	3	-2.2	-8.3	-5.3	6.1	9.7	True
3	1827	Jan	1	4	4	-1.7	-7.8	-4.8	6.1	0.0	False
4	1827	Jan	1	5	5	0.0	-10.6	-5.3	10.6	0.0	False
...	...	...	...	...	...	...	...	...	...	...	...
71338	2022	Apr	4	26	116	15.2	4.1	9.7	11.1	0.0	False
71339	2022	Apr	4	27	117	10.7	2.6	6.7	8.1	0.0	False
71340	2022	Apr	4	28	118	12.7	3.9	8.3	8.8	0.0	False
71341	2022	Apr	4	29	119	11.7	6.7	9.2	5.0	0.0	False
71342	2022	Apr	4	30	120	17.6	1.0	9.3	16.6	0.0	False

71343 rows × 11 columns

b. What is the proportion of wet days overall?#

Hint: The values True and False can be treated as 1 and 0 respectively.

To get the proportion of days on which wet==True, we can use a programming trick, which is simply to take the mean of the column wet:

Say there are 100 days in my sample:
- on 66 of them, wet==True==1
- on the other 34, wet==False==0

If we take the mean, this gives us the proportion of wet days because we:
- add up all the values (answer=66)
- divide by the number of cases (100)
- get 66/100 = 0.66 or 66%, the proportion of wet days
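
The arithmetic above can be checked on a made-up sample of 100 values, of which 66 are True and 34 are False. The variable `flags` is hypothetical and is not part of the weather data.

```python
import pandas as pd

# Hypothetical sample: 66 True values followed by 34 False values
flags = pd.Series([True] * 66 + [False] * 34)

# True counts as 1 and False as 0, so the sum is the number of True values
n_true = flags.sum()

# The mean is the sum divided by the number of cases: the proportion of True values
prop_true = flags.mean()

print(n_true, len(flags), prop_true)
```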

# your code here

Your text here

c. What is the proportion of wet days in each month? Comment on your findings#

Hint: use df.groupby()

# your code here

Your comments here

d. What is the mean quantity of rainfall (in mm) in each month? Comment on your findings#

# your code here

Your comment here

e. Display the 10 wettest days on record and comment#

# Your code here

Your comment here

f. Compare and contrast the different findings in Part 2 c, d, and e#

Different descriptive statistics tell us different things about the same data!

Your comments here!

Part 3: Snow#

a. Create a dataframe `WhiteChristmas` containing the weather on Christmas day, for all the years in which there was a White Christmas#

Hint: we don’t have a column telling us when it has snowed, but it is reasonable to assume this happens when the minimum temperature dips below zero and Rainfall_mm is above zero.
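
Selecting rows that meet two conditions at once can be done in a couple of ways. The sketch below uses made-up data; the dataframe `toy` and its columns `temp` and `precip` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (NOT the Oxford weather file)
toy = pd.DataFrame({
    "temp": [-2.0, 3.0, -1.0, 5.0],
    "precip": [0.0, 1.2, 4.5, 0.0],
})

# Option 1: Boolean indexing. Each condition needs its own brackets,
# and the conditions are combined with & (meaning "and")
both = toy[(toy["temp"] < 0) & (toy["precip"] > 0)]
print(both)

# Option 2: the same selection written as a query string
both_q = toy.query("temp < 0 and precip > 0")
print(both_q)
```

Only rows that satisfy both conditions are kept; a row that meets just one of them is dropped.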

# Your code here
# WhiteChristmas = 

b. Sort the dataframe `WhiteChristmas` by year and comment#

# Your code here

Your comments here

c. Any issues with our definition of ‘snow’?#

We defined a snowy day as one on which Tmin falls below zero and Rainfall_mm is above zero.

Do you think this over- or underestimates the number of snowy days?
Why?

Your comments here

d. How common is ‘proper’ snowfall in Oxford?#

Let’s focus on days with enough snowfall to make at least a tiny snowman! Assume that this happens when Tmin is below zero and there is more than 4 mm of rainfall.

4 mm of rain makes about 5 cm of soggy snow in Oxford conditions, although it would make a much greater depth of powder in a cold, dry atmosphere like that of Utah or Colorado.

Create a dataframe called SnowDays containing only days with enough snow to make a snowman.

You can check how often this happened in recent years using df.tail()

# Your code here

Your comments here
