{ "cells": [ { "cell_type": "markdown", "id": "8752e0fa", "metadata": {}, "source": [ "# Tutorial Exercises\n", "\n", "This week's tutorial exercises focus on indexing and obtaining descriptive statistics\n", "\n", "### Set up Python Libraries\n", "\n", "As usual you will need to run this code block to import the relevant Python libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "8366c45d", "metadata": {}, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "id": "6bf1597e", "metadata": {}, "source": [ "### Import a dataset to work with\n", "\n", "You will need to download the file OxfordWeather.csv from Canvas to your computer, then import it" ] }, { "cell_type": "code", "execution_count": 2, "id": "8e80e994", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YYYYMMDDTmaxTminTmeanTrangeRainfall_mm
01827118.35.67.02.70.0
11827122.20.01.12.20.0
2182713-2.2-8.3-5.36.19.7
3182714-1.7-7.8-4.86.10.0
41827150.0-10.6-5.310.60.0
...........................
71338202242615.24.19.711.10.0
71339202242710.72.66.78.10.0
71340202242812.73.98.38.80.0
71341202242911.76.79.25.00.0
71342202243017.61.09.316.60.0
\n", "

71343 rows × 8 columns

\n", "
" ], "text/plain": [ " YYYY MM DD Tmax Tmin Tmean Trange Rainfall_mm\n", "0 1827 1 1 8.3 5.6 7.0 2.7 0.0\n", "1 1827 1 2 2.2 0.0 1.1 2.2 0.0\n", "2 1827 1 3 -2.2 -8.3 -5.3 6.1 9.7\n", "3 1827 1 4 -1.7 -7.8 -4.8 6.1 0.0\n", "4 1827 1 5 0.0 -10.6 -5.3 10.6 0.0\n", "... ... .. .. ... ... ... ... ...\n", "71338 2022 4 26 15.2 4.1 9.7 11.1 0.0\n", "71339 2022 4 27 10.7 2.6 6.7 8.1 0.0\n", "71340 2022 4 28 12.7 3.9 8.3 8.8 0.0\n", "71341 2022 4 29 11.7 6.7 9.2 5.0 0.0\n", "71342 2022 4 30 17.6 1.0 9.3 16.6 0.0\n", "\n", "[71343 rows x 8 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "weather = pd.read_csv(\"https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/OxfordWeather.csv\")\n", "display(weather)" ] }, { "cell_type": "markdown", "id": "fe20b066", "metadata": {}, "source": [ "## Exercises\n", "\n", "In the following questions, we descriptive statistics and indexing to answer some questions about the weather and climate in Oxford.\n", "\n", "Where you are asked to calculate a value (such as the mean) rather than output a table, you should **report your answer in words** in the text box below the code block.\n", "\n", "Where the question asks you to \"comment\", you are simply being asked to engage with the data/ explain what you notice in plain English. Please discuss with your fellow students and your tutor as this is a really important skill for data analysis.\n", "\n", "### Part 1: Heat" ] }, { "cell_type": "markdown", "id": "b58c7c95", "metadata": {}, "source": [ "#### a. What was the hottest temperature on record?\n", "\n", "Note that the dataset ends in April 2022 and therefore does not include the record heatwave of summer 2022." ] }, { "cell_type": "code", "execution_count": 3, "id": "66b09e4b", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "31fe6a75", "metadata": {}, "source": [ "*Your text here*" ] }, { "cell_type": "markdown", "id": "14c8c66a", "metadata": {}, "source": [ "#### b. On what date did the hottest temperature occur?\n", "\n", "Hint: you could use `df.query()` to help you here" ] }, { "cell_type": "code", "execution_count": 4, "id": "cd2525c4", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "a9447c85", "metadata": {}, "source": [ "*Your text here*" ] }, { "cell_type": "markdown", "id": "7b5b2d01", "metadata": {}, "source": [ "#### c. Display the 10 hottest days on record and comment\n", "\n", "Hint: you can use `df.sort_values()` and `df.head()` or `df.tail()` to help you here" ] }, { "cell_type": "code", "execution_count": 5, "id": "f78e1c16", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "889a28cf", "metadata": {}, "source": [ "*Your comment here*" ] }, { "cell_type": "markdown", "id": "21f1fef8", "metadata": {}, "source": [ "#### d. Find the mean of maximum daily temperature (Tmax) for each month and comment\n", "\n", "Hint: you can use `df.groupby()` to help you here" ] }, { "cell_type": "code", "execution_count": 6, "id": "e9599bac", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "f0c017f3", "metadata": {}, "source": [ "*Your comment here*" ] }, { "cell_type": "markdown", "id": "1cd6748f", "metadata": {}, "source": [ "#### e. Make a table displaying the mean and standard deviation of Tmax in each month\n", "\n", "Hint: A combination of `df.agg()` and `df.groupby()` will help you here" ] }, { "cell_type": "code", "execution_count": 7, "id": "6c6ff384", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "8702df29", "metadata": {}, "source": [ "#### e. Make a table displaying the mean of Tmax and Tmin in each month\n", "\n", "Hint: A combination of `df.agg()` and `df.groupby()` will help you here" ] }, { "cell_type": "code", "execution_count": 8, "id": "e2f4e1f2", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "362931b5", "metadata": {}, "source": [ "### Part 2: Rain" ] }, { "cell_type": "markdown", "id": "14aca7b0", "metadata": {}, "source": [ "#### a. Run this code block to add a column called wet containing a True for days on which it rained and False otherwise\n", "\n", "We will practice adding columns in a later session" ] }, { "cell_type": "code", "execution_count": 9, "id": "9070c6fb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YYYYMMDDTmaxTminTmeanTrangeRainfall_mmwet
01827118.35.67.02.70.0False
11827122.20.01.12.20.0False
2182713-2.2-8.3-5.36.19.7True
3182714-1.7-7.8-4.86.10.0False
41827150.0-10.6-5.310.60.0False
..............................
71338202242615.24.19.711.10.0False
71339202242710.72.66.78.10.0False
71340202242812.73.98.38.80.0False
71341202242911.76.79.25.00.0False
71342202243017.61.09.316.60.0False
\n", "

71343 rows × 9 columns

\n", "
" ], "text/plain": [ " YYYY MM DD Tmax Tmin Tmean Trange Rainfall_mm wet\n", "0 1827 1 1 8.3 5.6 7.0 2.7 0.0 False\n", "1 1827 1 2 2.2 0.0 1.1 2.2 0.0 False\n", "2 1827 1 3 -2.2 -8.3 -5.3 6.1 9.7 True\n", "3 1827 1 4 -1.7 -7.8 -4.8 6.1 0.0 False\n", "4 1827 1 5 0.0 -10.6 -5.3 10.6 0.0 False\n", "... ... .. .. ... ... ... ... ... ...\n", "71338 2022 4 26 15.2 4.1 9.7 11.1 0.0 False\n", "71339 2022 4 27 10.7 2.6 6.7 8.1 0.0 False\n", "71340 2022 4 28 12.7 3.9 8.3 8.8 0.0 False\n", "71341 2022 4 29 11.7 6.7 9.2 5.0 0.0 False\n", "71342 2022 4 30 17.6 1.0 9.3 16.6 0.0 False\n", "\n", "[71343 rows x 9 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Your code here\n", "weather['wet']=weather.Rainfall_mm>0\n", "weather" ] }, { "cell_type": "markdown", "id": "6c973ea7", "metadata": {}, "source": [ "#### b. What is the proportion of wet days overall?\n", "\n", "Hint: The values True and False can be treated as 1 and 0 respectively.\n", " \n", "To get the proportion of days on which wet==True, we can use a programmming trick which is to simply take the mean of the column wet:\n", " \n", "* say there are 100 days in my sample\n", " * say 66 of them, wet==True==1\n", " * for the other 44, wet==False==0\n", "* If we take the mean, this gives us the proportion of wet days because we:\n", " * add up all the values (answer=66) \n", " * divide by the number of cases (100)\n", " * result is 66/100 = 0.66 or 66%, the proportion of wet days" ] }, { "cell_type": "code", "execution_count": 10, "id": "26baa946", "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "id": "98581de1", "metadata": {}, "source": [ "*Your text here*" ] }, { "cell_type": "markdown", "id": "096e665d", "metadata": {}, "source": [ "#### c. What is the proportion of wet days in each month? Comment on your findings\n", "\n", "Hint: use `df.groupby()`" ] }, { "cell_type": "code", "execution_count": 11, "id": "51153a39", "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "id": "f138aeea", "metadata": {}, "source": [ "*Your comments here*" ] }, { "cell_type": "markdown", "id": "2df8dae1", "metadata": {}, "source": [ "#### d. What is the mean quantity of rainfall (in mm) in each month? Comment on your findings" ] }, { "cell_type": "code", "execution_count": 12, "id": "da4b585b", "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "id": "ff270f70", "metadata": {}, "source": [ "*Your comment here*" ] }, { "cell_type": "markdown", "id": "621a10d9", "metadata": {}, "source": [ "#### e. Display the 10 wettest days on record and comment" ] }, { "cell_type": "code", "execution_count": 13, "id": "db492ec7", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "da7a8e91", "metadata": {}, "source": [ "*Your comment here*" ] }, { "cell_type": "markdown", "id": "38798a43", "metadata": {}, "source": [ "#### f. Compare and contrast the different findings in part 2 c,d, and e\n", "\n", "Different descriptive statistics tell us different things about the same data!" ] }, { "cell_type": "markdown", "id": "62e5de6a", "metadata": {}, "source": [ "*Your comments here!*" ] }, { "cell_type": "markdown", "id": "21480a01", "metadata": {}, "source": [ "### Snow\n", "\n", "#### a. Create a dataframe WhiteChristmas containing the weather on Christmas day, for all the years in which there was a White Christmas \n", "\n", "Hint: we don't have a column telling us when is has snowed, but it is reasonable to assume this happens when the minimum temperature dips below zero, and Rainfall_mm is above zero." ] }, { "cell_type": "code", "execution_count": 14, "id": "29ec6741", "metadata": {}, "outputs": [], "source": [ "# Your code here\n", "# WhiteChristmas = " ] }, { "cell_type": "markdown", "id": "a00193d2", "metadata": {}, "source": [ "#### b. Sort the dataframe WhiteChristmas by year and comment" ] }, { "cell_type": "code", "execution_count": 15, "id": "8bf2da6b", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "f62045b3", "metadata": {}, "source": [ "*Your comments here*" ] }, { "cell_type": "markdown", "id": "e3656051", "metadata": {}, "source": [ "#### c. Any issues with our definition of 'snow'?\n", "\n", "We defined snow as when the Tmin falls below zero and Rainfall is non-zero. \n", "\n", "* Do you think this over- or under- estiamtes the number of snowy days?\n", "* Why?" ] }, { "cell_type": "markdown", "id": "bb5a99c8", "metadata": {}, "source": [ "*Your comments here*" ] }, { "cell_type": "markdown", "id": "5f818f4e", "metadata": {}, "source": [ "#### d. How common is 'proper' snowfall in Oxford?\n", "\n", "Let's focus on days with enough snowfall to make at least a tiny snowman! Assume that this happens when TMin is below zero and there is more than 4mm of rainfall \n", "\n", "* 4mm of rain makes about 5cm of soggy snow in Oxford conditions, although it would make a uch greater depth of powder in a cold dry atmosphere like Utah or Colorado\n", "\n", "Create a dataframe called SnowDays containing only days with enough snow to make a snowman.\n", "\n", "You can check how often this happened in recent years using `df.tail()`" ] }, { "cell_type": "code", "execution_count": 16, "id": "8a3a8710", "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "078e4501", "metadata": {}, "source": [ "*Your comments here*\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "48af32e3", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }