{
"cells": [
{
"cell_type": "markdown",
"id": "f4e47f5f",
"metadata": {},
"source": [
"# Boxplot\n",
"\n",
"Sometimes less is more!\n",
"\n",
"We saw in the lecture that if we want to compare several data distributions, it can be useful to have a plot that highlights key features (the median and quartiles) whilst eliminating unnecessary detail\n",
"\n",
"The boxplot can do this job\n",
"\n",
"## Oxford Weather example\n",
"\n",
"We will work with historical data from the Oxford weather centre\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "c368db3e",
"metadata": {},
"source": [
"### Set up Python libraries\n",
"\n",
"As usual, run the code cell below to import the relevant Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "159964a1",
"metadata": {},
"outputs": [],
"source": [
"# Set-up Python libraries - you need to run this but you don't need to change it\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n",
"import pandas \n",
"import seaborn as sns\n",
"sns.set_theme()"
]
},
{
"cell_type": "markdown",
"id": "d9ce162e",
"metadata": {},
"source": [
"### Load and inspect the data\n",
"\n",
"Let's load some historical data about the weather in Oxford, from the file \"OxfordWeather.csv\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0691586e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/q4/twg1yll54y142rc02m5wwbt40000gr/T/ipykernel_17637/3980356552.py:1: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" weather = pandas.read_csv(\"https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/OxfordWeather.csv\")\n"
]
},
{
"data": {
"text/html": [
"
\n", " | YYYY | \n", "MM | \n", "DD | \n", "Tmax | \n", "Tmin | \n", "Tmean | \n", "Trange | \n", "Rainfall_mm | \n", "
---|---|---|---|---|---|---|---|---|
0 | \n", "1827 | \n", "1 | \n", "1 | \n", "8.3 | \n", "5.6 | \n", "7.0 | \n", "2.7 | \n", "0.0 | \n", "
1 | \n", "1827 | \n", "1 | \n", "2 | \n", "2.2 | \n", "0.0 | \n", "1.1 | \n", "2.2 | \n", "0.0 | \n", "
2 | \n", "1827 | \n", "1 | \n", "3 | \n", "-2.2 | \n", "-8.3 | \n", "-5.3 | \n", "6.1 | \n", "9.7 | \n", "
3 | \n", "1827 | \n", "1 | \n", "4 | \n", "-1.7 | \n", "-7.8 | \n", "-4.8 | \n", "6.1 | \n", "0.0 | \n", "
4 | \n", "1827 | \n", "1 | \n", "5 | \n", "0.0 | \n", "-10.6 | \n", "-5.3 | \n", "10.6 | \n", "0.0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
71338 | \n", "2022 | \n", "4 | \n", "26 | \n", "15.2 | \n", "4.1 | \n", "9.7 | \n", "11.1 | \n", "0 | \n", "
71339 | \n", "2022 | \n", "4 | \n", "27 | \n", "10.7 | \n", "2.6 | \n", "6.7 | \n", "8.1 | \n", "0 | \n", "
71340 | \n", "2022 | \n", "4 | \n", "28 | \n", "12.7 | \n", "3.9 | \n", "8.3 | \n", "8.8 | \n", "0 | \n", "
71341 | \n", "2022 | \n", "4 | \n", "29 | \n", "11.7 | \n", "6.7 | \n", "9.2 | \n", "5 | \n", "0 | \n", "
71342 | \n", "2022 | \n", "4 | \n", "30 | \n", "17.6 | \n", "1.0 | \n", "9.3 | \n", "16.6 | \n", "0 | \n", "
71343 rows × 8 columns
\n", "\n", " | YYYY | \n", "MM | \n", "DD | \n", "Tmax | \n", "Tmin | \n", "Tmean | \n", "Trange | \n", "Rainfall_mm | \n", "CCCC | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | \n", "1827 | \n", "1 | \n", "1 | \n", "8.3 | \n", "5.6 | \n", "7.0 | \n", "2.7 | \n", "0.0 | \n", "18thC | \n", "
1 | \n", "1827 | \n", "1 | \n", "2 | \n", "2.2 | \n", "0.0 | \n", "1.1 | \n", "2.2 | \n", "0.0 | \n", "18thC | \n", "
2 | \n", "1827 | \n", "1 | \n", "3 | \n", "-2.2 | \n", "-8.3 | \n", "-5.3 | \n", "6.1 | \n", "9.7 | \n", "18thC | \n", "
3 | \n", "1827 | \n", "1 | \n", "4 | \n", "-1.7 | \n", "-7.8 | \n", "-4.8 | \n", "6.1 | \n", "0.0 | \n", "18thC | \n", "
4 | \n", "1827 | \n", "1 | \n", "5 | \n", "0.0 | \n", "-10.6 | \n", "-5.3 | \n", "10.6 | \n", "0.0 | \n", "18thC | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
71338 | \n", "2022 | \n", "4 | \n", "26 | \n", "15.2 | \n", "4.1 | \n", "9.7 | \n", "11.1 | \n", "0 | \n", "20thC | \n", "
71339 | \n", "2022 | \n", "4 | \n", "27 | \n", "10.7 | \n", "2.6 | \n", "6.7 | \n", "8.1 | \n", "0 | \n", "20thC | \n", "
71340 | \n", "2022 | \n", "4 | \n", "28 | \n", "12.7 | \n", "3.9 | \n", "8.3 | \n", "8.8 | \n", "0 | \n", "20thC | \n", "
71341 | \n", "2022 | \n", "4 | \n", "29 | \n", "11.7 | \n", "6.7 | \n", "9.2 | \n", "5 | \n", "0 | \n", "20thC | \n", "
71342 | \n", "2022 | \n", "4 | \n", "30 | \n", "17.6 | \n", "1.0 | \n", "9.3 | \n", "16.6 | \n", "0 | \n", "20thC | \n", "
71343 rows × 9 columns
\n", "