{ "cells": [ { "cell_type": "markdown", "id": "69252ce6", "metadata": {}, "source": [ "# Python: Descriptives and Indexing\n", "\n", "In this notebook we cover some key Pandas syntax and functions:\n", "\n", "* Syntax for getting various descriptive statistics from a `Pandas` dataframe\n", "* Syntax for **indexing** a dataframe - finding the rows and columns that you need\n", " * this allows you to get descriptives for specific rows and columns\n", "\n", "Here we meet **indexing** in the context of descriptive statistics, but indexing is something you will do every single time you write code for data analysis. Incorrect indexing syntax is the single biggest source of bugs for students on this course, so it is well worth spending the time to get to grips with it.\n", "\n", "You should work through all the exercises in this notebook in advance of the tutorial.\n", "\n", "## Set up Python Libraries\n", "\n", "As usual, you will need to run this code block to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "7420037e", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "id": "14d3d992", "metadata": {}, "source": [ "## Import a dataset to work with\n", "\n", "We will work with weather data from the Oxford weather station. This code block will read it automatically from the internet." ] }, { "cell_type": "code", "execution_count": 2, "id": "53648053", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| | YYYY | Month | MM | DD | DD365 | Tmax | Tmin | Tmean | Trange | Rainfall_mm |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1827 | Jan | 1 | 1 | 1 | 8.3 | 5.6 | 7.0 | 2.7 | 0.0 |
| 1 | 1827 | Jan | 1 | 2 | 2 | 2.2 | 0.0 | 1.1 | 2.2 | 0.0 |
| 2 | 1827 | Jan | 1 | 3 | 3 | -2.2 | -8.3 | -5.3 | 6.1 | 9.7 |
| 3 | 1827 | Jan | 1 | 4 | 4 | -1.7 | -7.8 | -4.8 | 6.1 | 0.0 |
| 4 | 1827 | Jan | 1 | 5 | 5 | 0.0 | -10.6 | -5.3 | 10.6 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71338 | 2022 | Apr | 4 | 26 | 116 | 15.2 | 4.1 | 9.7 | 11.1 | 0.0 |
| 71339 | 2022 | Apr | 4 | 27 | 117 | 10.7 | 2.6 | 6.7 | 8.1 | 0.0 |
| 71340 | 2022 | Apr | 4 | 28 | 118 | 12.7 | 3.9 | 8.3 | 8.8 | 0.0 |
| 71341 | 2022 | Apr | 4 | 29 | 119 | 11.7 | 6.7 | 9.2 | 5.0 | 0.0 |
| 71342 | 2022 | Apr | 4 | 30 | 120 | 17.6 | 1.0 | 9.3 | 16.6 | 0.0 |

71343 rows × 10 columns
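
As a minimal sketch of getting descriptive statistics from a dataframe, the block below rebuilds the first five rows shown in the table above by hand and summarises them. The name `weather_head` is hypothetical and only a subset of the columns is included; with the full dataset loaded, the same calls should apply to the complete dataframe.

```python
import pandas as pd

# First five rows copied by hand from the table above.
# `weather_head` is a hypothetical name for this small stand-in dataframe.
weather_head = pd.DataFrame({
    "YYYY": [1827, 1827, 1827, 1827, 1827],
    "Month": ["Jan", "Jan", "Jan", "Jan", "Jan"],
    "DD": [1, 2, 3, 4, 5],
    "Tmax": [8.3, 2.2, -2.2, -1.7, 0.0],
    "Tmin": [5.6, 0.0, -8.3, -7.8, -10.6],
    "Rainfall_mm": [0.0, 0.0, 9.7, 0.0, 0.0],
})

# A single statistic for a single column
mean_tmax = weather_head["Tmax"].mean()
min_tmin = weather_head["Tmin"].min()

# Several statistics (count, mean, std, min, quartiles, max) for chosen columns
summary = weather_head[["Tmax", "Tmin"]].describe()

print(round(mean_tmax, 2), min_tmin)
print(summary)
```

Note the two kinds of square brackets: `weather_head["Tmax"]` selects one column, whereas `weather_head[["Tmax", "Tmin"]]` takes a list of column names and returns a smaller dataframe.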

| | YYYY | Month | MM | DD | DD365 | Tmax | Tmin | Tmean | Trange | Rainfall_mm |
|---|---|---|---|---|---|---|---|---|---|---|
| 34333 | 1921 | Dec | 12 | 31 | 1 | 9.1 | 3.3 | 6.2 | 5.8 | 0.0 |
| 34334 | 1921 | Jan | 1 | 1 | 2 | 12.2 | 7.1 | 9.7 | 5.1 | 9.7 |
| 34335 | 1921 | Jan | 1 | 2 | 3 | 11.9 | 8.7 | 10.3 | 3.2 | 6.1 |
| 34336 | 1921 | Jan | 1 | 3 | 4 | 9.9 | 5.9 | 7.9 | 4.0 | 0.0 |
| 34337 | 1921 | Jan | 1 | 4 | 5 | 12.3 | 9.8 | 11.1 | 2.5 | 5.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34693 | 1921 | Dec | 12 | 26 | 361 | 10.8 | -3.1 | 3.9 | 13.9 | 5.6 |
| 34694 | 1921 | Dec | 12 | 27 | 362 | 11.2 | 5.4 | 8.3 | 5.8 | 1.3 |
| 34695 | 1921 | Dec | 12 | 28 | 363 | 13.9 | 3.9 | 8.9 | 10.0 | 0.0 |
| 34696 | 1921 | Dec | 12 | 29 | 364 | 7.6 | 1.7 | 4.7 | 5.9 | 0.8 |
| 34697 | 1921 | Dec | 12 | 30 | 365 | 11.7 | 1.9 | 6.8 | 9.8 | 4.5 |

365 rows × 10 columns

| | YYYY | Month | MM | DD | DD365 | Tmax | Tmin | Tmean | Trange | Rainfall_mm |
|---|---|---|---|---|---|---|---|---|---|---|
| 34333 | 1921 | Dec | 12 | 31 | 1 | 9.1 | 3.3 | 6.2 | 5.8 | 0.0 |
| 34334 | 1921 | Jan | 1 | 1 | 2 | 12.2 | 7.1 | 9.7 | 5.1 | 9.7 |
| 34335 | 1921 | Jan | 1 | 2 | 3 | 11.9 | 8.7 | 10.3 | 3.2 | 6.1 |
| 34336 | 1921 | Jan | 1 | 3 | 4 | 9.9 | 5.9 | 7.9 | 4.0 | 0.0 |
| 34337 | 1921 | Jan | 1 | 4 | 5 | 12.3 | 9.8 | 11.1 | 2.5 | 5.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34693 | 1921 | Dec | 12 | 26 | 361 | 10.8 | -3.1 | 3.9 | 13.9 | 5.6 |
| 34694 | 1921 | Dec | 12 | 27 | 362 | 11.2 | 5.4 | 8.3 | 5.8 | 1.3 |
| 34695 | 1921 | Dec | 12 | 28 | 363 | 13.9 | 3.9 | 8.9 | 10.0 | 0.0 |
| 34696 | 1921 | Dec | 12 | 29 | 364 | 7.6 | 1.7 | 4.7 | 5.9 | 0.8 |
| 34697 | 1921 | Dec | 12 | 30 | 365 | 11.7 | 1.9 | 6.8 | 9.8 | 4.5 |

365 rows × 10 columns
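
The table above contains only rows where `YYYY` is 1921, which is what indexing by a condition on the rows produces. The code that generated it is not shown here, so the following is only a sketch of two common ways to write such a selection, `query()` and a boolean mask, using a hypothetical five-row dataframe (`weather_small`) with values copied from the tables in this notebook.

```python
import pandas as pd

# A few rows copied by hand from the tables in this notebook.
# `weather_small` is a hypothetical stand-in for the full dataset.
weather_small = pd.DataFrame({
    "YYYY": [1827, 1827, 1921, 1921, 2022],
    "Month": ["Jan", "Jan", "Jan", "Jan", "Apr"],
    "DD": [1, 2, 1, 2, 30],
    "Tmax": [8.3, 2.2, 12.2, 11.9, 17.6],
})

# Option 1: query() takes the condition as a string
by_query = weather_small.query("YYYY == 1921")

# Option 2: a boolean mask (True/False for each row) inside square brackets
mask = weather_small["YYYY"] == 1921
by_mask = weather_small[mask]

# Select rows and a column in one step with .loc, then get a descriptive
mean_tmax_1921 = weather_small.loc[mask, "Tmax"].mean()

print(by_query)
print(mean_tmax_1921)
```

Both options return the same rows and keep the original index labels (here 2 and 3), which is why the index in the table above starts at 34333 rather than 0.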

| | YYYY | MM | DD | DD365 | Tmax | Tmin | Tmean | Trange | Rainfall_mm |
|---|---|---|---|---|---|---|---|---|---|
| YYYY | 1.000000 | -0.003411 | -0.000059 | -0.003372 | 0.071631 | 0.089683 | 0.083044 | -0.001257 | 0.008117 |
| MM | -0.003411 | 1.000000 | 0.010567 | 0.995580 | 0.179681 | 0.235401 | 0.213082 | -0.018820 | 0.043672 |
| DD | -0.000059 | 0.010567 | 1.000000 | 0.092771 | 0.001217 | 0.002876 | 0.002035 | -0.002055 | 0.005315 |
| DD365 | -0.003372 | 0.995580 | 0.092771 | 1.000000 | 0.177016 | 0.233248 | 0.210545 | -0.020536 | 0.043925 |
| Tmax | 0.071631 | 0.179681 | 0.001217 | 0.177016 | 1.000000 | 0.841480 | 0.967881 | 0.593339 | -0.008807 |
| Tmin | 0.089683 | 0.235401 | 0.002876 | 0.233248 | 0.841480 | 1.000000 | 0.950248 | 0.064379 | 0.086181 |
| Tmean | 0.083044 | 0.213082 | 0.002035 | 0.210545 | 0.967881 | 0.950248 | 1.000000 | 0.371965 | 0.035037 |
| Trange | -0.001257 | -0.018820 | -0.002055 | -0.020536 | 0.593339 | 0.064379 | 0.371965 | 1.000000 | -0.144654 |
| Rainfall_mm | 0.008117 | 0.043672 | 0.005315 | 0.043925 | -0.008807 | 0.086181 | 0.035037 | -0.144654 | 1.000000 |
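
The table above looks like a correlation matrix over the numeric columns: the text column `Month` is absent, and every diagonal entry is 1. The exact call that produced it is not shown here, so the block below is only a sketch of one way to build such a matrix, assuming the numeric columns are picked out with `select_dtypes` before calling `corr()`. The dataframe `weather_head` is a hypothetical stand-in built from the first five rows of the dataset.

```python
import pandas as pd

# First five rows copied by hand from the dataset shown earlier.
# `weather_head` is a hypothetical name for this small stand-in dataframe.
weather_head = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan", "Jan"],
    "Tmax": [8.3, 2.2, -2.2, -1.7, 0.0],
    "Tmin": [5.6, 0.0, -8.3, -7.8, -10.6],
    "Trange": [2.7, 2.2, 6.1, 6.1, 10.6],
    "Rainfall_mm": [0.0, 0.0, 9.7, 0.0, 0.0],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations
numeric_cols = weather_head.select_dtypes(include="number")
corr_matrix = numeric_cols.corr()

# Index the matrix by row and column label to pull out a single correlation
r_tmax_tmin = corr_matrix.loc["Tmax", "Tmin"]

print(corr_matrix)
print(r_tmax_tmin)
```

With only five rows the values here will not match the table above; the point is the shape of the result, a square dataframe that can itself be indexed by label with `.loc`.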