{
"cells": [
{
"cell_type": "markdown",
"id": "8501b536",
"metadata": {},
"source": [
"# Visualizing Distributions\n",
"\n",
"- `sns.histplot()`\n",
"- `sns.kdeplot()`\n",
"\n",
"If we want to see the shape of a data distribution, the **histogram** can be a good choice. From a histogram we can easily see if a data distribution:\n",
"\n",
"* is unimodal or multimodel\n",
"* has skew, or is symmetrical\n",
"* differs between two samples\n",
"\n",
"In this section we will see how to plot a histogram using Python and what choices we can make to show the data distribution clearly and accurately\n",
"\n",
"We will also consider some of the limitations of the histogram for small datasets, and explore a related plot, the **Kernel Density Estimate (KDE)** plot, which can mitigate these limitations.\n",
"\n",
"To summarize the conceptual content of this page, when plotting a histogram we should consider:\n",
"\n",
"* the width of the bins - narrow bins give more detail but may make it harder to perceive the overall pattern\n",
" * the KDE-plot equivalent is **bandwidth** which determines the smoothness of the KDE shape\n",
"* the bin boundaries - do we want to place them at round numbers or some other meaningful point?\n",
"\n",
"When using histograms (and KDE plots) to compare distributions, we should consider:\n",
"\n",
"* matching the scale on the axes to facilitate comparison\n",
"* whether to place the two plots next to each other (horizontally), above one another (vertically) or overlaid (on the same axis), to facilitate comparison\n"
]
},
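{
"cell_type": "markdown",
"id": "c4e19a02",
"metadata": {},
"source": [
"The choices summarized above can be tried out in a short sketch. This is an illustration only, not the code used later on this page. It assumes seaborn 0.11 or newer and uses the first 20 heights from the BodyData table, copied in by hand. The values given to `binwidth`, `bins` and `bw_adjust` are arbitrary assumptions, picked to exaggerate the contrast.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# First 20 heights (cm), copied by hand from the BodyData table on this page\n",
"heights = np.array([161, 165, 175, 180, 179, 172, 148, 160, 188, 172,\n",
"                    175, 180, 162, 170, 172, 159, 169, 168, 195, 157])\n",
"\n",
"fig, axes = plt.subplots(2, 2, figsize=(9, 6), sharex=True)\n",
"\n",
"# Narrow bins: more detail, but a noisier outline\n",
"sns.histplot(x=heights, binwidth=2, ax=axes[0, 0])\n",
"axes[0, 0].set_title('histogram, binwidth=2')\n",
"\n",
"# Wide bins with boundaries at round numbers (140, 150, ..., 200)\n",
"round_edges = np.arange(140, 201, 10)\n",
"sns.histplot(x=heights, bins=round_edges, ax=axes[0, 1])\n",
"axes[0, 1].set_title('histogram, edges at multiples of 10')\n",
"\n",
"# KDE: bw_adjust rescales seaborn's default bandwidth\n",
"sns.kdeplot(x=heights, bw_adjust=0.3, ax=axes[1, 0])\n",
"axes[1, 0].set_title('KDE, bw_adjust=0.3')\n",
"sns.kdeplot(x=heights, bw_adjust=2, ax=axes[1, 1])\n",
"axes[1, 1].set_title('KDE, bw_adjust=2')\n",
"\n",
"fig.tight_layout()\n",
"```\n",
"\n",
"In a notebook the figure appears below the cell once it has run. Comparing the two columns shows the same trade-off in both plot types: the left-hand panels follow individual data points closely, while the right-hand panels give a smoother overall shape."
]
},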
{
"cell_type": "markdown",
"id": "a77a81dc",
"metadata": {},
"source": [
"\n",
"\n"
]
},
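{
"cell_type": "markdown",
"id": "d7f3b6a1",
"metadata": {},
"source": [
"For comparing two distributions, the layout options (overlaid or in separate panels) and the matched axis scales can be sketched as follows. Again this is a hedged illustration rather than the page's own code: the data are the first 20 rows of the BodyData table typed in by hand, and the bin width of 5 cm and bin range of 145 to 200 cm are assumptions made for the example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# First 20 rows (sex, height in cm), copied by hand from the BodyData table\n",
"body = pd.DataFrame({\n",
"    'sex': list('MFFMFMFFFFMMMMFFMFMF'),\n",
"    'height': [161, 165, 175, 180, 179, 172, 148, 160, 188, 172,\n",
"               175, 180, 162, 170, 172, 159, 169, 168, 195, 157],\n",
"})\n",
"\n",
"# One bin specification reused in every panel so the bars are comparable\n",
"bin_opts = dict(binwidth=5, binrange=(145, 200))\n",
"\n",
"# sharex/sharey give all three panels identical axis scales\n",
"fig, (ax_over, ax_f, ax_m) = plt.subplots(\n",
"    1, 3, figsize=(12, 3.5), sharex=True, sharey=True)\n",
"\n",
"# Overlaid: both groups on the same axes, coloured by sex\n",
"sns.histplot(data=body, x='height', hue='sex', ax=ax_over, **bin_opts)\n",
"ax_over.set_title('overlaid')\n",
"\n",
"# Separate panels placed next to each other\n",
"sns.histplot(data=body[body['sex'] == 'F'], x='height', ax=ax_f, **bin_opts)\n",
"ax_f.set_title('F only')\n",
"sns.histplot(data=body[body['sex'] == 'M'], x='height', ax=ax_m, **bin_opts)\n",
"ax_m.set_title('M only')\n",
"\n",
"fig.tight_layout()\n",
"```\n",
"\n",
"Changing `plt.subplots(1, 3, ...)` to `plt.subplots(3, 1, ...)` stacks the panels vertically instead, which lines the height axes up directly above one another."
]
},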
{
"cell_type": "markdown",
"id": "06a3540a",
"metadata": {},
"source": [
"## Example\n",
"\n",
"We will look at a small sample of height data (these are made-up data designed for the exercise).\n",
"\n",
"\n",
"\n",
"### Set up Python libraries\n",
"\n",
"As usual, run the code cell below to import the relevant Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7f1d34e0",
"metadata": {},
"outputs": [],
"source": [
"# Set-up Python libraries - you need to run this but you don't need to change it\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"sns.set_theme(style='white')\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "markdown",
"id": "fb218a2a",
"metadata": {},
"source": [
"### Load and inspect the data"
]
},
{
"cell_type": "markdown",
"id": "3bbd70d4",
"metadata": {},
"source": [
"Load the file BodyData.csv which contains body measurements for 50 (fictional) people. The code block below will load the data automatically from the internet."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5b37c633",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | ID | \n", "sex | \n", "height | \n", "weight | \n", "age | \n", "
---|---|---|---|---|---|
0 | \n", "101708 | \n", "M | \n", "161 | \n", "64.8 | \n", "35 | \n", "
1 | \n", "101946 | \n", "F | \n", "165 | \n", "68.1 | \n", "42 | \n", "
2 | \n", "108449 | \n", "F | \n", "175 | \n", "76.6 | \n", "31 | \n", "
3 | \n", "108796 | \n", "M | \n", "180 | \n", "81.0 | \n", "31 | \n", "
4 | \n", "113449 | \n", "F | \n", "179 | \n", "80.1 | \n", "31 | \n", "
5 | \n", "114688 | \n", "M | \n", "172 | \n", "74.0 | \n", "42 | \n", "
6 | \n", "119187 | \n", "F | \n", "148 | \n", "54.8 | \n", "45 | \n", "
7 | \n", "120679 | \n", "F | \n", "160 | \n", "64.0 | \n", "44 | \n", "
8 | \n", "120735 | \n", "F | \n", "188 | \n", "88.4 | \n", "32 | \n", "
9 | \n", "124269 | \n", "F | \n", "172 | \n", "74.0 | \n", "29 | \n", "
10 | \n", "124713 | \n", "M | \n", "175 | \n", "76.6 | \n", "26 | \n", "
11 | \n", "127076 | \n", "M | \n", "180 | \n", "81.0 | \n", "28 | \n", "
12 | \n", "131626 | \n", "M | \n", "162 | \n", "65.6 | \n", "35 | \n", "
13 | \n", "132218 | \n", "M | \n", "170 | \n", "72.3 | \n", "29 | \n", "
14 | \n", "132609 | \n", "F | \n", "172 | \n", "74.0 | \n", "41 | \n", "
15 | \n", "134660 | \n", "F | \n", "159 | \n", "63.2 | \n", "34 | \n", "
16 | \n", "135195 | \n", "M | \n", "169 | \n", "71.4 | \n", "42 | \n", "
17 | \n", "140073 | \n", "F | \n", "168 | \n", "70.6 | \n", "34 | \n", "
18 | \n", "140114 | \n", "M | \n", "195 | \n", "95.1 | \n", "41 | \n", "
19 | \n", "145185 | \n", "F | \n", "157 | \n", "61.6 | \n", "45 | \n", "
20 | \n", "146279 | \n", "F | \n", "180 | \n", "81.0 | \n", "30 | \n", "
21 | \n", "146519 | \n", "F | \n", "172 | \n", "74.0 | \n", "34 | \n", "
22 | \n", "151451 | \n", "F | \n", "171 | \n", "73.1 | \n", "37 | \n", "
23 | \n", "152597 | \n", "M | \n", "172 | \n", "74.0 | \n", "27 | \n", "
24 | \n", "154672 | \n", "M | \n", "167 | \n", "69.7 | \n", "39 | \n", "
25 | \n", "155594 | \n", "F | \n", "165 | \n", "68.1 | \n", "25 | \n", "
26 | \n", "158165 | \n", "M | \n", "175 | \n", "76.6 | \n", "45 | \n", "
27 | \n", "159457 | \n", "F | \n", "176 | \n", "77.4 | \n", "36 | \n", "
28 | \n", "162323 | \n", "M | \n", "173 | \n", "74.8 | \n", "31 | \n", "
29 | \n", "166948 | \n", "M | \n", "174 | \n", "75.7 | \n", "28 | \n", "
30 | \n", "168411 | \n", "M | \n", "175 | \n", "76.6 | \n", "29 | \n", "
31 | \n", "168574 | \n", "F | \n", "163 | \n", "66.4 | \n", "30 | \n", "
32 | \n", "169209 | \n", "F | \n", "159 | \n", "63.2 | \n", "45 | \n", "
33 | \n", "171236 | \n", "F | \n", "164 | \n", "67.2 | \n", "34 | \n", "
34 | \n", "172289 | \n", "M | \n", "181 | \n", "81.9 | \n", "27 | \n", "
35 | \n", "173925 | \n", "M | \n", "189 | \n", "89.3 | \n", "25 | \n", "
36 | \n", "176598 | \n", "F | \n", "169 | \n", "71.4 | \n", "37 | \n", "
37 | \n", "177002 | \n", "F | \n", "180 | \n", "81.0 | \n", "36 | \n", "
38 | \n", "178659 | \n", "M | \n", "181 | \n", "81.9 | \n", "26 | \n", "
39 | \n", "180992 | \n", "F | \n", "177 | \n", "78.3 | \n", "31 | \n", "
40 | \n", "183304 | \n", "F | \n", "176 | \n", "77.4 | \n", "30 | \n", "
41 | \n", "184706 | \n", "M | \n", "183 | \n", "83.7 | \n", "40 | \n", "
42 | \n", "185138 | \n", "M | \n", "169 | \n", "71.4 | \n", "28 | \n", "
43 | \n", "185223 | \n", "F | \n", "170 | \n", "72.3 | \n", "41 | \n", "
44 | \n", "186041 | \n", "M | \n", "175 | \n", "76.6 | \n", "25 | \n", "
45 | \n", "186887 | \n", "M | \n", "154 | \n", "59.3 | \n", "26 | \n", "
46 | \n", "187016 | \n", "M | \n", "161 | \n", "64.8 | \n", "32 | \n", "
47 | \n", "198157 | \n", "M | \n", "180 | \n", "81.0 | \n", "33 | \n", "
48 | \n", "199112 | \n", "M | \n", "172 | \n", "74.0 | \n", "33 | \n", "
49 | \n", "199614 | \n", "F | \n", "164 | \n", "67.2 | \n", "31 | \n", "