{
"cells": [
{
"cell_type": "markdown",
"id": "1a7c05e7",
"metadata": {},
"source": [
"# 2D Data: Correlation and Pairwise Effects\n",
"\n",
"In some datasets, the key point of interest is the relationship between two variables. Important examples from experimental design include:\n",
"\n",
"* Paired designs, where pairs of participants are compared to balance out external variables. For example:\n",
"    * Patients and control participants may be matched on age and sex\n",
"* Repeated measures designs, where the same participant completes all conditions in the experiment. For example:\n",
"    * A patient's blood pressure before and after taking a drug\n",
"    * Reaction time on the same task with and without distraction\n",
" \n",
"\n",
"If we want to see the relationship between paired measurements, we need a type of plot that shows that relationship. Good examples are:\n",
"\n",
"* scatterplot `sns.scatterplot()`\n",
"* scatterplot with regression line `sns.regplot()`\n",
"* 2D histogram `sns.histplot()`\n",
"* 2D KDE plot `sns.kdeplot()`"
]
},
{
"cell_type": "markdown",
"id": "1c691267",
"metadata": {},
"source": [
"## Example: brother/sister heights\n",
"\n",
"\n",
"\n",
"\n",
"A researcher hypothesises that men are taller than women.\n",
"\n",
"He also notices that there is a considerable genetic influence on height, with some families being taller than others.\n",
"\n",
"He decides to control for this by comparing the heights of brothers and sisters, who share both genetic influences and upbringing. This is a paired design.\n",
"\n",
"I have provided some made-up data.\n",
"\n",
"### Set up Python libraries\n",
"\n",
"As usual, run the code cell below to import the relevant Python libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2d8ecdb6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Set-up Python libraries - you need to run this but you don't need to change it\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import scipy.stats as stats\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"sns.set_theme(style='white')\n",
"import statsmodels.api as sm\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "markdown",
"id": "cfea7d05",
"metadata": {},
"source": [
"### Load and inspect the data\n",
"\n",
"Load the file `BrotherSisterData.csv`, which contains heights in cm for 25 fictional brother-sister pairs."
]
},
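The loading and inspection step can be sketched without access to the file. The example below assumes a CSV with one row per pair and the columns `brother` and `sister`; the five rows are copied from the first rows of the table displayed in this notebook, and the names `csv_text`, `heights` and `diff` are illustrative. With the real file you would pass its path or URL to `pd.read_csv()` instead of the in-memory text.

```python
import io
import pandas as pd

# First five brother-sister pairs as shown in the notebook's output table;
# the real file contains 25 pairs (this in-memory CSV is an illustrative stand-in)
csv_text = """brother,sister
174,172
183,180
154,148
172,180
172,165
"""

heights = pd.read_csv(io.StringIO(csv_text))

# Inspect the structure: one row per pair, one column per sibling
print(heights.head())
print(heights.dtypes)

# Because each row is a pair, the within-pair difference is meaningful
diff = heights["brother"] - heights["sister"]
print(diff)
```

Keeping each pair on its own row is what makes the design paired: the difference is computed within a family rather than between unrelated people.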
{
"cell_type": "code",
"execution_count": 2,
"id": "abe0d2bc",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>brother</th><th>sister</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>174</td><td>172</td></tr>\n",
"<tr><th>1</th><td>183</td><td>180</td></tr>\n",
"<tr><th>2</th><td>154</td><td>148</td></tr>\n",
"<tr><th>3</th><td>172</td><td>180</td></tr>\n",
"<tr><th>4</th><td>172</td><td>165</td></tr>\n",
"<tr><th>5</th><td>161</td><td>159</td></tr>\n",
"<tr><th>6</th><td>167</td><td>159</td></tr>\n",
"<tr><th>7</th><td>172</td><td>164</td></tr>\n",
"<tr><th>8</th><td>195</td><td>188</td></tr>\n",
"<tr><th>9</th><td>189</td><td>175</td></tr>\n",
"<tr><th>10</th><td>161</td><td>160</td></tr>\n",
"<tr><th>11</th><td>181</td><td>177</td></tr>\n",
"<tr><th>12</th><td>175</td><td>168</td></tr>\n",
"<tr><th>13</th><td>170</td><td>169</td></tr>\n",
"<tr><th>14</th><td>175</td><td>165</td></tr>\n",
"<tr><th>15</th><td>169</td><td>164</td></tr>\n",
"<tr><th>16</th><td>169</td><td>163</td></tr>\n",
"<tr><th>17</th><td>180</td><td>176</td></tr>\n",
"<tr><th>18</th><td>180</td><td>176</td></tr>\n",
"<tr><th>19</th><td>180</td><td>172</td></tr>\n",
"<tr><th>20</th><td>175</td><td>170</td></tr>\n",
"<tr><th>21</th><td>162</td><td>157</td></tr>\n",
"<tr><th>22</th><td>175</td><td>172</td></tr>\n",
"<tr><th>23</th><td>181</td><td>179</td></tr>\n",
"<tr><th>24</th><td>173</td><td>171</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Gender</th><th>Height</th><th>Weight</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Male</td><td>73.847017</td><td>241.893563</td></tr>\n",
"<tr><th>1</th><td>Male</td><td>68.781904</td><td>162.310473</td></tr>\n",
"<tr><th>2</th><td>Male</td><td>74.110105</td><td>212.740856</td></tr>\n",
"<tr><th>3</th><td>Male</td><td>71.730978</td><td>220.042470</td></tr>\n",
"<tr><th>4</th><td>Male</td><td>69.881796</td><td>206.349801</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>9995</th><td>Female</td><td>66.172652</td><td>136.777454</td></tr>\n",
"<tr><th>9996</th><td>Female</td><td>67.067155</td><td>170.867906</td></tr>\n",
"<tr><th>9997</th><td>Female</td><td>63.867992</td><td>128.475319</td></tr>\n",
"<tr><th>9998</th><td>Female</td><td>69.034243</td><td>163.852461</td></tr>\n",
"<tr><th>9999</th><td>Female</td><td>61.944246</td><td>113.649103</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>10000 rows × 3 columns</p>\n",
"