{ "cells": [ { "cell_type": "markdown", "id": "c8868457", "metadata": {}, "source": [ "# Permutation test for correlation\n", "\n", "In the previous examples we used permutation testing to assess the significance of a difference between groups (difference of means or mean difference).\n", "\n", "Permutation testing can also be used to assess the statistical significance of a correlation.\n", "\n", "As a reminder, a correlation can only be measured in a paired design: saying that two variables are correlated means that an individual's score on one variable is related to their score on the other variable. \n", "\n", "Correlations can be interesting in themselves (do students who score highly on English tests also score highly on maths tests? Do people who eat more broccoli have greater bone density?). \n", "\n", "They can also reflect the fact that experimental measures often depend on factors other than the one we are manipulating (sometimes called confounding factors), which are what we try to control for by using a paired design. For example, if we are interested in whether men earn more than women, we might use a paired design comparing brothers and sisters to take into account the very important effects of parental occupation and education on earnings, which mean that high-earning brothers often have high-earning sisters. 
The fact that brothers' and sisters' earnings are correlated actually reflects the confounds that we want to 'cancel out' by using a paired design to test gender differences.\n", "\n", "\n", "### Set up Python libraries\n", "\n", "As usual, run the code cell below to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "5fb0416d", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf\n", "import warnings\n", "warnings.simplefilter('ignore', category=FutureWarning)" ] }, { "cell_type": "markdown", "id": "f0806e8c", "metadata": {}, "source": [ "## Toy example\n", "\n", "[A toy example uses a very small dataset, just to show how the method works]\n", "\n", "We are interested in whether people who eat more broccoli have higher IQs.\n", "\n", "#### Question & design\n", "\n", "We hypothesise that those who eat more broccoli have higher IQs.\n", "\n", "This is technically a *repeated measures design*, as we have two measurements (broccoli consumption and IQ) for each individual. 
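
Because each individual contributes a pair of measurements, a permutation test for this design works by shuffling the pairing: we keep every broccoli value and every IQ value, but re-pair them at random, which is what we would expect if there were no relationship between the two. The sketch below is a minimal illustration under stated assumptions, not this notebook's own solution: the helper name `perm_test_corr`, the 10,000 permutations and the seed are our illustrative choices, and the data are the 25 made-up values listed in the Data section of this notebook. For a one-tailed test of a positive correlation, the p-value is the proportion of shuffled correlations at least as large as the observed one.

```python
import numpy as np

def perm_test_corr(x, y, n_perm=10000, seed=None):
    '''Hypothetical helper: one-tailed permutation test for a positive Pearson correlation.'''
    rng = np.random.default_rng(seed)  # reproducible shuffles when a seed is given
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Observed Pearson correlation between the correctly paired measurements
    r_obs = np.corrcoef(x, y)[0, 1]

    # Null distribution: shuffle y so each x value is paired with a random y value
    r_null = np.empty(n_perm)
    for i in range(n_perm):
        r_null[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

    # One-tailed p-value (alternative r > 0): share of shuffled r at least as large as observed
    p_val = np.mean(r_null >= r_obs)
    return r_obs, p_val

# The 25 made-up data points from the Data section (weekly broccoli in grams, IQ)
broccoli_g = [0, 28, 0, 20, 0, 92, 88, 128, 0, 22, 114, 0, 146,
              255, 131, 255, 390, 402, 216, 719, 395, 485, 553, 682, 815]
iq = [87, 91, 101, 92, 96, 95, 92, 94, 96, 99, 99, 96, 99,
      108, 100, 107, 114, 107, 108, 104, 107, 114, 116, 116, 111]

r_obs, p_val = perm_test_corr(broccoli_g, iq, n_perm=10000, seed=42)
print(f'Observed r = {r_obs:.3f}, one-tailed p = {p_val:.4f}')
```

Some texts count the observed pairing as one of the permutations and report (count + 1) / (n_perm + 1), which avoids a p-value of exactly zero; this sketch uses the plain proportion. SciPy's `scipy.stats.permutation_test`, with `permutation_type='pairings'` and `alternative='greater'`, offers a library route to the same kind of test.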
\n", "\n", "#### Hypotheses\n", "\n", "We can state our hypotheses as follows:\n", "\n", "$\\mathcal{H_o}:$ There is no relationship between broccoli consumption and IQ\n", "* the correlation, Pearson's $r=0$\n", "\n", "$\\mathcal{H_a}:$ Those with higher broccoli consumption have higher IQ\n", "* the correlation, Pearson's $r>0$\n", "\n", "This is a one-tailed (directional) alternative hypothesis.\n", "\n", "#### Data\n", "\n", "The following made-up data give weekly broccoli consumption in grams and IQ for 25 individuals:" ] }, { "cell_type": "code", "execution_count": 2, "id": "537072c8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "|    | broccoli_g | IQ |\n",
"|---:|---:|---:|\n",
"| 0 | 0 | 87 |\n",
"| 1 | 28 | 91 |\n",
"| 2 | 0 | 101 |\n",
"| 3 | 20 | 92 |\n",
"| 4 | 0 | 96 |\n",
"| 5 | 92 | 95 |\n",
"| 6 | 88 | 92 |\n",
"| 7 | 128 | 94 |\n",
"| 8 | 0 | 96 |\n",
"| 9 | 22 | 99 |\n",
"| 10 | 114 | 99 |\n",
"| 11 | 0 | 96 |\n",
"| 12 | 146 | 99 |\n",
"| 13 | 255 | 108 |\n",
"| 14 | 131 | 100 |\n",
"| 15 | 255 | 107 |\n",
"| 16 | 390 | 114 |\n",
"| 17 | 402 | 107 |\n",
"| 18 | 216 | 108 |\n",
"| 19 | 719 | 104 |\n",
"| 20 | 395 | 107 |\n",
"| 21 | 485 | 114 |\n",
"| 22 | 553 | 116 |\n",
"| 23 | 682 | 116 |\n",
"| 24 | 815 | 111 |\n", "