{ "cells": [ { "cell_type": "markdown", "id": "56ce9f7f-a824-4f54-966a-64d6fcc2be3d", "metadata": {}, "source": [ "# Effect size (Cohen's d)\n", "\n", "The first ingredient in a power analysis is **effect size**.\n", "\n", "Power analysis determines the sample size needed to detect an effect of a certain size.\n", "\n", "What is **effect size**? It is a measure of whether the effect (difference of means, correlation) of interest is big or small, *relative to the random noise or variability in the data*.\n", "\n", "In this notebook we look at the effect size for the $t$-test and for Pearson's correlation. We will see that:\n", "\n", "* The effect size for the $t$-test is Cohen's $d$, where\n", "\n", "$$ d = \\frac{\\bar{x_1}-\\bar{x_2}}{s} $$\n", "\n", "* The effect size for Pearson's correlation is simply the correlation coefficient, $r$\n", "\n", "### Set up Python libraries\n", "\n", "As usual, run the code cell below to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": 3, "id": "c4b005e8-fa09-46ab-a67f-5be8eef52546", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "id": "2e61e927-cbf1-4dae-bf36-7151494bae71", "metadata": {}, "source": [ "## Effect size for the $t$-test\n", "\n", "**Example:**\n", "\n", "A researcher hypothesises that geography students are taller than psychology students.\n", "\n", "$\\mathcal{H_o}:$ The mean heights of psychology ($\\mu_p$) and geography ($\\mu_g$) students are the same; $\\mu_p = \\mu_g$\n", "\n", "$\\mathcal{H_a}:$ The mean height of geography students is greater than the mean height of psychology students; $\\mu_g > 
\\mu_p$\n", "\n", "\n", "He measures the heights of 10 geography students and 12 psychology students, which are given in the dataframe below:" ] }, { "cell_type": "code", "execution_count": 4, "id": "11bf9185-8f0e-45e4-943f-f99e97ac6ef8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "|  | studentID | subject | height |\n", "
|---|---|---|---|\n", "
| 0 | 186640 | psychology | 154.0 |\n", "
| 1 | 588140 | psychology | 156.3 |\n", "
| 2 | 977390 | psychology | 165.6 |\n", "
| 3 | 948470 | psychology | 162.0 |\n", "
| 4 | 564360 | psychology | 162.0 |\n", "
| 5 | 604180 | psychology | 159.0 |\n", "
| 6 | 770760 | psychology | 166.1 |\n", "
| 7 | 559170 | psychology | 165.9 |\n", "
| 8 | 213240 | psychology | 163.7 |\n", "
| 9 | 660220 | psychology | 165.6 |\n", "
| 10 | 311550 | psychology | 163.1 |\n", "
| 11 | 249170 | psychology | 176.6 |\n", "
| 12 | 139690 | geography | 171.6 |\n", "
| 13 | 636160 | geography | 171.5 |\n", "
| 14 | 649650 | geography | 154.6 |\n", "
| 15 | 595280 | geography | 162.6 |\n", "
| 16 | 772880 | geography | 164.4 |\n", "
| 17 | 174880 | geography | 168.6 |\n", "
| 18 | 767580 | geography | 175.3 |\n", "
| 19 | 688870 | geography | 168.4 |\n", "
| 20 | 723650 | geography | 183.5 |\n", "
| 21 | 445960 | geography | 164.1 |\n", "