{ "cells": [ { "cell_type": "markdown", "id": "42b9ccaa-fc57-446a-8168-5253839647e8", "metadata": {}, "source": [ "# Spearman's Rank Correlation\n", "\n", "In Chapter 1: Describing Data, we looked at Spearman's rank correlation coefficient, which is a robust correlation computed on ranks.\n", "\n", "**If you are unsure about correlation coefficients, please revisit the page on correlation in Chapter 1: Describing Data.**\n", "\n", "In this section on rank-based tests, we revisit Spearman's $r$ and see how to get a $p$-value for it using `scipy.stats`.\n", "\n", "The reasons for using Spearman's rank correlation rather than Pearson's correlation are recapped on that Chapter 1 correlation page." ] }, { "cell_type": "code", "execution_count": 2, "id": "be001071-90c6-462a-ba35-2e731a4096a3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "id": "c6d7c73e-abbe-4d5a-9ce3-01bd155a993c", "metadata": {}, "source": [ "## Load the data\n", "\n", "Let's use the CO2 data discussed in the section on correlation in Chapter 1: Describing Data. The dataset contains GDP (wealth) and carbon emissions per person, plus population, for 164 countries." ] }, { "cell_type": "code", "execution_count": 3, "id": "dc0ce091-df7a-4796-bbc9-9b6a752189d8", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", " | Country | \n", "CO2 | \n", "GDP | \n", "population | \n", "
---|---|---|---|---|
0 | \n", "Afghanistan | \n", "0.2245 | \n", "1934.555054 | \n", "36686788 | \n", "
1 | \n", "Albania | \n", "1.6422 | \n", "11104.166020 | \n", "2877019 | \n", "
2 | \n", "Algeria | \n", "3.8241 | \n", "14228.025390 | \n", "41927008 | \n", "
3 | \n", "Angola | \n", "0.7912 | \n", "7771.441895 | \n", "31273538 | \n", "
4 | \n", "Argentina | \n", "4.0824 | \n", "18556.382810 | \n", "44413592 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
159 | \n", "Venezuela | \n", "4.1602 | \n", "10709.950200 | \n", "29825652 | \n", "
160 | \n", "Vietnam | \n", "2.3415 | \n", "6814.142090 | \n", "94914328 | \n", "
161 | \n", "Yemen | \n", "0.3503 | \n", "2284.889893 | \n", "30790514 | \n", "
162 | \n", "Zambia | \n", "0.4215 | \n", "3534.033691 | \n", "17835898 | \n", "
163 | \n", "Zimbabwe | \n", "0.8210 | \n", "1611.405151 | \n", "15052191 | \n", "
164 rows × 4 columns
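As a quick check on what Spearman's $r$ is doing, here is a minimal sketch that uses only the ten countries visible in the preview above. It does not use the full 164-country dataset, so its value will differ from anything computed on all the data. It gets Spearman's $r$ and its $p$-value from `stats.spearmanr`, then confirms that the same coefficient comes out of (a) Pearson's $r$ computed on the ranks and (b) the no-ties shortcut $r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between country $i$'s two ranks. The variable names are our own. The shortcut formula is only exact when there are no tied values, which holds for these ten rows.

```python
import numpy as np
import scipy.stats as stats

# Only the ten rows visible in the preview above (first five and last five countries):
# Afghanistan, Albania, Algeria, Angola, Argentina, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe
co2 = np.array([0.2245, 1.6422, 3.8241, 0.7912, 4.0824,
                4.1602, 2.3415, 0.3503, 0.4215, 0.8210])
gdp = np.array([1934.555054, 11104.166020, 14228.025390, 7771.441895, 18556.382810,
                10709.950200, 6814.142090, 2284.889893, 3534.033691, 1611.405151])

# Spearman's r and its p-value in one call
rho, p = stats.spearmanr(co2, gdp)

# (a) Spearman's r is Pearson's r applied to the ranks
co2_ranks = stats.rankdata(co2)   # rank 1 = smallest value
gdp_ranks = stats.rankdata(gdp)
r_on_ranks, _ = stats.pearsonr(co2_ranks, gdp_ranks)

# (b) Shortcut formula, exact only when there are no tied values
d = co2_ranks - gdp_ranks
n = len(co2)
rho_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(f"spearmanr:        r = {rho:.4f}, p = {p:.4f}")
print(f"Pearson on ranks: r = {r_on_ranks:.4f}")
print(f"no-ties formula:  r = {rho_formula:.4f}")  # 0.7455 for these ten rows
```

For these ten rows the squared rank differences sum to 42, so $r_s = 1 - 6 \times 42 / (10 \times 99) \approx 0.7455$, and all three routes agree.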
\n", "\n", " | CO2 | \n", "GDP | \n", "population | \n", "
---|---|---|---|
CO2 | \n", "1.000000 | \n", "0.914369 | \n", "-0.098554 | \n", "
GDP | \n", "0.914369 | \n", "1.000000 | \n", "-0.122920 | \n", "
population | \n", "-0.098554 | \n", "-0.122920 | \n", "1.000000 | \n", "
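A matrix like the one above can be produced with pandas' `DataFrame.corr`. The cell that created it is not shown in this excerpt, so the sketch below is an illustration rather than the notebook's own code. It uses only the ten preview rows. `df.corr(method="spearman")` gives Spearman's $r$ for every pair of columns but no $p$-values. Passing the same columns to `stats.spearmanr` as a 2-D array, with each column treated as one variable, returns both a correlation matrix and a matching matrix of $p$-values.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical stand-in for the full DataFrame: only the ten preview rows
df = pd.DataFrame({
    "CO2": [0.2245, 1.6422, 3.8241, 0.7912, 4.0824,
            4.1602, 2.3415, 0.3503, 0.4215, 0.8210],
    "GDP": [1934.555054, 11104.166020, 14228.025390, 7771.441895, 18556.382810,
            10709.950200, 6814.142090, 2284.889893, 3534.033691, 1611.405151],
    "population": [36686788, 2877019, 41927008, 31273538, 44413592,
                   29825652, 94914328, 30790514, 17835898, 15052191],
})

# Spearman correlation matrix from pandas (coefficients only, no p-values)
rho_pandas = df.corr(method="spearman")

# The same matrix via Pearson's r on the ranked columns
rho_ranked = df.rank().corr()

# scipy: with more than two columns, spearmanr returns a matrix of r values
# and a matrix of p-values (axis=0 by default, so each column is a variable)
rho_mat, p_mat = stats.spearmanr(df.to_numpy())

print(rho_pandas.round(3))
print(pd.DataFrame(p_mat, index=df.columns, columns=df.columns).round(3))
```

Because these are only ten rows, the coefficients will not match the full-data matrix above; the point is that pandas and `scipy.stats` compute the same $r$ values, and only `scipy.stats` adds the $p$-values.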