{ "cells": [ { "cell_type": "markdown", "id": "93fda237-0977-46c0-8035-a2f8f06d61fb", "metadata": { "tags": [] }, "source": [ "# Tutorial Exercises 1: Probability Jargon in Python\n", "\n", "In this section we will revise the terms for combinations of events and how they relate to frequencies in a `pandas` dataframe.\n", "\n", "You should be able to answer the following questions with the help of the `pandas` method `query()` (to find the rows matching some criterion) and the function `len()`, which returns the number of rows in the dataframe passed to it.\n", "\n" ] }, { "cell_type": "markdown", "id": "1f6ca970-bb56-4098-8d3e-4dbcb15deaaa", "metadata": {}, "source": [ "### Set up Python libraries\n", "\n", "As usual, run the code cell below to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": null, "id": "5912aa02-3e45-4e81-9a26-a45334c31c26", "metadata": {}, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas as pd\n", "import seaborn as sns\n", "sns.set_theme(style='white')\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "id": "a5c7b791-2ea1-4d77-8f0b-1ce77c719e81", "metadata": {}, "source": [ "## Event combinations\n", "\n", "Let's work with the (made-up) data on students from Beaufort and Lonsdale colleges." ] }, { "cell_type": "code", "execution_count": 15, "id": "d8a2688f-6f39-488a-b53a-fd4f1535e430", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>ID_code</th><th>College</th><th>Subject</th><th>Score_preVac</th><th>Score_postVac</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>247610</td><td>Lonsdale</td><td>PPE</td><td>60</td><td>35</td></tr>\n",
"    <tr><th>1</th><td>448590</td><td>Lonsdale</td><td>PPE</td><td>43</td><td>44</td></tr>\n",
"    <tr><th>2</th><td>491100</td><td>Lonsdale</td><td>engineering</td><td>79</td><td>69</td></tr>\n",
"    <tr><th>3</th><td>316150</td><td>Lonsdale</td><td>PPE</td><td>55</td><td>61</td></tr>\n",
"    <tr><th>4</th><td>251870</td><td>Lonsdale</td><td>engineering</td><td>62</td><td>65</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>296</th><td>440570</td><td>Beaufort</td><td>history</td><td>75</td><td>70</td></tr>\n",
"    <tr><th>297</th><td>826030</td><td>Beaufort</td><td>maths</td><td>52</td><td>49</td></tr>\n",
"    <tr><th>298</th><td>856260</td><td>Beaufort</td><td>Biology</td><td>83</td><td>84</td></tr>\n",
"    <tr><th>299</th><td>947060</td><td>Beaufort</td><td>engineering</td><td>62</td><td>65</td></tr>\n",
"    <tr><th>300</th><td>165780</td><td>Beaufort</td><td>PPE</td><td>48</td><td>56</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>301 rows × 5 columns</p>\n",
"</div>
" ], "text/plain": [ " ID_code College Subject Score_preVac Score_postVac\n", "0 247610 Lonsdale PPE 60 35\n", "1 448590 Lonsdale PPE 43 44\n", "2 491100 Lonsdale engineering 79 69\n", "3 316150 Lonsdale PPE 55 61\n", "4 251870 Lonsdale engineering 62 65\n", ".. ... ... ... ... ...\n", "296 440570 Beaufort history 75 70\n", "297 826030 Beaufort maths 52 49\n", "298 856260 Beaufort Biology 83 84\n", "299 947060 Beaufort engineering 62 65\n", "300 165780 Beaufort PPE 48 56\n", "\n", "[301 rows x 5 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wb = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/WellbeingSample.csv')\n", "wb" ] }, { "cell_type": "markdown", "id": "5e2fe684-84b0-47f9-8202-228da3cdc37c", "metadata": { "tags": [] }, "source": [ "**a) Plot the data**\n", "\n", "First of all, plot the number of students taking each subject at each college, using `sns.countplot()`" ] }, { "cell_type": "code", "execution_count": 1, "id": "8a230c7b-d6c6-4327-b1c2-f874ce4e7ada", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "95b80cb8-0142-415e-a1bf-edb4cd9afc0d", "metadata": {}, "source": [ "**b) Probability of college membership**\n", "\n", "Let's start by working out the probability that a student picked from this sample is at each college\n", "\n", "* Let $B$ be the event that a randomly chosen student is a member of Beaufort College\n", "* Let $L$ be the event that a randomly chosen student is a member of Lonsdale College\n", "\n", "What are the values of $p(B)$ and $p(L)$?" 
] }, { "cell_type": "code", "execution_count": 2, "id": "7b2dd99b-851d-44e2-9a22-74bd9c3e67c2", "metadata": { "tags": [] }, "outputs": [], "source": [ "# p(B)\n", "# Your code here" ] }, { "cell_type": "code", "execution_count": 3, "id": "b524e2a4-a01d-4a94-8293-bdc657efca6b", "metadata": { "tags": [] }, "outputs": [], "source": [ "# p(L)\n", "# Your code here" ] }, { "cell_type": "markdown", "id": "09425264-497e-4dcd-a36a-fff531326421", "metadata": { "tags": [] }, "source": [ "**c) Joint probability**\n", "\n", "* Let $PPE$ be the event that a randomly chosen student is studying PPE\n", "\n", "Find $p(B \\cap PPE)$\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "61e75520-08ca-45ff-8598-0caad9cde391", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "54c2af62-2a24-460d-95a5-ae5e9291ea7d", "metadata": { "tags": [] }, "source": [ "**d) Union**\n", "\n", "* Let $Psy$ be the event that a randomly chosen student is studying psychology\n", "* Let $Bio$ be the event that a randomly chosen student is studying biology\n", "\n", "Find $p(Psy \\cup Bio)$" ] }, { "cell_type": "code", "execution_count": 5, "id": "0d7d1c45-4c0f-45ec-a466-8bbce0a6544f", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "id": "4b498af6-095c-4695-b88b-bf2b46b50696", "metadata": { "tags": [] }, "source": [ "**e) Conditional probability**\n", "\n", "* Let $Bio$ be the event that a randomly chosen student is studying biology\n", "* Let $Hist$ be the event that a randomly chosen student is studying history\n", "\n", "What is $p(L|Bio)$?" ] }, { "cell_type": "code", "execution_count": 7, "id": "027a7aa0-b497-4e3f-908b-eab51b436e92", "metadata": { "tags": [] }, "outputs": [], "source": [ "# p(L|Bio)\n", "# Your code here" ] }, { "cell_type": "markdown", "id": "a92577de-59bc-4c0a-aa8e-055ef16743ce", "metadata": { "tags": [] }, "source": [ "What is $p(L|Hist)$?" 
] }, { "cell_type": "code", "execution_count": 6, "id": "19d24990-b86c-4c48-88ec-16cfb22aa3d2", "metadata": { "tags": [] }, "outputs": [], "source": [ "# p(L|Hist)\n", "# Your code here" ] }, { "cell_type": "code", "execution_count": null, "id": "352c5c75-5c70-428f-9aeb-763b417c7390", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "bcf8919c-a159-4072-b8e7-f8f8e0866131", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }