{ "cells": [ { "cell_type": "markdown", "id": "dac0a1e3", "metadata": {}, "source": [ "# Python skills check\n", "\n", "Here we will review all the Python skills you should know by the end of this week." ] }, { "cell_type": "markdown", "id": "554bd33b", "metadata": {}, "source": [ "### Set up Python libraries\n", "\n", "As usual, run the code cell below to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": 6, "id": "6e215164", "metadata": {}, "outputs": [], "source": [ "# Set up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas\n", "import seaborn as sns\n", "sns.set_theme()" ] }, { "cell_type": "markdown", "id": "10405553", "metadata": {}, "source": [ "### Load the data\n", "\n", "Let's load some data about the passengers of the Titanic from the file \"data/titanic.csv\"." ] }, { "cell_type": "code", "execution_count": 4, "id": "efd6c290", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Unnamed: 0</th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>1</td><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>5</td><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>886</th><td>886</td><td>887</td><td>0</td><td>2</td><td>Montvila, Rev. Juozas</td><td>male</td><td>27.0</td><td>0</td><td>0</td><td>211536</td><td>13.0000</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>887</th><td>887</td><td>888</td><td>1</td><td>1</td><td>Graham, Miss. Margaret Edith</td><td>female</td><td>19.0</td><td>0</td><td>0</td><td>112053</td><td>30.0000</td><td>B42</td><td>S</td></tr>\n",
"    <tr><th>888</th><td>888</td><td>889</td><td>0</td><td>3</td><td>Johnston, Miss. Catherine Helen \"Carrie\"</td><td>female</td><td>NaN</td><td>1</td><td>2</td><td>W./C. 6607</td><td>23.4500</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>889</th><td>889</td><td>890</td><td>1</td><td>1</td><td>Behr, Mr. Karl Howell</td><td>male</td><td>26.0</td><td>0</td><td>0</td><td>111369</td><td>30.0000</td><td>C148</td><td>C</td></tr>\n",
"    <tr><th>890</th><td>890</td><td>891</td><td>0</td><td>3</td><td>Dooley, Mr. Patrick</td><td>male</td><td>32.0</td><td>0</td><td>0</td><td>370376</td><td>7.7500</td><td>NaN</td><td>Q</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>891 rows × 13 columns</p>\n",
"</div>" ], "text/plain": [ " Unnamed: 0 PassengerId Survived Pclass \\\n", "0 0 1 0 3 \n", "1 1 2 1 1 \n", "2 2 3 1 3 \n", "3 3 4 1 1 \n", "4 4 5 0 3 \n", ".. ... ... ... ... \n", "886 886 887 0 2 \n", "887 887 888 1 1 \n", "888 888 889 0 3 \n", "889 889 890 1 1 \n", "890 890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 13 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "titanic = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/titanic.csv')\n", "display(titanic)" ] }, { "cell_type": "markdown", "id": "3e7aafdf", "metadata": {}, "source": [ "You can find some information about this dataset on Kaggle, including explanations of the less obvious column headers.\n", "\n", "### Get descriptives\n", "\n", "Let's get some descriptive statistics, just for practice:" ] }, { "cell_type": "code", "execution_count": 25, "id": "b97861c3", "metadata": {}, "outputs": [], "source": [ "# How many people were in each class? Hint: use df.value_counts(), which we saw on the page on data cleaning" ] }, { "cell_type": "code", "execution_count": 26, "id": "a7c6c2f4", "metadata": {}, "outputs": [], "source": [ "# What was the mean fare in each class? Hint: use .groupby() and .mean()" ] }, { "cell_type": "code", "execution_count": 32, "id": "173b1660", "metadata": {}, "outputs": [], "source": [ "# What was the standard deviation of fare in each class? Hint: use .groupby() and .std()" ] }, { "cell_type": "code", "execution_count": 33, "id": "214c7855", "metadata": {}, "outputs": [], "source": [ "# What were the 10th and 90th centiles of age overall?" ] }, { "cell_type": "code", "execution_count": 39, "id": "4642f6b8", "metadata": {}, "outputs": [], "source": [ "# display rows 400-420 of the dataframe" ] }, { "cell_type": "code", "execution_count": 58, "id": "8472eea7", "metadata": {}, "outputs": [], "source": [ "# display only passengers under 12 years old" ] }, { "cell_type": "code", "execution_count": 45, "id": "bb8eb080", "metadata": {}, "outputs": [], "source": [ "# display only passengers whose age is unknown (NaN)" ] }, { "cell_type": "code", "execution_count": 60, "id": "3a992d21", "metadata": {}, "outputs": [], "source": [ "# count how many passengers' ages were unknown" ] }, { "cell_type": "code", "execution_count": 61, "id": "1f9ca5d8", "metadata": {}, "outputs": [], "source": [ "# display only passengers over 70 years old" ] }, { "cell_type": "markdown", "id": "b46f3365", "metadata": {}, "source": [ "Wait a minute!\n", "\n", "There was something strange in that last dataframe. Maybe someone's age was mis-recorded?" ] }, { "cell_type": "code", "execution_count": 106, "id": "c6ac34e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Unnamed: 0</th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>420</th><td>420</td><td>421</td><td>0</td><td>3</td><td>Gheorgheff, Mr. Stanio</td><td>male</td><td>NaN</td><td>0</td><td>0</td><td>349254</td><td>7.8958</td><td>NaN</td><td>C</td></tr>\n",
"    <tr><th>421</th><td>421</td><td>422</td><td>0</td><td>3</td><td>Charters, Mr. David</td><td>male</td><td>21.0</td><td>0</td><td>0</td><td>A/5. 13032</td><td>7.7333</td><td>NaN</td><td>Q</td></tr>\n",
"    <tr><th>422</th><td>422</td><td>423</td><td>0</td><td>3</td><td>Zimmerman, Mr. Leo</td><td>male</td><td>290.0</td><td>0</td><td>0</td><td>315082</td><td>7.8750</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>423</th><td>423</td><td>424</td><td>0</td><td>3</td><td>Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria ...</td><td>female</td><td>28.0</td><td>1</td><td>1</td><td>347080</td><td>14.4000</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>424</th><td>424</td><td>425</td><td>0</td><td>3</td><td>Rosblom, Mr. Viktor Richard</td><td>male</td><td>18.0</td><td>1</td><td>1</td><td>370129</td><td>20.2125</td><td>NaN</td><td>S</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Unnamed: 0 PassengerId Survived Pclass \\\n", "420 420 421 0 3 \n", "421 421 422 0 3 \n", "422 422 423 0 3 \n", "423 423 424 0 3 \n", "424 424 425 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "420 Gheorgheff, Mr. Stanio male NaN 0 \n", "421 Charters, Mr. David male 21.0 0 \n", "422 Zimmerman, Mr. Leo male 290.0 0 \n", "423 Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria ... female 28.0 1 \n", "424 Rosblom, Mr. Viktor Richard male 18.0 1 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "420 0 349254 7.8958 NaN C \n", "421 0 A/5. 13032 7.7333 NaN Q \n", "422 0 315082 7.8750 NaN S \n", "423 1 347080 14.4000 NaN S \n", "424 1 370129 20.2125 NaN S " ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# replace the misrecorded age with NaN - hint - check the page on data cleaning\n", "\n", "# and display the relevant part of the dataframe to check\n", "titanic[420:425]" ] }, { "cell_type": "code", "execution_count": null, "id": "f1acd658", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }