{ "cells": [ { "cell_type": "markdown", "id": "e5a2a6bc", "metadata": {}, "source": [ "# Tutorial exercises I\n", "\n", "You should work through this in the tutorial. The idea is to bring together the skills you have learned (and to highlight any gaps to discuss with your tutor)." ] }, { "cell_type": "markdown", "id": "75e93be0", "metadata": {}, "source": [ "## Car park exercise\n", "\n", "In this exercise, you will plan car parking at a ferry terminal and inside the ferry itself.\n", "\n", "You will be given data about the dimensions of vehicles (length, height and width) in a .csv file. By plotting the data and calculating descriptive statistics, you will produce a short report recommending the size and number of parking spaces required." ] }, { "cell_type": "markdown", "id": "abc9bef7", "metadata": {}, "source": [ "
\n", " \n", "

The brief:

\n", "\n", "The SpeedyFerry Company are planning a new terminal. Vehicles will arrive at the terminal in advance of their sailing time and be parked in a car park to await boarding.\n", "\n", "SpeedyFerry would like to know how to mark out the car park. They want to fit as many parking spaces into their land as possible, whilst still making sure that the vehicles fit in the spaces.\n", " \n", " \n", "Your task is to produce a report recommending the size and number of parking spaces, justifying your answer with plots and descriptive statistics based on the sample data provided by SpeedyFerry (introduced below).\n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "554bd33b", "metadata": {}, "source": [ "### Set up Python libraries\n", "\n", "As usual, run the code cell below to import the relevant Python libraries." ] }, { "cell_type": "code", "execution_count": 6, "id": "6e215164", "metadata": {}, "outputs": [], "source": [ "# Set-up Python libraries - you need to run this but you don't need to change it\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats\n", "import pandas\n", "import seaborn as sns\n", "sns.set_theme()" ] }, { "cell_type": "markdown", "id": "7fe15a08", "metadata": {}, "source": [ "### Load and view the data\n", "\n", "To make our plan for car parking, we need some information about the vehicles to be accommodated.\n", "\n", "SpeedyFerry have provided a data file with a complete list of the vehicles parked at a vehicle-ferry terminal at 1pm on Sunday 24th April 2022, which they regard as a representative sample.\n", "\n", "Let's load the data file \"data/vehicles.csv\" and have a look at what information the dataset contains." ] }, { "cell_type": "code", "execution_count": 11, "id": "71268ed1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length height width type\n", "0 3.9187 1.5320 1.8030 car\n", "1 4.6486 1.5936 1.6463 car\n", "2 3.5785 1.5447 1.7140 car\n", "3 3.5563 1.5549 1.7331 car\n", "4 4.0321 1.5069 1.7320 car\n", "... ... ... ... ...\n", "1359 15.5000 4.2065 2.5112 truck\n", "1360 14.4960 4.1965 2.5166 truck\n", "1361 9999.0000 4.1964 2.4757 truck\n", "1362 14.3700 4.2009 2.5047 truck\n", "1363 14.2350 4.2016 2.5212 truck\n", "\n", "[1364 rows x 4 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "vehicles = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/vehicles_2.csv')\n", "display(vehicles)" ] }, { "cell_type": "markdown", "id": "cb03e0d0", "metadata": {}, "source": [ "That was a long list of vehicles!\n", "\n", "* What information do we have about each vehicle?" ] }, { "cell_type": "markdown", "id": "96aa6d60", "metadata": {}, "source": [ "### Data cleaning\n", "\n", "The sample contains some implausible vehicle lengths (for example, 9999 m), which are almost certainly data-entry errors.\n", "\n", "Find them and replace them with NaNs." ] }, { "cell_type": "code", "execution_count": 22, "id": "bb22298b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length height width type\n", "1022 9999.0000 2.9010 2.2571 towing\n", "1361 9999.0000 4.1964 2.4757 truck\n", "1121 9999.0000 3.8834 2.4869 truck\n", "1093 9999.0000 3.9173 2.5168 truck\n", "1008 94.7230 2.8883 2.2566 towing\n", "... ... ... ... ...\n", "6 3.2169 1.5708 1.7401 car\n", "469 3.1957 1.5372 1.7438 car\n", "811 3.1682 1.5888 1.7338 car\n", "512 3.1197 1.4932 1.7817 car\n", "653 3.1109 1.5512 1.7912 car\n", "\n", "[1364 rows x 4 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code here to find the long vehicles\n", "\n", "vehicles.sort_values(by='length', ascending=False)\n", "\n", "\n", "# replace the incorrect vehicle lengths with NaNs (note that .loc takes square brackets)\n", "# vehicles.loc[.....] = np.nan" ] }, { "cell_type": "markdown", "id": "7bfb01b7", "metadata": {}, "source": [ "# Your report for SpeedyFerry" ] }, { "cell_type": "markdown", "id": "c597c8d9", "metadata": {}, "source": [ "
\n", " \n", "This is a stub for your report to SpeedyFerry. \n", "\n", "The text in each markdown cell is given to guide you. You will replace this with your own text.\n", "\n", "Similarly, you will edit the code in each code cell to produce the necessary plots and statistics.\n", "\n", "This stub is quite structured to guide you through the process. Later in the course, you will develop your reports with less structured guidance.\n", " \n", "
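A minimal sketch of the data-cleaning step, run on a small made-up table rather than the SpeedyFerry file: it sorts by length to expose the implausible entries, then uses a boolean mask with `.loc` to overwrite them with NaN. The 20 m cutoff (`MAX_PLAUSIBLE_LENGTH`) is an illustrative assumption, not a rule from the brief (the longest plausible vehicles in the sample are trucks of about 15.5 m); check the sorted lengths yourself before choosing a threshold.

```python
import numpy as np
import pandas

# Made-up rows in the same layout as the vehicles data (illustrative values only)
vehicles = pandas.DataFrame({
    'length': [3.92, 4.65, 45.44, 8.70, 94.72, 14.40, 9999.0],
    'type': ['car', 'car', 'car', 'towing', 'towing', 'truck', 'truck'],
})

# Sorting longest-first puts the implausible lengths at the top
print(vehicles.sort_values(by='length', ascending=False).head())

# Hypothetical cutoff: any length above 20 m is treated as a data-entry error
MAX_PLAUSIBLE_LENGTH = 20
is_bad = vehicles['length'] > MAX_PLAUSIBLE_LENGTH

# .loc takes square brackets: [row selector, column name]
vehicles.loc[is_bad, 'length'] = np.nan

print(vehicles)
```

pandas skips NaN in summaries such as `mean()` and `quantile()`, so the cleaned column can be described directly; `count()` will then report fewer lengths than widths for any vehicle type that had bad entries.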
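Once the lengths are clean, the remaining report steps are grouped summaries. A sketch, again on made-up numbers: `groupby().agg()` gives the mean and SD per vehicle type, `quantile(0.95)` gives a candidate space size per type, and `value_counts()` plus a 10% margin gives a space count. Rounding the margin up to a whole space is our assumption; the report template only says counts +10%.

```python
import pandas

# Made-up cleaned measurements (illustrative values, not the SpeedyFerry sample)
vehicles = pandas.DataFrame({
    'length': [3.9, 4.6, 4.1, 8.7, 9.2, 14.4, 15.1],
    'width': [1.80, 1.65, 1.71, 2.25, 2.24, 2.50, 2.52],
    'type': ['car', 'car', 'car', 'towing', 'towing', 'truck', 'truck'],
})

# Mean and standard deviation of each dimension, per vehicle type
summary = vehicles.groupby('type')[['length', 'width']].agg(['mean', 'std'])
print(summary)

# 95th percentile of each dimension, per type: a candidate parking-space size
space_size = vehicles.groupby('type')[['length', 'width']].quantile(0.95)
print(space_size)

# Vehicles per type, plus 10% rounded up to a whole space.
# (count * 11 + 9) // 10 equals ceil(count * 1.1) using integer arithmetic,
# which avoids floating-point round-off (10 * 1.1 is slightly above 11).
counts = vehicles['type'].value_counts()
spaces_needed = (counts * 11 + 9) // 10
print(spaces_needed)
```

Reporting the SD alongside the mean shows how much vehicles of one type vary in size, which is why the spaces are sized from a high percentile rather than from the mean.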
" ] }, { "cell_type": "code", "execution_count": null, "id": "9d717d77", "metadata": {}, "outputs": [], "source": [ "# load the data\n", "vehicles = ### your code here to load the csv file\n", "\n", "# replace bad values with NaNs\n" ] }, { "cell_type": "markdown", "id": "24dd61a6", "metadata": {}, "source": [ "## Description of vehicle types and sizes\n", "\n", "Based on the sample data recorded at 1pm on Sunday 24th April 2022, the vehicles to be accommodated fall into XXX categories:\n", "* cars\n", "* xxx\n", "* xxx\n", "\n", "The majority of vehicles are cars." ] }, { "cell_type": "code", "execution_count": 18, "id": "c5d6e063", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length \\\n", " count mean std min 25% 50% 75% \n", "type \n", "car 981.0 4.269801 1.683456 3.1109 3.81810 4.1216 4.52020 \n", "towing 53.0 198.786094 1372.101739 7.2561 8.13230 8.7012 9.23470 \n", "truck 330.0 104.695470 949.142610 11.1480 12.57725 14.4005 15.08325 \n", "\n", " height ... width \\\n", " max count mean ... 75% max count mean \n", "type ... \n", "car 45.438 981.0 1.580810 ... 1.6119 1.8993 981.0 1.791925 \n", "towing 9999.000 53.0 2.897838 ... 2.9064 2.9445 53.0 2.248326 \n", "truck 9999.000 330.0 4.072725 ... 4.2009 4.2137 330.0 2.501304 \n", "\n", " \n", " std min 25% 50% 75% max \n", "type \n", "car 0.046921 1.6241 1.7602 1.79040 1.82090 1.9580 \n", "towing 0.008222 2.2292 2.2442 2.24790 2.25400 2.2642 \n", "truck 0.015871 2.4629 2.4898 2.50145 2.51155 2.5467 \n", "\n", "[3 rows x 24 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code to count vehicles by type - \n", "# hint: use groupby() and describe(), or use value_counts()\n", "vehicles.groupby(['type']).describe()" ] }, { "cell_type": "markdown", "id": "2eac220f", "metadata": {}, "source": [ "The length and width of vehicles differ substantially between classes." ] }, { "cell_type": "code", "execution_count": 55, "id": "e4064664", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# produce a plot to illustrate the distribution of lengths and widths for each class\n", "plt.subplot(1,2,1)\n", "sns.histplot(data=vehicles, x=\"length\", bins = np.arange(0,16,0.5), hue=\"type\")\n", "plt.xlabel('vehicle length (m)')\n", "\n", "plt.subplot(1,2,2)\n", "sns.histplot(data=vehicles, x=\"width\", bins = np.arange(1.5,3,0.1), hue=\"type\")\n", "plt.xlabel('vehicle width (m)')\n", "\n", "plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap" ] }, { "cell_type": "markdown", "id": "ccb2ab2e", "metadata": {}, "source": [ "The mean length of cars is 4.20 m (SD 0.51 m); the mean length of trucks is \\< your text here \\> and that of towing vehicles is \\< your text here \\>." ] }, { "cell_type": "code", "execution_count": null, "id": "c2ee07ed", "metadata": {}, "outputs": [], "source": [ "# Your code here to output the mean and s.d. of length for each class" ] }, { "cell_type": "markdown", "id": "cf3ddfab", "metadata": {}, "source": [ "The mean width of cars is \\< your text here giving descriptives for width of each class \\>" ] }, { "cell_type": "code", "execution_count": null, "id": "2335f3bc", "metadata": {}, "outputs": [], "source": [ "# Your code here to output the mean and s.d. of width for each class" ] }, { "cell_type": "markdown", "id": "24f7f12b", "metadata": {}, "source": [ "Therefore, we would recommend .....[your comment on how to segregate the parking areas for vehicle classes]......" ] }, { "cell_type": "markdown", "id": "18d2d9b2", "metadata": {}, "source": [ "## Size and number of parking spaces in each zone\n", "\n", "We recommend that the parking spaces in each zone be sized to fit the 95th percentile of length and width for each vehicle class.\n", "The exact values (in metres) are:" ] }, { "cell_type": "code", "execution_count": 78, "id": "5a966d6a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length height width\n", "type \n", "car 4.61130 1.62480 1.83000\n", "towing 9.27982 2.91196 2.25472\n", "truck 15.21260 4.20212 2.51552" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# edit this code to give the 95th percentile (0.95 quantile) of measurements for each vehicle type\n", "#\n", "\n", "vehicles.groupby(['type']).q# complete the line!......\n" ] }, { "cell_type": "markdown", "id": "eb5b0f5e", "metadata": {}, "source": [ "Given the observed frequencies in each vehicle class, we recommend the following minimum number of spaces in each zone (the observed vehicle count plus 10%): /< your text here - />" ] }, { "cell_type": "code", "execution_count": 79, "id": "597f0ac2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " length height width\n", "type \n", "car 981 981 981\n", "towing 53 53 53\n", "truck 330 330 330" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code to give the number of vehicles in each class - \n", "# hint - similar to the code above but use .count() instead of .quantile()" ] }, { "cell_type": "markdown", "id": "4e7d3fef", "metadata": {}, "source": [ "### The end!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }