Tutorial exercises I

2.9. Tutorial exercises I#

You should work through this in the tutorial. The idea is to bring together the skills you have learned (and highlight any gaps to discuss with your tutor).

Car park exercise#

In this exercise, you will plan car parking at a ferry terminal and inside the ferry itself.

You will be given data about the lengths of vehicles in a .csv file. By plotting the data and calculating descriptive statistics, you will produce a short report recommending the size and number of parking spots required.

The brief:

The SpeedyFerry Company are planning a new terminal. Vehicles will arrive at the terminal in advance of their sailing time and be parked in a car park to await boarding.

SpeedyFerry would like to know how to mark out the car park. They want to fit as many parking spaces onto their land as possible, whilst still making sure that the vehicles fit in the spaces.

How long and wide should the parking spots be?
Should different vehicle types be separated in different sections of the car park?
If so, what ratio of long vehicle places to short vehicle places is needed?

Your task is to produce a report answering these questions, justifying your answers with plots and descriptive statistics based on the sample data provided by SpeedyFerry, introduced below.

Set up Python libraries#

As usual, run the code cell below to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas 
import seaborn as sns
sns.set_theme()

Load and view the data#

To make our plan for car parking, we need some information about the vehicles to be accommodated.

SpeedyFerry have provided a data file with a complete list of the vehicles parked at a vehicle-ferry terminal at 1pm on Sunday 24th April 2022, which they regard as a representative sample.

Let’s load the data file “vehicles_2.csv” and have a look at what information the dataset contains.

vehicles = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/vehicles_2.csv')
display(vehicles)

	length	height	width	type
0	3.9187	1.5320	1.8030	car
1	4.6486	1.5936	1.6463	car
2	3.5785	1.5447	1.7140	car
3	3.5563	1.5549	1.7331	car
4	4.0321	1.5069	1.7320	car
...	...	...	...	...
1359	15.5000	4.2065	2.5112	truck
1360	14.4960	4.1965	2.5166	truck
1361	9999.0000	4.1964	2.4757	truck
1362	14.3700	4.2009	2.5047	truck
1363	14.2350	4.2016	2.5212	truck

1364 rows × 4 columns

That was a long list of vehicles!

What information do we have about each vehicle?

Data cleaning#

Some implausible vehicle lengths are in the sample. They must be data entry errors.

Find them and replace them with NaNs.

# your code here to find the long vehicles

vehicles.sort_values(by='length', ascending=False)

# replace the incorrect vehicle lengths with NaNs
# vehicles.loc[.....] = np.nan

	length	height	width	type
1022	9999.0000	2.9010	2.2571	towing
1361	9999.0000	4.1964	2.4757	truck
1121	9999.0000	3.8834	2.4869	truck
1093	9999.0000	3.9173	2.5168	truck
1008	94.7230	2.8883	2.2566	towing
...	...	...	...	...
6	3.2169	1.5708	1.7401	car
469	3.1957	1.5372	1.7438	car
811	3.1682	1.5888	1.7338	car
512	3.1197	1.4932	1.7817	car
653	3.1109	1.5512	1.7912	car

1364 rows × 4 columns
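The replacement step can be sketched as follows. This is a minimal illustration rather than the intended solution: the toy data frame reuses a few rows from the tables above, and the 20 m cutoff is an assumption chosen for the example. For the full dataset, you should choose and justify your own threshold.

```python
import numpy as np
import pandas

# Toy data: a few rows copied from the tables above (not the full dataset)
toy = pandas.DataFrame({
    'length': [3.9187, 15.5000, 9999.0000, 94.7230],
    'type': ['car', 'truck', 'truck', 'towing'],
})

# ASSUMPTION: any length over 20 m is treated as implausible
max_plausible_length = 20

# .loc[rows, column] selects the offending cells; assigning np.nan overwrites them
toy.loc[toy['length'] > max_plausible_length, 'length'] = np.nan

print(toy)
print(toy['length'].isna().sum(), "values replaced")
```

Note that .loc takes square brackets, and that naming the column ('length') leaves the other measurements in the affected rows untouched.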

2.10. Your report for SpeedyFerry#

This is a stub for your report to SpeedyFerry.

The text in each markdown cell is given to guide you. You will replace this with your own text.

Similarly, you will edit the code in each code cell to produce the necessary plots and statistics.

This stub is quite structured to guide you through the process. Later in the course, you will develop your reports with less structured guidance.

# load the data (same file as above)
vehicles = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/vehicles_2.csv')

# replace bad values with NaNs
# your code here

Description of vehicle types and sizes#

Based on the sample data recorded at 1pm on Sunday 24th April 2022, the vehicles to be accommodated fall into XXX categories:

cars
xxx
xxx

The majority of vehicles are cars.

# your code to count vehicles by type - 
# hint use groupby() and describe(), or use value_counts()
vehicles.groupby(['type']).describe()

	length								height					width
	count	mean	std	min	25%	50%	75%	max	count	mean	...	75%	max	count	mean	std	min	25%	50%	75%	max
type
car	981.0	4.269801	1.683456	3.1109	3.81810	4.1216	4.52020	45.438	981.0	1.580810	...	1.6119	1.8993	981.0	1.791925	0.046921	1.6241	1.7602	1.79040	1.82090	1.9580
towing	53.0	198.786094	1372.101739	7.2561	8.13230	8.7012	9.23470	9999.000	53.0	2.897838	...	2.9064	2.9445	53.0	2.248326	0.008222	2.2292	2.2442	2.24790	2.25400	2.2642
truck	330.0	104.695470	949.142610	11.1480	12.57725	14.4005	15.08325	9999.000	330.0	4.072725	...	4.2009	4.2137	330.0	2.501304	0.015871	2.4629	2.4898	2.50145	2.51155	2.5467

3 rows × 24 columns

The length and width of vehicles differ substantially between classes.

# produce a plot to illustrate the distribution of lengths and widths for each class
plt.subplot(1,2,1)
sns.histplot(data=vehicles, x="length", bins = np.arange(0,16,0.5), hue="type")
plt.xlabel('vehicle length (m)')

plt.subplot(1,2,2)
sns.histplot(data=vehicles, x="width", bins = np.arange(1.5,3,0.1), hue="type")
plt.xlabel('vehicle width (m)')

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap

_images/f88374e0cfb6240c87c8f1dec31469f7666ca8ef866e216f27cedfddbb468249.png

The mean length of cars is 4.20m (sd 0.51), the mean length of trucks < your text here > and tows < your text here >.

# Your code here to output the mean and s.d. of length for each class
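One possible way to get a mean and standard deviation per class is to group by type and aggregate a single column. The sketch below assumes a toy data frame built from a few rows of the tables above, so the numbers it prints are not the values for your report.

```python
import pandas

# Toy data: a few rows copied from the tables above (not the full dataset)
toy = pandas.DataFrame({
    'length': [3.9187, 4.6486, 3.5785, 15.5000, 14.4960, 14.3700],
    'type': ['car', 'car', 'car', 'truck', 'truck', 'truck'],
})

# One row per vehicle type, with the mean and sample s.d. of length
summary = toy.groupby('type')['length'].agg(['mean', 'std'])
print(summary.round(2))
```

Both mean and std skip NaNs by default, so lengths that have been replaced with NaN during cleaning do not distort the summary.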

The mean width of cars is < your text here giving descriptives for width of each class >

# Your code here to output the mean and s.d. of width for each class

therefore we would recommend …..[your comment on how to segregate the parking areas for vehicle classes]……:

Size and number of parking spaces in each zone#

We recommend that the parking spaces in each zone be sized to fit the 95th centile of length and width for each vehicle class. The exact values are:

# give the 95th percentile (0.95 quantile) of measurements for each vehicle type
vehicles.groupby(['type']).quantile(0.95)

	length	height	width
type
car	4.61130	1.62480	1.83000
towing	9.27982	2.91196	2.25472
truck	15.21260	4.20212	2.51552

Given the observed frequencies in each vehicle class, we recommend the following minimum number of spaces in each zone, which is our observed vehicle count plus 10%: < your text here >

# your code to give the number of vehicles in each class - 
# hint - similar to the code above but use .count() instead of .quantile()

	length	height	width
type
car	981	981	981
towing	53	53	53
truck	330	330	330
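The "+10%" step might be computed as in the sketch below. It starts from the counts shown in the table above. Rounding up to a whole number of spaces is an assumption, since the brief does not say how fractions of a space should be handled.

```python
import pandas

# Observed counts per vehicle type, copied from the table above
counts = pandas.Series({'car': 981, 'towing': 53, 'truck': 330})

# ASSUMPTION: round up to the next whole space.
# Integer arithmetic, (n * 11 + 9) // 10, gives ceil(1.1 * n) without
# floating-point surprises (n * 1.1 may not be exactly representable).
spaces = (counts * 11 + 9) // 10

print(spaces)
```

The same Series can be reused to work out the ratio of long-vehicle spaces to short-vehicle spaces, once you have decided which classes share a zone.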