2.9. Tutorial exercises I#

You should work through this is the tutorial. The idea is to bring together the skills you have learned (and highlight any gaps to discuss with your tutor)

Car park exercise#

In this exercise, you will plan car parking at a ferry terminal and inside the ferry itself.

You will be given data about the lengths of vehicles in a .csv file. By plotting the data and calculating descriptive statististics, you will produce a short report recommending the size and number of parking spots required.

The brief:

The SpeedyFerry Company are planning a new terminal. Vehicles will arrive at the terminal in advance of their sailing time and be parked in a car park to await boarding.

SpeedyFerry would like to know how to mark out the car park. They want to fit as many parking spaces into their land as possible, whilst still making sure that the vehicles fit in the spaces

  • How long and wide should the parking spots be?
  • Should different vehicle types be separated in different sections of the car park?
  • If so, what ratio of long vehicle places to short vehicle places is needed?

Your task is to produce a report answering these questions, justifying you answer with plots and descriptive statistics based on the sample data provided by SpeedyFerry, introduced below

Picture of some cars

Set up Python libraries#

As usual, run the code cell below to import the relevant Python libraries

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas 
import seaborn as sns
sns.set_theme()

Load and view the data#

To make our plan for car parking, we need some information about the vehicles to be accommodated.

SpeedyFerry have provided a data file with a complete list of the vehicles parked at a vehicle-ferry terminal at 1pm on Sunday 24th April 2022, which they regard as a representative sample.

Let’s load the datafile “data/vehicles.csv” and have a look what information we have in the dataset

vehicles = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/vehicles_2.csv')
display(vehicles)
length height width type
0 3.9187 1.5320 1.8030 car
1 4.6486 1.5936 1.6463 car
2 3.5785 1.5447 1.7140 car
3 3.5563 1.5549 1.7331 car
4 4.0321 1.5069 1.7320 car
... ... ... ... ...
1359 15.5000 4.2065 2.5112 truck
1360 14.4960 4.1965 2.5166 truck
1361 9999.0000 4.1964 2.4757 truck
1362 14.3700 4.2009 2.5047 truck
1363 14.2350 4.2016 2.5212 truck

1364 rows × 4 columns

That was a long list of vehicles!

  • What information do we have about each vehicle?

Data cleaning#

Some implausible vehicle lengths are in the sample. They must be data entry errors.

Find them and replace them with NaNs.

# your code here to find the long vehicles

vehicles.sort_values(by='length', ascending=False)


# replace the incorrect vehicle lengths with NaNs
# vehicles.loc(.....)=np.nan
length height width type
1022 9999.0000 2.9010 2.2571 towing
1361 9999.0000 4.1964 2.4757 truck
1121 9999.0000 3.8834 2.4869 truck
1093 9999.0000 3.9173 2.5168 truck
1008 94.7230 2.8883 2.2566 towing
... ... ... ... ...
6 3.2169 1.5708 1.7401 car
469 3.1957 1.5372 1.7438 car
811 3.1682 1.5888 1.7338 car
512 3.1197 1.4932 1.7817 car
653 3.1109 1.5512 1.7912 car

1364 rows × 4 columns

2.10. Your report for SpeedyFerry#

This is a stub for your report to SpeedyFerry.

The text in each markdown cell is given to guide you. You will replace this with your own text.

Similarly, you will edit the code in each code cell to produce the necessary plots and statistics.

This stub is quite structured to guide you through the process. Later in the course, you will develop your reports with less structured guidance.

# load the data
vehicles = ### your code here to load the csv file

# replace bad values with NaNs
  Cell In[4], line 2
    vehicles = ### your code here to load the csv file
               ^
SyntaxError: invalid syntax

Description of vehicle types and sizes#

Based on the sample data recorded at 1pm on Sunday 24th April 2022, the vehicles to be accommodated fall into XXX categories:

  • cars

  • xxx

  • xxx

The majority of vehicles are cars.

# your code to count vehicles by type - 
# hint use groupby() and describe(), or use value_counts()
vehicles.groupby(['type']).describe()
length height width
count mean std min 25% 50% 75% max count mean ... 75% max count mean std min 25% 50% 75% max
type
car 981.0 4.269801 1.683456 3.1109 3.81810 4.1216 4.52020 45.438 981.0 1.580810 ... 1.6119 1.8993 981.0 1.791925 0.046921 1.6241 1.7602 1.79040 1.82090 1.9580
towing 53.0 198.786094 1372.101739 7.2561 8.13230 8.7012 9.23470 9999.000 53.0 2.897838 ... 2.9064 2.9445 53.0 2.248326 0.008222 2.2292 2.2442 2.24790 2.25400 2.2642
truck 330.0 104.695470 949.142610 11.1480 12.57725 14.4005 15.08325 9999.000 330.0 4.072725 ... 4.2009 4.2137 330.0 2.501304 0.015871 2.4629 2.4898 2.50145 2.51155 2.5467

3 rows × 24 columns

The length and width of vehicles differs substantially between classes

# produce a plot to illustrate the distribution of lengths and widths for each class
plt.subplot(1,2,1)
sns.histplot(data=vehicles, x="length", bins = np.arange(0,16,0.5), hue="type")
plt.xlabel('vehicle length (m)')

plt.subplot(1,2,2)
sns.histplot(data=vehicles, x="width", bins = np.arange(1.5,3,0.1), hue="type")
plt.xlabel('vehicle width (m)')

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
_images/f88374e0cfb6240c87c8f1dec31469f7666ca8ef866e216f27cedfddbb468249.png

The mean length of cars is 4.20m (sd 0.51), the mean length of trucks < your text here > and tows < your text here >.

# Your code here to output the mean and s.d. of length for each class

The mean width of cars is < your text here giving descriptives for width of each class >

# Your code here to output the mean and s.d. of width for each class

therefore we would recommend …..[your comment on how to segregate the parking areas for vehicle classes]……:

Size and number of parking spaces in each zone#

We recommend that the parking spaces in each zone should be sized to fit the 95th centile in length and width of each vehicle class. The exact lengths are: /

# edit this code to give the 95th percentile (0.95 quantile) of measurements for each vehicle type
#

vehicles.groupby(['type']).q# complete the line!......
length height width
type
car 4.61130 1.62480 1.83000
towing 9.27982 2.91196 2.25472
truck 15.21260 4.20212 2.51552

Given the observed frequencies in each vehicle class, we recommend the following minimum number of spaces in each zone, which is our observed vehicle counts +10% /< your text here - />

# your code to give the number of vehicles in each class - 
# hint - similar to the code above but use .count() instead of .quantile()
length height width
type
car 981 981 981
towing 53 53 53
truck 330 330 330

The end!#