12.10. Loading data from a .csv file#
This section covers how to load data from a CSV file saved on your own computer.
You need to do this for the hand-in assignment set in the 3rd week, so make sure you try it, and check with your tutor in class if you get stuck.
#Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas
import seaborn as sns
sns.set_theme()
sns.set_style('white')
In this course you have generally worked with data in pandas.
To get the data into pandas, you usually run a ready-made code block like this:
# load the data and have a look
heartRates = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/HeartRates.csv')
display(heartRates)
 | cookery | horror |
---|---|---|
0 | 60.4 | 72.9 |
1 | 53.9 | 57.0 |
2 | 54.4 | 68.3 |
3 | 60.0 | 57.4 |
4 | 67.7 | 58.7 |
5 | 56.2 | 47.0 |
6 | 61.9 | 71.8 |
7 | 58.9 | 62.1 |
8 | 65.6 | 68.6 |
9 | 54.6 | 73.8 |
10 | 85.2 | 93.1 |
11 | 87.8 | 94.8 |
12 | 90.5 | 111.4 |
13 | 92.7 | 89.7 |
14 | 85.4 | 97.4 |
15 | 77.5 | 90.9 |
16 | 81.3 | 83.9 |
17 | 79.7 | 86.9 |
18 | 96.8 | 90.1 |
19 | 81.9 | 75.4 |
Let’s take a closer look at that.
You are using a function called pandas.read_csv().
Inside the brackets is a URL for an online file repository (my GitHub, if you are interested), from which the file will be read.
I place all the data files for the course in my online repository so I can edit them as needed. However, in ‘real life’ your data wouldn’t be on my GitHub; they would be in a CSV file on your own computer.
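To see what pandas.read_csv() does with the text it reads, here is a minimal sketch that parses a few lines of CSV text held in a string, rather than at a URL or in a file. The text is an assumption about how the first three rows of HeartRates.csv are laid out (a header line, then comma-separated values); the real file may differ in detail.

```python
import io
import pandas

# Assumed layout of the first few lines of a CSV file:
# a header row naming the columns, then one row of values per line
csv_text = """cookery,horror
60.4,72.9
53.9,57.0
54.4,68.3
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO
sample = pandas.read_csv(io.StringIO(csv_text))
print(sample)
```

The header line becomes the column names and each later line becomes one row of the dataframe.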
Download the data file CloudSeeding.csv from this week’s page on Canvas and place it in the same directory (folder) as the downloaded copy of this Jupyter notebook.
Now try running this code block:
clouds = pandas.read_csv('CloudSeeding.csv')
clouds
 | status | rainfall |
---|---|---|
0 | Unseeded | 1202.6 |
1 | Unseeded | 830.1 |
2 | Unseeded | 372.4 |
3 | Unseeded | 345.5 |
4 | Unseeded | 321.2 |
5 | Unseeded | 244.3 |
6 | Unseeded | 163.0 |
7 | Unseeded | 147.8 |
8 | Unseeded | 95.0 |
9 | Unseeded | 87.0 |
10 | Unseeded | 81.2 |
11 | Unseeded | 68.5 |
12 | Unseeded | 47.3 |
13 | Unseeded | 41.1 |
14 | Unseeded | 36.6 |
15 | Unseeded | 29.0 |
16 | Unseeded | 28.6 |
17 | Unseeded | 26.3 |
18 | Unseeded | 26.1 |
19 | Unseeded | 24.4 |
20 | Unseeded | 21.7 |
21 | Unseeded | 17.3 |
22 | Unseeded | 11.5 |
23 | Unseeded | 4.9 |
24 | Unseeded | 4.9 |
25 | Unseeded | 1.0 |
26 | Seeded | 2745.6 |
27 | Seeded | 1697.8 |
28 | Seeded | 1656.0 |
29 | Seeded | 978.0 |
30 | Seeded | 703.4 |
31 | Seeded | 489.1 |
32 | Seeded | 430.0 |
33 | Seeded | 334.1 |
34 | Seeded | 302.8 |
35 | Seeded | 274.7 |
36 | Seeded | 274.7 |
37 | Seeded | 255.0 |
38 | Seeded | 242.5 |
39 | Seeded | 200.7 |
40 | Seeded | 198.6 |
41 | Seeded | 129.6 |
42 | Seeded | 119.0 |
43 | Seeded | 118.3 |
44 | Seeded | 115.3 |
45 | Seeded | 92.4 |
46 | Seeded | 40.6 |
47 | Seeded | 32.7 |
48 | Seeded | 31.4 |
49 | Seeded | 17.5 |
50 | Seeded | 7.7 |
51 | Seeded | 4.1 |
Ooh, it worked!
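If the file is not in the same folder as the notebook, pandas raises a FileNotFoundError instead. The sketch below is one way to handle that and to see which folder Python is looking in. The helper name load_local_csv is made up for this illustration and is not part of pandas.

```python
from pathlib import Path
import pandas

def load_local_csv(filename):
    """Hypothetical helper: read a csv file, or explain where Python looked for it."""
    try:
        return pandas.read_csv(filename)
    except FileNotFoundError:
        # Path.cwd() is the folder that relative file names are looked up in
        print(f"Could not find {filename} in {Path.cwd()}")
        print("Move the file into that folder, or check the spelling of the name")
        return None

clouds = load_local_csv('CloudSeeding.csv')
```

If the message is printed, compare the folder it names with the place you actually saved CloudSeeding.csv.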
Subdirectories#
Say you have all your Jupyter notebooks (including this one) in a nice tidy folder (or directory) called StatsClassWeek3, and you don’t want lots of messy CSV files lying around in there.
No problem: in your file browser, go to the folder StatsClassWeek3 and create a new folder (or directory) called data. Place the CSV file CloudSeeding.csv in the folder data.
Now we run the following code:
clouds = pandas.read_csv('data/CloudSeeding.csv')
clouds
 | status | rainfall |
---|---|---|
0 | Unseeded | 1202.6 |
1 | Unseeded | 830.1 |
2 | Unseeded | 372.4 |
3 | Unseeded | 345.5 |
4 | Unseeded | 321.2 |
5 | Unseeded | 244.3 |
6 | Unseeded | 163.0 |
7 | Unseeded | 147.8 |
8 | Unseeded | 95.0 |
9 | Unseeded | 87.0 |
10 | Unseeded | 81.2 |
11 | Unseeded | 68.5 |
12 | Unseeded | 47.3 |
13 | Unseeded | 41.1 |
14 | Unseeded | 36.6 |
15 | Unseeded | 29.0 |
16 | Unseeded | 28.6 |
17 | Unseeded | 26.3 |
18 | Unseeded | 26.1 |
19 | Unseeded | 24.4 |
20 | Unseeded | 21.7 |
21 | Unseeded | 17.3 |
22 | Unseeded | 11.5 |
23 | Unseeded | 4.9 |
24 | Unseeded | 4.9 |
25 | Unseeded | 1.0 |
26 | Seeded | 2745.6 |
27 | Seeded | 1697.8 |
28 | Seeded | 1656.0 |
29 | Seeded | 978.0 |
30 | Seeded | 703.4 |
31 | Seeded | 489.1 |
32 | Seeded | 430.0 |
33 | Seeded | 334.1 |
34 | Seeded | 302.8 |
35 | Seeded | 274.7 |
36 | Seeded | 274.7 |
37 | Seeded | 255.0 |
38 | Seeded | 242.5 |
39 | Seeded | 200.7 |
40 | Seeded | 198.6 |
41 | Seeded | 129.6 |
42 | Seeded | 119.0 |
43 | Seeded | 118.3 |
44 | Seeded | 115.3 |
45 | Seeded | 92.4 |
46 | Seeded | 40.6 |
47 | Seeded | 32.7 |
48 | Seeded | 31.4 |
49 | Seeded | 17.5 |
50 | Seeded | 7.7 |
51 | Seeded | 4.1 |
The slash in the command pandas.read_csv('data/CloudSeeding.csv') just means that data is the name of a folder and CloudSeeding.csv is inside that folder.
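The same path can also be built with Python's standard pathlib module, which makes the folder part and the file part explicit. This is a sketch, not something the course requires; it assumes the data folder sits next to the notebook as described above.

```python
from pathlib import Path
import pandas

# Join the folder name and the file name into one relative path
data_path = Path('data') / 'CloudSeeding.csv'

print(data_path.parent)  # the folder part of the path
print(data_path.name)    # the file part of the path

# Only try to read the file if it is really there
if data_path.is_file():
    clouds = pandas.read_csv(data_path)
```

pandas.read_csv() accepts a Path object in the same way as the plain string 'data/CloudSeeding.csv'.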
Colab#
If you are on Colab you will need to upload the data file before you can read it in.
To do this you click the file icon at the left of your notebook
… a file panel opens.
Click the upload button at the top of this panel
… a file browser opens. Select the CSV file from where you downloaded it on your own computer (if you are not sure where it went, it might have gone in your Downloads folder!).
The file appears in the file panel and can now be loaded from your Colab notebook as described above.
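To confirm the upload worked, one option is to list the CSV files Python can see in the current working folder. This is a minimal sketch using the standard pathlib module; the function name list_csv_files is made up for this example, and it assumes the file was uploaded to the folder the notebook is running in.

```python
from pathlib import Path

def list_csv_files(folder='.'):
    """Hypothetical helper: return the names of the csv files in a folder, sorted."""
    return sorted(p.name for p in Path(folder).glob('*.csv'))

print(list_csv_files())
```

If CloudSeeding.csv appears in the printed list, pandas.read_csv('CloudSeeding.csv') should be able to find it.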