3.8. Tweaking plots

In this section we cover some points about adjusting the appearance of plots.

Matplotlib

Seaborn is designed to produce nice-looking plots without us having to set many options manually.

Seaborn is built on top of Matplotlib. If we want to set something like the axis labels or axis range by hand, many of the functions for doing this come from the Matplotlib library. Matplotlib contains a lot of lower-level plotting functions: things that produce or edit parts of a plot, rather than producing a whole nice figure in one step.

In the olden days, people made plots with Matplotlib alone and had to write a lot more code to achieve a nice-looking plot.

Set up Python libraries

As usual, run the code cell below to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas 
import seaborn as sns
sns.set_theme()

Import the data

We’ll use the Titanic data again.

titanic = pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/data/titanic_2.csv')
display(titanic)
     Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1    1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2    1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3    1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4    0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
...  ...       ...     ...                                                ...     ...   ...    ...    ...               ...      ...    ...
886  0         2       Montvila, Rev. Juozas                              male    27.0  0      0      211536            13.0000  NaN    S
887  1         1       Graham, Miss. Margaret Edith                       female  19.0  0      0      112053            30.0000  B42    S
888  0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN   1      2      W./C. 6607        23.4500  NaN    S
889  1         1       Behr, Mr. Karl Howell                              male    26.0  0      0      111369            30.0000  C148   C
890  0         3       Dooley, Mr. Patrick                                male    32.0  0      0      370376            7.7500   NaN    Q

891 rows × 11 columns

Subplot

Say for some reason we want to plot a histogram of age separately for men and women.
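To split the data by sex, the plotting code in this section uses titanic.loc[titanic.Sex=='male']. The sketch below shows roughly what that expression does, using a tiny made-up dataframe. The names toy, is_male, men and women are just for illustration:

```python
import pandas

# A tiny hypothetical dataframe with the same column names as the Titanic data
toy = pandas.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0, 54.0],
})

# Like titanic.Sex == 'male', this gives a True/False value for every row
is_male = toy.Sex == 'male'
print(is_male.tolist())    # → [True, False, False, True, True]

# .loc keeps only the rows where the value is True
men = toy.loc[toy.Sex == 'male']
women = toy.loc[toy.Sex == 'female']
print(men.Age.tolist())    # → [22.0, 35.0, 54.0]
print(women.Age.tolist())  # → [38.0, 26.0]
```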

We might like to show these plots next to each other as panels of a larger figure.

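This is achieved using the function plt.subplot, which creates… a figure with multiple panels, or subplots. Each call to plt.subplot selects one panel, and the plotting commands that follow draw into that panel.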

plt.subplot(1,2,1)
sns.histplot(data=titanic.loc[titanic.Sex=='male'], x='Age', color='b', bins=range(0,80,5))
plt.legend(['men'])

plt.subplot(1,2,2)
sns.histplot(data=titanic.loc[titanic.Sex=='female'], x='Age', color='r', bins=range(0,80,5))
plt.legend(['women'])

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
[Figure: histograms of Age for men (blue, left panel) and women (red, right panel)]

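The arguments of plt.subplot are the number of rows and the number of columns in the figure, followed by the position in which to place the next plot. Positions are numbered from 1, starting at the top left and counting along each row from left to right.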

In the example above we have one row and two columns. For the first plot we therefore call plt.subplot(1 [rows], 2 [columns], 1 [position for next plot]), and for the second we call plt.subplot(1, 2, 2).
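To see how positions are numbered in a bigger grid, here is a small sketch that fills a 2-by-2 grid and writes each panel's position into it. The loop and the text labels are just for illustration:

```python
import matplotlib.pyplot as plt

plt.figure()
# 2 rows and 2 columns: positions run 1, 2 along the top row, then 3, 4 along the bottom row
for position in range(1, 5):
    plt.subplot(2, 2, position)
    # write the call that selected this panel into its centre (axes coordinates run 0 to 1)
    plt.text(0.5, 0.5, f'plt.subplot(2,2,{position})',
             ha='center', va='center', transform=plt.gca().transAxes)
    plt.title(f'position {position}')

plt.subplots_adjust(hspace=0.5)  # add vertical space so the titles don't overlap the panels above
fig = plt.gcf()
```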

Can you change the code in the block below to organize the panels one above the other, rather than next to each other?

plt.subplot(1,2,1) # edit this line!
sns.histplot(data=titanic.loc[titanic.Sex=='male'], x='Age', color='b', bins=range(0,80,5))
plt.legend(['men'])

plt.subplot(1,2,2) # edit this line!
sns.histplot(data=titanic.loc[titanic.Sex=='female'], x='Age', color='r', bins=range(0,80,5))
plt.legend(['women'])

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
[Figure: output of the code as given, side-by-side histograms of Age for men (blue) and women (red)]

Ah, it’s actually a bit easier to compare the distributions when they are arranged vertically, because the ages line up one above the other! Good choice!

Can you edit the code block below to produce three stacked subplots showing the age distribution in each passenger class (Pclass)?

# Edit this code block!

plt.subplot(1,2,1) 
sns.histplot(data=titanic.loc[titanic.Sex=='male'], x='Age', color='b', bins=range(0,80,5))
plt.legend(['men'])

plt.subplot(1,2,2) 
sns.histplot(data=titanic.loc[titanic.Sex=='female'], x='Age', color='r', bins=range(0,80,5))
plt.legend(['women'])

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
[Figure: output of the code as given, side-by-side histograms of Age for men (blue) and women (red)]

Adjust axes

It is often easier to compare across plots if the axis ranges are the same.

Seaborn will automatically adjust the axes to fit the range of the data in each plot, which normally means the axis ranges don’t match across subplots.

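We can set the axis range using the functions plt.ylim (to set the limits in y) and plt.xlim (to set the limits in x). Each one applies to the current subplot, that is, the one most recently selected with plt.subplot.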
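Here is a minimal sketch of this "current subplot" behaviour, with a simple straight line as placeholder data. The limits set after plt.subplot(1, 2, 1) affect only the left panel. Calling plt.ylim() with no arguments returns the current panel's limits instead of changing them:

```python
import matplotlib.pyplot as plt

plt.figure()
left = plt.subplot(1, 2, 1)
plt.plot([0, 10], [0, 30])  # a simple line, just to give the panel some data
plt.xlim([0, 20])           # sets the x range of the current subplot (the left one)
plt.ylim([0, 80])           # sets the y range of the current subplot (the left one)

right = plt.subplot(1, 2, 2)
plt.plot([0, 10], [0, 30])  # same data, no limits set, so Matplotlib picks them automatically

# With no arguments, plt.ylim only reports the current subplot's limits
print(plt.ylim())           # the right panel's automatically chosen limits
print(left.get_ylim())      # the left panel still has the limits we set
```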

Let’s remake our two side-by-side subplots of the ages of men and women, and this time give both y axes the same range.

plt.subplot(1,2,1) 
sns.histplot(data=titanic.loc[titanic.Sex=='male'], x='Age', color='b', bins=range(0,80,5))
plt.ylim([0,80])
plt.legend(['men'])

plt.subplot(1,2,2) 
sns.histplot(data=titanic.loc[titanic.Sex=='female'], x='Age', color='r', bins=range(0,80,5))
plt.ylim([0,80])
plt.legend(['women'])

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
[Figure: histograms of Age for men (blue, left) and women (red, right), both with the y axis running from 0 to 80]

Ooh, suddenly we can see that there were a lot more men than women on the Titanic!

Set axis labels

Your axis labels should always convey what is plotted. If you are using Seaborn with a Pandas dataframe, the axis labels will often be the column labels from your dataframe. These are usually meaningful, although sometimes they are cryptic codes that wouldn’t mean much to a naive reader (the Titanic dataset is a culprit here: what is Pclass? SibSp? Parch?!).

You can always edit the labels on the axes, and unless their meaning is clear, you must do so.

Just for fun, let’s use plt.xlabel and plt.ylabel to label the x axis “bananas” and the y axis “fruitbats”.

plt.subplot(1,2,1) 
sns.histplot(data=titanic.loc[titanic.Sex=='male'], x='Age', color='b', bins=range(0,80,5))
plt.ylim([0,80])
plt.xlabel('bananas')
plt.ylabel('fruitbats')
plt.legend(['mad'])

plt.subplot(1,2,2) 
sns.histplot(data=titanic.loc[titanic.Sex=='female'], x='Age', color='r', bins=range(0,80,5))
plt.ylim([0,80])
plt.xlabel('bananas')
plt.ylabel('fruitbats')
plt.legend(['madder'])

plt.subplots_adjust(wspace = 0.5) # shift the plots sideways so they don't overlap
[Figure: the same histograms with the x axis labelled “bananas”, the y axis labelled “fruitbats”, and legends “mad” and “madder”]