4.2. Learning Objectives

Conceptual

This week we are looking at relationships between two variables measured in a bivariate (or multivariate) dataset.

After this week you should be able to:

  • Recognise and understand the equations for covariance and correlation
  • Explain conceptually the difference between covariance and correlation (whereby correlation is a normalized form of covariance)
  • Explain what happens to the covariance and the correlation when the units of x, y, or both are changed
  • Recognise datasets with features that violate the assumptions for Pearson's correlation:
    1. non-linear relationship
    2. outliers
    3. heteroscedasticity
  • Explain why a rank-based correlation method (Spearman's correlation) resolves the issue in each case
  • Visually identify interactions and crossover interactions in bar plots of categorical data
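
As a rough illustration of two of the objectives above, the sketch below uses made-up data (the column names height_m, weight_kg and height_cm, and all the numbers, are assumptions for this example and not taken from the course datasets). It first rescales one variable from metres to centimetres to show that the covariance changes with the units while the correlation does not, then compares Pearson's and Spearman's correlation on a monotonic but non-linear relationship.

```python
import numpy as np
import pandas as pd

# Hypothetical data: heights in metres and weights in kilograms
rng = np.random.default_rng(0)
height_m = rng.normal(1.70, 0.10, size=200)
weight_kg = 60 + 50 * (height_m - 1.70) + rng.normal(0, 5, size=200)
df = pd.DataFrame({"height_m": height_m, "weight_kg": weight_kg})

# Change the units of one variable: metres -> centimetres
df["height_cm"] = df["height_m"] * 100

# Covariance depends on the units of the variables...
cov_m = df["height_m"].cov(df["weight_kg"])
cov_cm = df["height_cm"].cov(df["weight_kg"])

# ...whereas correlation is normalized, so the units cancel out
corr_m = df["height_m"].corr(df["weight_kg"])
corr_cm = df["height_cm"].corr(df["weight_kg"])

print("covariance (m), covariance (cm):", cov_m, cov_cm)
print("correlation (m), correlation (cm):", corr_m, corr_cm)

# A monotonic but non-linear relationship: y = exp(x)
curve = pd.DataFrame({"x": np.linspace(0, 5, 50)})
curve["y"] = np.exp(curve["x"])

# Pearson's r measures linear association, so it falls short of 1 here;
# Spearman's r works on ranks, and the ranks of x and y agree perfectly
pearson_r = curve.corr(method="pearson").loc["x", "y"]
spearman_r = curve.corr(method="spearman").loc["x", "y"]

print("Pearson:", pearson_r)
print("Spearman:", spearman_r)
```

Multiplying height by 100 multiplies the covariance by 100 but leaves the correlation unchanged. In the second part, replacing values by their ranks turns the curved relationship into a straight line, which is why the rank-based measure is unaffected by the non-linearity.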

This material is covered in the lecture (and in the lecture videos on Canvas).

Python skills

We are working with pandas dataframes and some of their associated methods.

After this week you should be able to:

  • Calculate the covariance matrix for a dataframe and understand what each entry in the matrix means
  • Calculate the correlation matrix for a dataframe and identify the correlation value for a given pair of variables
  • Obtain a correlation matrix for just the relevant columns, by creating a reduced size dataframe
  • Change the correlation method between Pearson's and Spearman's correlation
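
The skills listed above might look something like the following in practice. This is a minimal sketch on a small invented dataframe (the column names and values are assumptions for illustration only); the notebooks in this section work through the same steps on real data.

```python
import pandas as pd

# Hypothetical dataframe with three numeric columns
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 172, 180, 191],
    "weight_kg": [52, 58, 63, 70, 77, 90],
    "age": [34, 21, 45, 29, 52, 38],
})

# Covariance matrix: diagonal entries are the variances of each column,
# off-diagonal entries are the covariances between pairs of columns
cov_matrix = df.cov()
print(cov_matrix)

# Correlation matrix (Pearson's correlation is the default method)
corr_matrix = df.corr()
print(corr_matrix)

# Look up the correlation for one pair of variables by row and column label
r_height_weight = corr_matrix.loc["height_cm", "weight_kg"]
print(r_height_weight)

# Reduced-size dataframe containing only the columns of interest
reduced = df[["height_cm", "weight_kg"]]
print(reduced.corr())

# Switch from Pearson's to Spearman's (rank-based) correlation
spearman_matrix = reduced.corr(method="spearman")
print(spearman_matrix)
```

Both matrices are symmetric, so the entry for (height_cm, weight_kg) equals the entry for (weight_kg, height_cm), and the diagonal of the correlation matrix is always 1 because each variable is perfectly correlated with itself.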

This material is covered in the Jupyter Notebooks in this section.