4.2. Learning Objectives
Conceptual
This week we are looking at relationships between two variables measured in a bivariate (or multivariate) dataset.
After this week you should be able to:
- Recognise and understand the equations for covariance and correlation
- Explain conceptually the difference between covariance and correlation (correlation is covariance normalized by the standard deviations of the two variables)
- Explain what would happen to correlation and covariance if the units of x, y, or both are changed
- Recognise datasets with features that violate the assumptions for Pearson's correlation:
  - non-linear relationship
  - outliers
  - heteroscedasticity
- Explain why a rank-based correlation method (Spearman's correlation) resolves the issue in each case
- Visually identify interactions and crossover interactions in bar plots of categorical data
This material is covered in the lecture (also in the lecture videos on Canvas)
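As a reminder of the equations referred to in the objectives above, here is a sketch of the sample covariance and Pearson's correlation. The notation ($s_{xy}$, $s_x$, $s_y$, $r$, and the constants $a$, $b$) is an assumption and may differ from that used in the lecture; the $n-1$ denominator is the sample version.

```latex
% Sample covariance of x and y (n paired observations)
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

% Pearson's correlation: covariance normalized by the two standard deviations
r = \frac{s_{xy}}{s_x \, s_y}

% Change of units for x: x' = a x + b, with a \neq 0
s_{x'y} = a \, s_{xy}, \qquad s_{x'} = |a| \, s_x
\quad\Longrightarrow\quad
r' = \frac{a \, s_{xy}}{|a| \, s_x \, s_y} = \operatorname{sign}(a) \, r
```

In words: rescaling a variable (for example, metres to centimetres, $a = 100$) multiplies the covariance by the same factor, while the correlation is unchanged for any positive $a$.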
Python skills
This week we are working with pandas dataframes and some of their associated methods.
After this week you should be able to:
- Calculate the covariance matrix for a dataframe and understand what each entry in the matrix means
- Calculate the correlation matrix for a dataframe and identify the correlation value for a given pair of variables
- Obtain a correlation matrix for just the relevant columns, by creating a reduced-size dataframe
- Change the correlation method between Pearson's and Spearman's correlation
This material is covered in the Jupyter Notebooks in this section
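The skills listed above can be sketched in a few lines. This is a minimal illustration rather than the notebooks' own code: the dataframe `df` and its column names (`x`, `x_cubed`, `z`) are made up for the example, and it assumes `pandas` is installed.

```python
import pandas as pd

# Hypothetical toy data (not from the course notebooks).
# x_cubed is a monotonic but non-linear function of x.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "x_cubed": [1, 8, 27, 64, 125, 216],
    "z": [2, 1, 4, 3, 6, 5],
})

# Covariance matrix: diagonal entries are each column's variance,
# off-diagonal entries are the covariances between pairs of columns.
cov_matrix = df.cov()

# Correlation matrix: Pearson's correlation is the default method.
corr_matrix = df.corr()

# Look up the correlation for one pair of variables by row/column label.
r_pearson = corr_matrix.loc["x", "x_cubed"]

# Reduced-size dataframe: keep only the columns of interest,
# then compute the correlation matrix for those columns.
subset = df[["x", "x_cubed"]]
subset_corr = subset.corr()

# Switch from Pearson's to Spearman's (rank-based) correlation.
spearman_corr = subset.corr(method="spearman")
r_spearman = spearman_corr.loc["x", "x_cubed"]

print(cov_matrix)
print(corr_matrix)
print("Pearson:", r_pearson, "Spearman:", r_spearman)
```

Because `x_cubed` always increases with `x`, the ranks of the two columns agree exactly, so Spearman's correlation for this pair is 1 while Pearson's correlation is lower, reflecting the non-linear shape of the relationship.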