1.2. Learning Objectives

1.2.1. Conceptual

This week we are thinking about how to describe data, covering measures of centre (mean, median, mode), measures of spread (variance, standard deviation, interquartile range, percentiles), and descriptions of distributions (shape and skew).

After this week you should understand:

  • Conceptual difference between the mean, median and mode, and when each is used

  • Conceptual difference between the standard deviation and interquartile range and when each is used

  • Why measures based on ranks (median and interquartile range) are robust to outliers

  • Why the mean is useful in predicting the behaviour of large samples

  • How to describe the shape and skew of a distribution in words (based on viewing a data plot)

  • How to make predictions about the shape of a distribution from summary statistics (for example, what is the skew for a distribution where the median is higher than the mean?)

  • Common factors affecting the shape of distributions (for example, what happens when a measure can only take values above zero)

  • What correlation is, and what correlated data look like on a scatter plot

  • The assumptions of Pearson’s correlation coefficient, and when to use each of Spearman’s and Pearson’s correlation coefficients
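To make the points about robustness and skew concrete, here is a minimal sketch using a small made-up sample (the values and the helper name `summarise` are assumptions for illustration, not course data). It compares the mean, median, standard deviation and interquartile range before and after a single outlier is added.

```python
import pandas as pd

# Hypothetical sample, invented for illustration only
sample = pd.Series([2, 3, 3, 4, 5, 5, 6])

# The same sample with one extreme value added
with_outlier = pd.concat([sample, pd.Series([60])], ignore_index=True)


def summarise(s):
    """Return measures of centre and spread for a pandas Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return {
        "mean": s.mean(),
        "median": s.median(),
        "sd": s.std(),   # sample standard deviation (n - 1 denominator)
        "iqr": q3 - q1,  # interquartile range
    }


before = summarise(sample)
after = summarise(with_outlier)

print("without outlier:", before)
print("with outlier:   ", after)
```

In this sketch the outlier moves the mean from 4 to 11, while the median only moves from 4 to 4.5. The standard deviation grows many times over, whereas the interquartile range barely changes. Because the outlier pulls the mean above the median, this is also an example of positive (right) skew.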

This material is covered in the lecture (and in the lecture videos on Canvas).

1.2.2. Python skills

We are working with pandas dataframes and some of their associated methods.

After this week you should be able to:

  • Read data from a .csv file into a pandas dataframe using pandas.read_csv()

  • View a dataframe using display(df), including viewing only certain rows (selected by row index or condition)

  • Obtain a set of descriptive statistics using df.describe(), including obtaining statistics for a subset of rows or columns

  • Obtain specific descriptive statistics using methods such as df.mean(), df.count(), df.quantile(), and df.corr(), including for a subset of rows or columns
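The skills listed above can be sketched in a few lines. This is an illustrative example only: the data, the column names (`group`, `height`, `weight`) and the use of an in-memory text buffer in place of a real .csv file are all assumptions made so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical data, invented for illustration. In practice you would pass
# a file path or URL to pandas.read_csv() rather than a StringIO buffer.
csv_text = """name,group,height,weight
A,x,150,50
B,x,160,58
C,y,170,69
D,y,180,77
E,y,190,95
"""
df = pd.read_csv(io.StringIO(csv_text))

# View only certain rows: by row position, then by condition
first_rows = df.iloc[0:2]
group_y = df[df["group"] == "y"]
print(first_rows)
print(group_y)

# Descriptive statistics for a subset of columns
summary = df[["height", "weight"]].describe()
print(summary)

# Specific statistics, here for the rows in group y only
mean_height_y = group_y["height"].mean()
n_y = group_y["height"].count()
q90_weight = df["weight"].quantile(0.9)

# Correlation between numeric columns: Pearson is the default,
# Spearman is based on ranks
r_pearson = df[["height", "weight"]].corr()
r_spearman = df[["height", "weight"]].corr(method="spearman")
print(r_pearson)
print(r_spearman)
```

Here weight always increases with height but not in a perfectly straight line, so Spearman's coefficient is exactly 1 while Pearson's is slightly below 1. Selecting the numeric columns before calling `corr()` avoids errors from text columns such as `name`.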

This material is covered in the Jupyter Notebooks in this section.

There is also a DataCamp module on pandas; you may wish to revisit it for further Python practice.