2.2. Learning Objectives

Conceptual

This week we are thinking about how to describe data: measures of centre (mean, median, mode), measures of spread (variance, standard deviation, interquartile range, percentiles), and descriptions of distributions (shape and skew).

After this week you should understand:

  • Conceptual difference between the mean, median and mode, and when each is used
  • Conceptual difference between the standard deviation and interquartile range and when each is used
  • Why measures based on ranks (median and interquartile range) are robust to outliers
  • Why the mean is useful in predicting the behaviour of large samples
  • How to describe the shape and skew of a distribution in words (based on viewing a data plot)
  • How to make predictions about the shape of a distribution from summary statistics (for example, what is the skew for a distribution where the median is higher than the mean?)
  • Common factors affecting the shape of distributions (for example, what happens when a measure can only take values above zero)

This material is covered in the lecture (and in the lecture videos on Canvas).
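
As a rough illustration of why rank-based measures are robust to outliers, the sketch below compares the mean, median, standard deviation and interquartile range of a small sample before and after one extreme value is added. The data values and the helper name `summarise` are made up for this example and do not come from the course materials.

```python
import pandas as pd

# Made-up sample values, for illustration only
clean = pd.Series([2, 3, 3, 4, 5, 6, 7])
# The same sample with one extreme value (an outlier) appended
with_outlier = pd.concat([clean, pd.Series([100])], ignore_index=True)


def summarise(values):
    """Return measures of centre and spread for a pandas Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return {
        "mean": values.mean(),
        "median": values.median(),
        "sd": values.std(),   # sample standard deviation
        "iqr": q3 - q1,       # interquartile range
    }


before = summarise(clean)
after = summarise(with_outlier)

# Print each statistic without and with the outlier
for name in before:
    print(f"{name:>6}: {before[name]:7.2f} -> {after[name]:7.2f}")
```

With these values, the single outlier moves the median only from 4 to 4.5, while the mean jumps from about 4.29 to 16.25 and the standard deviation grows many times over. The mean ending up well above the median is also what we would typically expect for a distribution with a long right-hand tail (positive skew).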

Python skills

We are working with pandas dataframes and some of their associated methods.

After this week you should be able to:

  • Read data from a .csv file into a pandas dataframe using pandas.read_csv
  • View a dataframe using display() including viewing only certain rows (selected by row index or condition)
  • Obtain a set of descriptive statistics using describe(), including for a subset of rows or columns
  • Obtain specific descriptive statistics using methods such as mean(), count(), quantile(), including for a subset of rows or columns
  • Remove rows from a dataframe (e.g. those corresponding to bad data records)
  • Replace values in a dataframe with new values or with NaN

This material is covered in the Jupyter Notebooks in this section.
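
The skills listed above might look something like the following minimal sketch. The dataset, its column names (`name`, `age`, `height_cm`) and the use of -999 as a code for a bad record are all invented for illustration. So that the example runs on its own, the csv text is read from a string, whereas in practice you would pass a file path to `pandas.read_csv`. In a Jupyter Notebook you would usually view the results with `display()` rather than `print()`.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a .csv file on disk (invented data, for illustration only)
csv_text = """name,age,height_cm
Ann,23,165
Bob,31,180
Cat,27,-999
Dan,45,172
Eve,38,158
"""

# Read the csv data into a dataframe
df = pd.read_csv(io.StringIO(csv_text))

# Select rows by position, or only the rows that meet a condition
first_two = df.iloc[0:2]
over_30 = df[df["age"] > 30]

# Replace the placeholder for a bad record (-999 in this made-up data) with NaN
df["height_cm"] = df["height_cm"].replace(-999, np.nan)

# Descriptive statistics for the whole dataframe (numeric columns only)
summary = df.describe()

# Specific statistics for one column or a subset of rows
mean_height = df["height_cm"].mean()     # NaN values are skipped by default
n_heights = df["height_cm"].count()      # counts non-missing values only
age_q75 = df["age"].quantile(0.75)       # 75th percentile of age
mean_age_over_30 = df.loc[df["age"] > 30, "age"].mean()

# Remove rows where the height is missing
df_clean = df.dropna(subset=["height_cm"])

print(summary)
print(mean_height, n_heights, age_q75, mean_age_over_30)
```

Replacing the bad value with NaN before computing statistics matters here: if -999 were left in place it would be treated as a real height and would drag the mean far below any plausible value.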