2.2. Learning Objectives
Conceptual
This week we are thinking about how to describe data, covering measures of centre (mean, median, mode), measures of spread (variance, standard deviation, interquartile range, percentiles), and descriptions of distributions (shape and skew).
After this week you should understand:
- The conceptual difference between the mean, median and mode, and when each is used
- The conceptual difference between the standard deviation and interquartile range, and when each is used
- Why measures based on ranks (median and interquartile range) are robust to outliers
- Why the mean is useful in predicting the behaviour of large samples
- How to describe the shape and skew of a distribution in words (based on viewing a data plot)
- How to make predictions about the shape of a distribution from summary statistics (for example, what is the skew for a distribution where the median is higher than the mean?)
- Common factors affecting the shape of distributions (for example, what happens when a measure can only take values above zero)
This material is covered in the lecture (and in the lecture videos on Canvas).
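As a rough illustration of the conceptual points above, the sketch below compares how the mean and standard deviation respond to a single outlier versus how the median and interquartile range respond. It uses only Python's built-in statistics module. The numbers are invented for illustration and the helper iqr() is a hypothetical name, not something taken from the course materials.

```python
import statistics

# Hypothetical data (invented for illustration): reaction times in ms
values = [310, 325, 330, 340, 355, 360, 375]
with_outlier = values + [2000]  # one implausibly slow response


def iqr(data):
    """Interquartile range: distance between the 1st and 3rd quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


# The mean and standard deviation use every value directly,
# so one extreme value shifts them a lot
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
sd_ratio = statistics.stdev(with_outlier) / statistics.stdev(values)

# The median and IQR depend on rank order, so they barely move
median_shift = statistics.median(with_outlier) - statistics.median(values)
iqr_ratio = iqr(with_outlier) / iqr(values)

print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
print("SD grows by a factor of:", sd_ratio)
print("IQR grows by a factor of:", iqr_ratio)

# A second hypothetical sample with a long tail of low values:
# the tail pulls the mean below the median (negative, or left, skew)
left_skewed = [1, 7, 8, 9, 9, 10, 10]
print("mean:", statistics.mean(left_skewed))
print("median:", statistics.median(left_skewed))
```

Note that different tools compute quartiles in slightly different ways, so the exact IQR may differ a little from what pandas reports for the same data.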
Python skills
We are working with pandas dataframes and some of their associated methods.
After this week you should be able to:
- Read data from a .csv file into a pandas dataframe using pandas.read_csv
- View a dataframe using display() including viewing only certain rows (selected by row index or condition)
- Obtain a set of descriptive statistics using describe(), including for a subset of rows or columns
- Obtain specific descriptive statistics using methods such as mean(), count(), quantile(), including for a subset of rows or columns
- Remove rows from a dataframe (e.g. those corresponding to bad data records)
- Replace values in a dataframe with new values or with NaN
This material is covered in the Jupyter Notebooks in this section.
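For orientation, the skills listed above might look something like the following minimal sketch. The dataset, the column names, and the convention that 999 and -1 mark bad records are all made up for illustration; the notebooks use their own data files. The data is read from an in-memory string here so the example is self-contained, whereas in practice you would pass a file path to pandas.read_csv. In a Jupyter Notebook you would typically use display() where this sketch uses print().

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents (invented); normally this would be a file on disk
csv_text = """name,age,height_cm
Ana,23,165
Ben,31,180
Cal,999,172
Dee,27,-1
Eli,45,190
"""

# Read the data into a dataframe
df = pd.read_csv(io.StringIO(csv_text))

# View the whole dataframe, rows selected by index, and rows selected by condition
print(df)
print(df.loc[0:2])           # rows with index labels 0 to 2
print(df[df["age"] > 30])    # rows where age is above 30

# Descriptive statistics for all numeric columns, then for one column
print(df.describe())
print(df["height_cm"].describe())

# Specific statistics for a single column
print(df["age"].mean(), df["age"].count(), df["age"].quantile(0.25))

# Replace a placeholder value with NaN (assumes -1 marks a missing height)
df["height_cm"] = df["height_cm"].replace(-1, np.nan)

# Remove rows corresponding to bad records (assumes 999 is not a real age)
clean = df.drop(df[df["age"] == 999].index)

# NaN values are skipped by default in count() and mean()
print(clean["height_cm"].count(), clean["age"].mean())
```

Replacing a bad value with NaN keeps the rest of that row available for analysis, whereas dropping the row discards it entirely; which is appropriate depends on the data.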