Sampling

3. Sampling#

This week we will be thinking about random variability across samples

Often, we have a relatively small sample of data, and want to infer from it something about the properties of a parent or population distribution. For example: I want to know the height of the average man in Oxford.

  • I cannot measure the heights of all men in Oxford (the population) as this is impractical

  • Instead I go to Carfax and measure the heights of 10 random (adult male) passers by, my sample

    • I calculate the mean - this gives me an estimate of the mean height of all men in Oxford However it is important to as how uncertain this estimate is - in other words, how much might my estimate change if I re-did the experiment and randomly selected 10 different men for my sample?

We will see that this uncertainty depends on two factors:

  • the size of the sample, \(n\)

  • the variability (in height) in the population

We start by looking at the unusual case in which the population or parent distribution is available to us. We will simulate the process of drawing many samples of size n from a parent distribution, and taking the mean of each sample. The distribution of these means, known as the sampling distribution of the mean, describes the expected random variability in the mean for different samples.

We see that due to the Central Limit Theorem, when \(n\) is large enough (above about 50) the sampling distribution of the mean is well approximated by a normal distribution, whose standard deviation is the standard error of the mean.

3.1. Tasks for this week#

Conceptual material is covered in the lecture. In addition to the live lecture, you can find lecture videos on Canvas.

Please work through the guided exercises in this section (everything except the page labelled “Tutorial Exercises”) in advance of the computer-based tutorial session.

To complete the guided exercises you will need to either:

  • open the pages in Google Colab (simply click the Colab button on each page), or

  • download them as Jupyter Notbooks to your own computer and work with them locally (eg in JupyterLab)

If you find something difficult or have questions, you can discuss with your tutor in the computer-based tutoral session.

This week is particularly heavy on conceptual material, so please do discuss the guided exercises and tutorial exercises with your tutor to make sure you understand