3.5. The t distribution#

The \(t\) distribution describes the shape of the best-estimate sampling distribution of the mean when data are drawn from a normal distribution and the parameters of the best-fitting normal distribution (i.e., its mean and standard deviation) have been estimated from a small sample.

3.5.1. Sampling distribution of the mean#

  • The sampling distribution of the mean is the distribution you would get if you took many different samples of size \(n\) from the population and calculated the mean of each one

  • The null distribution is the sampling distribution of the mean (or another test statistic) that we would expect to get if the null hypothesis were true
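The idea of "many different samples of size \(n\)" can be tried out by simulation. The sketch below is only an illustration: the population (normal, mean 100, standard deviation 15), the sample size and the number of repeats are all made-up values, not taken from the course data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible

# Hypothetical population: normal with mean 100 and standard deviation 15
pop_mean, pop_sd = 100.0, 15.0
n = 10               # size of each sample
n_samples = 10_000   # how many samples we draw

# Each row is one sample of size n; take the mean of each row
samples = rng.normal(loc=pop_mean, scale=pop_sd, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The collection of sample means approximates the sampling distribution of the mean
print(f"mean of the sample means: {sample_means.mean():.2f}")
print(f"sd of the sample means:   {sample_means.std(ddof=1):.2f}")
```

A histogram of `sample_means` would show the simulated sampling distribution; it is centred on the population mean and is narrower than the population itself.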

3.5.2. Scaling by sample size#

Recall from the previous lecture that the spread of the sampling distribution of the mean depends on the sample size. More precisely:

  • The standard deviation of the sampling distribution of the mean is the standard error, \(s/\sqrt{n}\) where \(s\) is the sample standard deviation and \(n\) is the sample size
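To see the scaling with \(\sqrt{n}\), we can compare the simulated spread of the sample means with the predicted standard error for a few sample sizes. This is a rough sketch under stated assumptions: the population standard deviation (15) is a made-up value, and because we are simulating we can use it directly in place of the sample estimate \(s\). The helper `sd_of_sample_means` is a hypothetical name introduced just for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
pop_sd = 15.0  # hypothetical population standard deviation


def sd_of_sample_means(n, n_samples=20_000):
    """Simulate many samples of size n and return the sd of their means."""
    samples = rng.normal(loc=0.0, scale=pop_sd, size=(n_samples, n))
    return samples.mean(axis=1).std(ddof=1)


for n in (4, 16, 64):
    simulated = sd_of_sample_means(n)
    predicted = pop_sd / np.sqrt(n)  # standard error formula
    print(f"n={n:3d}  simulated={simulated:.2f}  predicted={predicted:.2f}")
```

Each time the sample size is multiplied by four, the spread of the sample means should roughly halve.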

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp6_tvsn.png

3.5.3. Pointy top, heavy tails#

The \(t\) distribution looks a lot like the normal distribution, but it has a pointy top and heavy tails.

The \(t\) distribution actually changes shape depending on the sample size used to fit the best-fitting normal:

  • when the sample is tiny (\(n=2\)) we get an extreme pointy top and heavy tails

  • as the sample size gets large (about 30) the \(t\) distribution is almost identical to the normal distribution

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp6_tDist.png

This distinction may appear subtle but remember that when a value is statistically significant, it lies in the tails of the null distribution. So if the \(t\)-distribution is the sampling distribution of the mean under the null hypothesis, we will be looking closely at its tails to determine the probability that our result (difference of means) could have arisen due to chance.
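The difference in the tails can be put into numbers with `scipy.stats`. The sketch below compares the probability of a standardized value greater than 2 under the normal distribution and under \(t\) distributions for several sample sizes. The cut-off of 2 is an arbitrary illustrative choice, and the code assumes the usual convention for a one-sample \(t\) that the degrees of freedom are \(n-1\).

```python
from scipy import stats

cutoff = 2.0  # illustrative cut-off, in standardized units

# Tail probability under the standard normal
normal_tail = stats.norm.sf(cutoff)
print(f"normal      P(>2) = {normal_tail:.4f}")

# Tail probability under t for several sample sizes (df = n - 1)
for n in (2, 5, 10, 30):
    df = n - 1
    t_tail = stats.t.sf(cutoff, df)
    print(f"t (n={n:2d})    P(>2) = {t_tail:.4f}")
```

The tail probability is largest for the smallest sample and shrinks towards the normal value as \(n\) grows.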

Why heavy tails?#

You can skip this bit if you don’t fancy it; it is sufficient to understand that the \(t\)-distribution has heavy tails because of ‘something to do with estimating the standard deviation’!

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp6_whytails.png
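If you would like to see the effect rather than take it on trust, here is a small simulation sketch. It draws many small samples from a normal population with a true mean of zero, then standardizes each sample mean twice: once using the true standard deviation (which we only know because we are simulating) and once using the standard deviation estimated from that sample. The sample size of 5 and the cut-off of 3 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, n_samples = 5, 100_000
sigma = 1.0  # true population sd (known only because we are simulating)

samples = rng.normal(loc=0.0, scale=sigma, size=(n_samples, n))
means = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # estimated sd, different for every sample

z_like = means / (sigma / np.sqrt(n))  # standardized with the true sd
t_like = means / (s / np.sqrt(n))      # standardized with the estimated sd

# Fraction of standardized means that land far out in the tails
frac_z = np.mean(np.abs(z_like) > 3)
frac_t = np.mean(np.abs(t_like) > 3)
print(f"beyond 3 using the true sd:      {frac_z:.4f}")
print(f"beyond 3 using the estimated sd: {frac_t:.4f}")
```

When a small sample happens to underestimate the standard deviation, dividing by that small value inflates the standardized mean, which is why extreme values turn up more often in the second case.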

3.5.4. Analogy: \(t\) and \(Z\)#

We saw that if data are normally distributed and we standardize them by converting them to Z-scores, we create a distribution (of Z) with a fixed mean (zero) and standard deviation (1). If the data mean is \(\bar{x}\) and the standard deviation is \(s\):

\[ Z = \frac{x-\bar{x}}{s} \]

From this standardized distribution, we can directly read off the probability of a given data value (e.g. the probability of a Z-score greater than 1 is about 16%).
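As a quick check, the sketch below converts a handful of made-up data values to Z-scores by subtracting the data mean and dividing by the data standard deviation, then reads off a tail probability from the standard normal using `scipy.stats.norm.sf`. The numbers in `x` are invented purely for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 6.0, 4.8, 5.9, 5.1])  # made-up data values

# Z-scores: subtract the data mean, divide by the data standard deviation
z = (x - x.mean()) / x.std(ddof=1)
print(f"mean of Z: {z.mean():.3f}, sd of Z: {z.std(ddof=1):.3f}")

# Probability of a Z-score greater than 1 under the standard normal
p_above_1 = stats.norm.sf(1.0)
print(f"P(Z > 1) = {p_above_1:.4f}")
```

Whatever the original units of the data, the Z-scores have mean zero and standard deviation one, so the same standard normal curve can be used to look up probabilities.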

Similarly for sample means, if we standardize by converting to \(t\)-scores:

\[ t = \frac{\bar{x}}{\frac{s}{\sqrt{n}}} \]

This standardized distribution tells us the probability of a sample mean as large as our observed one \(\bar{x}\) (or larger) if the population mean were really zero.

  • This is the probability of the sample mean arising due to chance under the null hypothesis, if the null hypothesis is that the population mean is zero.
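The calculation can be sketched in a few lines. The example below computes the \(t\)-score by hand for a small made-up sample, looks up the probability of a value at least this large in the \(t\) distribution, and cross-checks the result against `scipy.stats.ttest_1samp`. The data are invented for illustration, and the code assumes the usual convention that the degrees of freedom are \(n-1\).

```python
import numpy as np
from scipy import stats

# Made-up sample (for example, eight difference scores)
x = np.array([0.8, -0.3, 1.2, 0.5, 0.9, -0.1, 0.7, 0.4])
n = len(x)

# t-score: sample mean divided by the standard error s / sqrt(n)
s = x.std(ddof=1)
t_score = x.mean() / (s / np.sqrt(n))

# Probability of a t-score this large or larger if the population mean is zero
p_one_sided = stats.t.sf(t_score, df=n - 1)
print(f"t = {t_score:.3f}, one-sided p = {p_one_sided:.4f}")

# Cross-check: ttest_1samp reports the same t and a two-sided p-value
result = stats.ttest_1samp(x, popmean=0.0)
print(f"scipy t = {result.statistic:.3f}, two-sided p = {result.pvalue:.4f}")
```

Because the \(t\) distribution is symmetric and the \(t\)-score here is positive, the two-sided p-value reported by `ttest_1samp` is twice the one-sided probability computed by hand.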