3.4. Conditional Distributions
Conditional distributions refer to the spread of values around the regression line.
In this example, where
For people with 12 years of education (
However, we can think of the predicted value as a predicted mean, with a normal distribution of values around that mean. The word ‘conditional’ here refers to the distribution of

We can use a statistic called the “root mean square error” (or RMSE) to estimate the spread around the regression line.
The RMSE provides the estimated standard deviation of the conditional distribution of
The equation for the RMSE is:
Again, you’ll notice that it is comprised of familiar components, namely, the standard deviation of
Coming back to the immigration data from last week, where
= 2.533, and = -0.1572, plug these values into the equation and find the RMSE.
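As a rough check on the arithmetic, here is a minimal Python sketch. It rests on two assumptions that are not confirmed by the text above: that the RMSE equation takes the common form "standard deviation of the outcome multiplied by the square root of one minus the squared correlation", and that 2.533 is the standard deviation of the outcome and -0.1572 is the correlation coefficient. The function and variable names are illustrative only.

```python
import math


def rmse_from_sd_and_r(sd_y: float, r: float) -> float:
    """Assumed formula: RMSE = SD of the outcome * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1.0 - r ** 2)


# Assumption: 2.533 is the standard deviation of the outcome variable.
sd_y = 2.533
# Assumption: -0.1572 is the correlation between predictor and outcome.
r = -0.1572

rmse = rmse_from_sd_and_r(sd_y, r)
print(round(rmse, 2))  # roughly 2.5
```

Because the correlation is weak, the RMSE under these assumptions is only slightly smaller than the standard deviation itself: knowing the predictor narrows the spread of the outcome very little.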
How do we interpret the RMSE?
We expect 95% of the values of
Note that if the sample size were smaller, we would not use 1.96, which is the critical value for α = 0.05 and for large sample sizes (over about 50). Instead we would use the critical value of t for α = 0.05 and our actual sample size. There is a section on confidence intervals in the final lecture from Michaelmas Term.
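The 95% range can be sketched in Python as the predicted value plus or minus the critical value times the RMSE. This is an illustration under stated assumptions, not the course's own code: the function name is made up, the predicted value and RMSE in the usage lines are hypothetical numbers, and the large-sample normal critical value is used throughout.

```python
from statistics import NormalDist


def interval_around_prediction(predicted: float, rmse: float, coverage: float = 0.95):
    """Range expected to contain `coverage` of the values around a predicted mean.

    Uses the large-sample normal critical value (about 1.96 for 95%).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return predicted - z * rmse, predicted + z * rmse


# Hypothetical inputs, chosen only to show the arithmetic.
low, high = interval_around_prediction(predicted=10.0, rmse=2.0)
print(round(low, 2), round(high, 2))
```

For a small sample, the normal critical value would be swapped for a t critical value, for example `scipy.stats.t.ppf(0.975, df)` if SciPy is available, with the degrees of freedom taken from the fitted model.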
Let’s look at an example when
5.380
When