Conditional Distributions

3.4. Conditional Distributions

A conditional distribution describes the spread of values around the regression line.

In this example, y = income and x = years of education, and the regression equation gives the expected value of income as -5000 + 3000 × years of education.

For people with 12 years of education (𝑥 = 12), the expected income is -5000 + 3000 × 12 = 31,000.
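As a quick check of the arithmetic, the regression equation can be written as a small function. This is only a sketch; the function name `expected_income` is made up for illustration.

```python
def expected_income(years_of_education):
    """Expected income from the example regression equation."""
    intercept = -5000   # expected income at 0 years of education
    slope = 3000        # expected change in income per extra year of education
    return intercept + slope * years_of_education

print(expected_income(12))  # 31000
```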

However, not everyone with 12 years of education earns exactly this amount. We can think of the predicted value as a predicted mean, with a normal distribution of values around that mean. The word ‘conditional’ here refers to the distribution of y, conditional on a given value of x.
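To make the idea concrete, the sketch below simulates incomes for people who all have 12 years of education. The standard deviation of 8,000 is a hypothetical value chosen for illustration; it is not taken from any data in this course.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

predicted_mean = -5000 + 3000 * 12   # 31,000, from the regression equation
assumed_sd = 8000                    # hypothetical spread, not from the course data

# Simulated incomes for people who all have x = 12 years of education
incomes = [rng.gauss(predicted_mean, assumed_sd) for _ in range(10_000)]

print(round(statistics.mean(incomes)))   # close to the predicted mean
print(round(statistics.stdev(incomes)))  # close to the assumed spread
```

The simulated values scatter around the predicted mean, which is what the conditional distribution of y at x = 12 describes.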

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/images/regression3_conditional.png

We can use a statistic called the “root mean square error” (or RMSE) to estimate the spread around the regression line.

The RMSE provides the estimated standard deviation of the conditional distribution of y at each value of x. It is also known as the standard deviation of the residuals.

The equation for the RMSE is:

$$ \text{RMSE} = s_y \sqrt{1 - r^2} $$

Again, you’ll notice that it is composed of familiar components, namely the standard deviation of y ($s_y$) and the squared correlation ($r^2$).
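The sketch below checks, on simulated data, that the standard deviation of y multiplied by the square root of one minus the squared correlation matches the standard deviation of the residuals. The data are made up (income scattered around the example line with an assumed spread of 8,000), and both standard deviations use an n - 1 denominator. Some software divides the squared residuals by n - 2 instead, which gives a slightly larger value.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: income scattered around the line -5000 + 3000 * x
x = [rng.uniform(8, 20) for _ in range(500)]
y = [-5000 + 3000 * xi + rng.gauss(0, 8000) for xi in x]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # n - 1 denominators

# Pearson correlation computed from the sample covariance
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
r = cov_xy / (s_x * s_y)

# Least-squares line fitted to the sample, and its residuals
slope = r * s_y / s_x
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

rmse_from_formula = s_y * math.sqrt(1 - r**2)
rmse_from_residuals = statistics.stdev(residuals)

# The two values agree up to floating-point rounding
print(round(rmse_from_formula, 2), round(rmse_from_residuals, 2))
```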

  • Coming back to the immigration data from last week, where $s_y$ = 2.533 and $r$ = -0.1572, plug these values into the equation and find the RMSE.

  • How do we interpret the RMSE?