3.3. Model fit: \(R^2\)


https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook/main/images/regression3_church.jpg

We’ll begin with the by-now familiar correlation coefficient, in the following example. Who marries whom is a subject of intense interest to groups ranging from tabloid readers to behaviour geneticists. An empirical study by Buss (1984) of 93 married couples found correlations between spouses across a selection of settings (e.g., attending church, going to the beach, or attending discos, nightclubs, ski resorts, dances, and baseball games). The correlation, for example, between whether spouse A attends church and whether spouse B attends church was found to be 0.81.

  • Would you conclude, from the correlation coefficient, that the sample association is strong or weak?

  • Find the square of the correlation. How do we interpret this?

The \(R^2\) summarises how well \(x\) can predict \(y\): it measures the proportional reduction in prediction error when \(x\) is used to predict \(y\), compared to when \(\bar{y}\) is used to predict \(y\). An \(R^2\) of \(0.81^2 = 0.656\) means that for predicting

  • \(y\) = whether spouse B attends church,

…the linear prediction equation which uses

  • \(x\) = whether spouse A attends church

…as an explanatory variable has 65.6% less error than if \(\bar{y}\) is used to predict \(y\) (whether spouse B attends church).

  • \(R^2\) is also known as the ‘proportional reduction of error’. Why is this?

  • In what ways is \(R^2\) like \(r\), and in what ways is it different?
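The proportional-reduction-of-error interpretation can be checked numerically. Below is a minimal sketch using simulated data (not the actual Buss (1984) data, which we do not have here): we generate 93 pairs with a population correlation of about 0.81, fit the least-squares line, and compare the prediction error of the line with the error of predicting every \(y\) by \(\bar{y}\). The variable names `sse` and `tss` are our own labels for those two error sums.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the 93 couples: x and y correlate at about 0.81
n = 93
x = rng.normal(size=n)
y = 0.81 * x + rng.normal(scale=np.sqrt(1 - 0.81**2), size=n)

# Fit the least-squares line y_hat = a + b * x
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

# Error when the line predicts y, vs error when y-bar predicts y
sse = np.sum((y - y_hat) ** 2)          # sum of squared residuals from the line
tss = np.sum((y - y.mean()) ** 2)       # sum of squared deviations from y-bar

r = np.corrcoef(x, y)[0, 1]
r_squared = 1 - sse / tss               # proportional reduction in error

print(f"r = {r:.3f}, 1 - SSE/TSS = {r_squared:.3f}, r^2 = {r**2:.3f}")
```

For a least-squares line, \(1 - \mathrm{SSE}/\mathrm{TSS}\) equals \(r^2\) exactly, which is why squaring the correlation gives the proportional reduction in error.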