2.6. Multivariate analysis and interaction terms

Let’s continue with the same example: smoking and health. This time, we will add age as an additional x variable. It will come as no surprise that there is a strong statistical relationship between age and health, as health tends to decline at older ages. The output from Python, with age added to the model, is:

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp11_InteractionTable1.png
  • What is the relationship between age and health?

  • Is this a large effect or a small one?

  • The N in this model (including age) is 4494, but the N in our first model (on the previous page) was 4527. Do you have any idea why this might be?

  • Why have the parameter estimates for smoking changed in this new model?
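As a rough sketch of how a model like this might be specified, the example below fits a smoking-only model and a smoking-plus-age model with the statsmodels formula interface, then prints the N and coefficients of each so they can be compared. The data are simulated, and the column names (`health`, `smoker`, `age`) and category labels are hypothetical stand-ins, not the variables behind the table above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only; not the survey used in this chapter.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "smoker": rng.choice(["never", "former", "current"], size=n),
    "age": rng.integers(18, 90, size=n),
})
# Made-up process: health falls with age and is lower for smokers.
penalty = df["smoker"].map({"never": 0.0, "former": -2.0, "current": -5.0})
df["health"] = 80 - 0.3 * df["age"] + penalty + rng.normal(0, 5, size=n)

# Model 1: smoking only. Model 2: smoking plus age.
m1 = smf.ols("health ~ C(smoker)", data=df).fit()
m2 = smf.ols("health ~ C(smoker) + age", data=df).fit()

print(m2.summary())                      # full regression table
print(m1.nobs, m2.nobs)                  # N used by each model
print(m1.params, m2.params, sep="\n\n")  # compare the smoking coefficients
```

With your own data, comparing `nobs` and `params` across the two fitted models is a quick way to start on the questions above.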

As our final model in the smoking example, let’s take a look at the interaction term between age and smoking. Remember that an interaction term helps us to understand if the effect of a variable x1 is the same for all values of a second variable, x2. Here is the model output table from Python after specifying the new model with the interaction term:

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp11_InteractionTable2.png
  • How do interactions appear in the model?

  • How can we interpret the “main effect” of age in this model?
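To see how interaction terms show up among the coefficients, here is a minimal sketch using simulated data and the statsmodels formula interface. The column names and the numbers in the data-generating step are invented for illustration; only the formula syntax is the point.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only; names and values are hypothetical.
rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "smoker": rng.choice(["never", "former", "current"], size=n),
    "age": rng.integers(18, 90, size=n),
})
# Made-up process in which the age slope differs by smoking status.
start = df["smoker"].map({"never": 95.0, "former": 88.0, "current": 80.0})
slope = df["smoker"].map({"never": -0.45, "former": -0.38, "current": -0.30})
df["health"] = start + slope * df["age"] + rng.normal(0, 5, size=n)

# "C(smoker) * age" expands to C(smoker) + age + C(smoker):age
m3 = smf.ols("health ~ C(smoker) * age", data=df).fit()

# Interaction terms are the coefficients whose names contain a colon.
interaction_terms = [name for name in m3.params.index if ":" in name]
print(m3.params)
print(interaction_terms)
```

In the formula, `*` requests both the separate terms and their product, so the fitted model has one extra coefficient per non-reference smoking category.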

The plot below shows that the downward slope with age is steepest for people who have never smoked and shallowest for current smokers. Another way to think about this is that the health gaps by smoker status are larger when people are younger, and much narrower when people are older. You will practise producing plots like this, and interpreting interactions, in this week’s tutorial.

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/Chp11_InteractionFigure1.png
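The pattern in a plot like this can also be checked numerically from a fitted interaction model. The sketch below does this on simulated data (hypothetical column names and made-up values, chosen only to mimic the pattern described above): it predicts health for each smoking group at a younger and an older age, then derives each group's slope and the gap between never-smokers and current smokers.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only; names and values are hypothetical.
rng = np.random.default_rng(2)
n = 600
df = pd.DataFrame({
    "smoker": rng.choice(["never", "former", "current"], size=n),
    "age": rng.integers(18, 90, size=n),
})
start = df["smoker"].map({"never": 95.0, "former": 88.0, "current": 80.0})
slope = df["smoker"].map({"never": -0.45, "former": -0.38, "current": -0.30})
df["health"] = start + slope * df["age"] + rng.normal(0, 5, size=n)

m3 = smf.ols("health ~ C(smoker) * age", data=df).fit()

# Predicted health for each group at a younger and an older age.
grid = pd.DataFrame(
    [(s, a) for s in ["never", "former", "current"] for a in (25, 75)],
    columns=["smoker", "age"],
)
grid["predicted"] = m3.predict(grid)
table = grid.pivot(index="smoker", columns="age", values="predicted")

# Change in predicted health per year of age, for each group.
slopes = (table[75] - table[25]) / 50
# Gap between never-smokers and current smokers at each age.
gap = table.loc["never"] - table.loc["current"]

print(table.round(1))
print(slopes.round(2))
print(gap.round(1))
```

Plotting the rows of `table` against age would give one straight line per smoking group, which is the kind of figure shown above.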