3.5. Statistical Significance in Regression
What is statistical inference?
Inference is drawing conclusions about a population from sample data.
To be able to draw conclusions about the population from the results of a regression model, we need to check two things. First, whether the slope is statistically significantly different to zero, and second, whether the regression assumptions have been met. We’ll start by thinking about testing the slope for significance.
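When we test a regression slope for significance, we are running a hypothesis test. We can set up the hypotheses in the following way, where $\beta$ is the slope in the population.
The null hypothesis can be written as

$$H_0: \beta = 0$$

and the alternative hypothesis as

$$H_a: \beta \neq 0$$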
Note: we use a two-tailed test here, because we are testing whether there is an association at all, not predicting the direction of the association.
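The test for significance of a slope in regression can also be called a test of independence. We consider $x$ and $y$ to be statistically independent if the population mean of $y$ is the same at every value of $x$, which in the linear regression model means the population slope is zero.
The null hypothesis for statistical independence is thus:

$$H_0: \beta = 0$$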
We test the slope for significance with a $t$ test.
3.5.1. Testing a slope for significance Example
The General Social Survey sampled 2,428 respondents, asking about their own education and their mother's education.
How would you test the null hypothesis that these variables are independent, without using Python?
To compute the test statistic, we use the equation:

$$t = \frac{b}{se}$$

where $b$ is the sample slope and $se$ is its standard error.
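To find the critical value of $t$, we use the $t$ table with $df = n - 2$ degrees of freedom.
When $df$ is large, the $t$ distribution is very close to the standard normal distribution, so the two-tailed critical value at $\alpha = 0.05$ is approximately 1.96.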
Here, in the mother's education example, we compute $t = b/se$, with $df = 2428 - 2 = 2426$.
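If the absolute value of the calculated $t$ exceeds the critical value, we reject $H_0$ and conclude that the slope is statistically significantly different to zero, meaning the variables are not independent.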
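To check a hand calculation like this one, the test can be sketched in Python. This is a minimal sketch, assuming a large sample so that the $t$ distribution with $df = n - 2$ can be approximated by the standard normal (reasonable when $df = 2426$). It computes $t = b/se$, the two-tailed critical value and p-value, and the decision. The function name `slope_t_test` and the inputs below are hypothetical, not the values from the GSS output.

```python
from statistics import NormalDist


def slope_t_test(b, se, n, alpha=0.05):
    """Two-tailed test of H0: beta = 0 for a simple regression slope.

    Assumes a large sample: the t distribution with df = n - 2 is
    approximated by the standard normal, so the cut-off is ~1.96
    when alpha = 0.05.
    """
    t = b / se                                  # test statistic t = b / se
    df = n - 2                                  # degrees of freedom in simple regression
    z = NormalDist()                            # standard normal (large-sample approximation)
    critical = z.inv_cdf(1 - alpha / 2)         # two-tailed critical value
    p_value = 2 * (1 - z.cdf(abs(t)))           # two-tailed p-value
    return {"t": t, "df": df, "critical": critical,
            "p_value": p_value, "reject_h0": abs(t) > critical}


# Hypothetical inputs for illustration only.
result = slope_t_test(b=0.30, se=0.02, n=2428)
print(result)
```

For small samples, replace the normal cut-off with the value from the $t$ table for $df = n - 2$.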
How would you find a 95% confidence interval for the population slope?
As you have seen earlier in this course, we can use the standard error to find a 95% confidence interval around the slope. We do this with the following equation:

$$b \pm t_{.025}(se)$$

where $t_{.025} = 1.96$ for large samples. When $n$ is small, refer to the $t$ table (with $df = n - 2$) to find the cut-off point.
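The interval $b \pm t_{.025}(se)$ can be sketched the same way. This is a minimal sketch: `slope_ci` is a hypothetical helper and the inputs are illustrative, not the GSS values. It defaults to the large-sample cut-off of 1.96; for small $n$, pass the critical value from the $t$ table with $df = n - 2$ instead.

```python
def slope_ci(b, se, critical=1.96):
    """Confidence interval b +/- critical * se for a regression slope.

    critical = 1.96 gives a 95% interval in large samples; for small
    samples, supply t_.025 from the t table with df = n - 2.
    """
    margin = critical * se           # half-width of the interval
    return (b - margin, b + margin)


lower, upper = slope_ci(b=0.30, se=0.02)   # hypothetical values
print(f"95% CI for the population slope: ({lower:.4f}, {upper:.4f})")
```

If the interval excludes zero, this agrees with rejecting $H_0: \beta = 0$ in a two-tailed test at $\alpha = 0.05$.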
Here,
or:
Conclusion: We can be 95% confident that the population slope $\beta$ lies within this interval.
In the education example, the standard error was conveniently provided for us. But where did it come from?
The equation for calculating the standard error is:

$$se = \frac{b}{r}\sqrt{\frac{1 - r^2}{n - 2}}$$
Note its familiar components! It is computed using the known values for the slope, $b$, the correlation, $r$, and the sample size, $n$.
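As a sanity check on the algebra, here is a minimal Python sketch that computes the standard error of the slope two ways on simulated data: from $b$, $r$ and $n$ via $se = (b/r)\sqrt{(1 - r^2)/(n - 2)}$, and from the residuals via $se = s/\sqrt{\sum (x - \bar{x})^2}$ with $s^2 = SSE/(n - 2)$. The helper names and the simulated data are hypothetical; the point is only that the two routes agree.

```python
import math
import random


def se_from_r(b, r, n):
    """SE of a simple-regression slope from the slope b, correlation r and n.

    Follows from t = b / se and t = r * sqrt(n - 2) / sqrt(1 - r**2).
    """
    return (b / r) * math.sqrt((1 - r ** 2) / (n - 2))


def fit_and_se(x, y):
    """Fit y = a + b*x by least squares; return b and se = s / sqrt(Sxx)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                # residual standard deviation
    return b, s / math.sqrt(sxx)


def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated data (hypothetical), used only to check the formula.
random.seed(1)
x = [random.uniform(18, 90) for _ in range(200)]
y = [5 - 0.02 * xi + random.gauss(0, 1) for xi in x]

b, se_direct = fit_and_se(x, y)
r = correlation(x, y)
n = len(x)
print(se_direct, se_from_r(b, r, n))   # the two values agree
```

For the immigration exercise you would call `se_from_r(-0.0217, -0.1572, n)` with that dataset's sample size.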
In the immigration data from last week’s tute, the correlation between age and immigration attitudes is -0.1572, the regression slope is -0.0217, and
You can check this against the SE calculated by Python in the regression results summary table. Does it match what you calculated here?
Just by eyeballing the slope and the standard error, can you tell whether it is statistically significant?
Yes!
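Remember the test statistic is $t = b/se$, so a slope that is more than about twice the size of its standard error gives $|t| > 1.96$.
Here, dividing the slope by its standard error gives a number whose absolute value is higher than 1.96.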
Therefore the slope is statistically significantly different to zero at the 5% level.
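The eyeballing rule can be written down directly. This is a minimal sketch assuming the large-sample cut-off of 1.96; `is_significant` is a hypothetical helper. Using the immigration slope of -0.0217 from the text, it also computes the largest standard error that would still make the slope significant at the 5% level, which you can compare with the SE in the Python summary table.

```python
def is_significant(b, se, critical=1.96):
    """Large-sample two-tailed test at alpha = 0.05: is |b / se| > 1.96?"""
    return abs(b / se) > critical


b = -0.0217                    # immigration slope from the text
max_se = abs(b) / 1.96         # largest se for which |t| still exceeds 1.96
print(round(max_se, 4))        # 0.0111
```

Any standard error smaller than about 0.0111 makes this slope significant; `is_significant(-0.0217, se)` gives the same answer for a particular `se`.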