17.4. Logistic Regression Equation#

  • When should we use logistic regression rather than linear regression?

  • Why can’t we use linear regression for binary outcomes?

  • What is a logit transformation?

The logistic regression equation uses the log of the odds, $\log{ \left[ \frac{p(y=1)}{1-p(y=1)} \right]}$, as the outcome.

We model the logit (the log of the odds) as follows:

$$ \text{logit}(p) = \log{ \left[ \frac{p(y=1)}{1-p(y=1)} \right]} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k $$

where $x_1, \dots, x_k$ are the model's $k$ independent variables.
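To see the equation in action, here is a minimal sketch (assuming `numpy` and `statsmodels` are installed; the data are simulated purely for illustration) that fits a logistic regression with $k = 2$ independent variables and recovers $\alpha$, $\beta_1$ and $\beta_2$ on the log-odds scale:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# True model on the logit scale: logit(p) = alpha + beta1*x1 + beta2*x2
alpha, beta1, beta2 = -0.5, 1.0, -2.0
p = 1 / (1 + np.exp(-(alpha + beta1 * x1 + beta2 * x2)))
y = rng.binomial(1, p)  # binary 0/1 outcome

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept column + x1, x2
result = sm.Logit(y, X).fit(disp=0)
print(result.params)  # estimates of alpha, beta1, beta2
```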

The natural log of the odds takes a value of zero when the probability is 0.5. With a probability of 1, $\text{logit}(p)$ would be infinity, and with a probability of 0, $\text{logit}(p)$ would be minus infinity.
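A quick numeric check (a sketch assuming `numpy`) confirms this behaviour: the logit is exactly zero at $p = 0.5$ and diverges towards $\pm\infty$ as $p$ approaches 1 or 0:

```python
import numpy as np

def logit(p):
    """Log of the odds: log(p / (1 - p))."""
    return np.log(p / (1 - p))

for p in [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999]:
    print(f"logit({p}) = {logit(p):+.3f}")
# logit(0.5) = +0.000, while logit(0.001) and logit(0.999) are roughly -6.9 and +6.9
```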

The logit transformation gets around the violation of the linearity assumption: a probability is bounded between 0 and 1 and so cannot be a linear function of the predictors, but the logit can take any value on the real line. The transformation is therefore a way of expressing a non-linear relationship in a linear way.
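Equivalently, the inverse of the logit (the sigmoid function) maps any value of the linear predictor, from anywhere on the real line, back to a valid probability strictly between 0 and 1. A small sketch (again assuming `numpy`; the predictor values are hypothetical):

```python
import numpy as np

def inv_logit(z):
    """Inverse logit (sigmoid): maps the real line to (0, 1)."""
    return 1 / (1 + np.exp(-z))

for z in [-10, -2, 0, 2, 10]:  # hypothetical linear-predictor values
    print(f"inv_logit({z:+d}) = {inv_logit(z):.4f}")
# Every output lies strictly between 0 and 1, so the predicted probability is always valid.
```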

Apart from the outcome variable, the form of the regression equation is very familiar! As in linear regression, each slope estimate $\beta$ describes the change in the outcome variable (here, the log-odds) for each one-unit increase in the corresponding $x$, and the intercept $\alpha$ is the value of the outcome variable when all the $x$ variables take the value zero. However, in interpreting the coefficients we need to keep in mind the transformation applied to $y$, which we'll practice in the next section.
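As a brief preview (using a purely hypothetical value of $\beta_1 = 0.7$): a one-unit increase in $x_1$, holding the other predictors fixed, adds $0.7$ to the log-odds, which multiplies the odds themselves by $e^{\beta_1}$:

$$ \frac{\text{odds}(x_1 + 1)}{\text{odds}(x_1)} = \frac{e^{\alpha + \beta_1 (x_1 + 1) + \beta_2 x_2 + \dots + \beta_k x_k}}{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}} = e^{\beta_1} = e^{0.7} \approx 2.01 $$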