4.8. Power analysis with statsmodels#

We have seen how to build our own power analysis by simulating a population with a certain effect size (in our example, a correlation of \(\rho=0.25\)) and sampling from it.

In Python, there is also a built-in function for doing power analysis, in a library called statsmodels.

In this notebook we look at how to run power analysis for t-tests and correlation using statsmodels.

4.8.1. Set up Python libraries#

As usual, run the code cell below to import the relevant Python libraries

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings 
warnings.simplefilter('ignore', category=FutureWarning)

4.8.2. The three ingredients of power analysis#

For a given statistical test, the power depends on three factors:

  • The effect size (Pearson’s \(r\) for correlation, or Cohen’s \(d\) for the \(t\)-test)

  • The sample size(s) \(n\)

  • The \(\alpha\) value

These factors are linked -

  • if the effect size is small, we need a larger sample size to detect it with a certain power

  • if the alpha value is small, we need a larger sample size to detect an effect with a certain power

    • we need a larger \(n\) to detect an effect at the 0.01 level than at the 0.05 level

The process of power analysis works out one of the three ‘factors’ from the other two (for a given level of power).

For example, consider a paired samples \(t\)-test. We might ask about \(n\):

  • “What sample size is needed to detect an effect of size \(d=0.2\) with 80% power, at the \(\alpha=0.05\) level?”

    • the answer is about 199 pairs (for a two-tailed test), as we will see below

Or we might know \(n\), \(d\) and \(\alpha\) and ask what our power is

  • “What is the power to detect an effect of size \(d=0.2\) with a sample size of \(n=100\) pairs, at the \(\alpha=0.05\) level?”

    • the answer is about 51%

4.8.3. Running it: Paired samples \(t\)-test#

The package used for power analysis is statsmodels - you will use statsmodels extensively in the section on regression analysis.

The syntax of statsmodels may be slightly unfamiliar.

  • First we create an analysis of the correct form (in this case, a power analysis for a paired samples \(t\)-test, using statsmodels.stats.power.TTestPower)

  • Then we solve the analysis for given values of \(n\), \(d\), \(\alpha\) and power

    • Exactly one of those four parameters (\(n\), \(d\), \(\alpha\) and power) is unknown - this is the one being calculated and its value in the code is None

    • All the other parameters have to be input into the code

Here it is in action for the question

  • “What sample size is needed to detect an effect of size \(d=0.2\) with 80% power, at the \(\alpha=0.05\) level?”

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for sample size
samplesize = analysis.solve_power(effect_size=0.2, alpha=0.05, nobs=None, power=0.8, alternative='two-sided')
print('n = ' + str(samplesize))
n = 198.15082433942104

Note that we had to specify the argument alternative='two-sided' (could also be alternative='larger' for a one-tailed test)
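For comparison, here is a sketch of the same calculation with the one-tailed alternative='larger' (the figure in the comment is approximate):

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for sample size, this time with a one-tailed alternative
samplesize = analysis.solve_power(effect_size=0.2, alpha=0.05, nobs=None, power=0.8, alternative='larger')
print('n = ' + str(samplesize))
# a one-tailed test needs fewer pairs - roughly 156 rather than about 199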

Alternatively we can solve for power, as in the question

  • “What is the power to detect an effect of size \(d=0.2\) with a sample size of \(n=100\) pairs, at the \(\alpha=0.05\) level?”

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for power
power = analysis.solve_power(effect_size=0.2, alpha=0.05, nobs=100, power=None, alternative='two-sided')
print('power = ' + str(power*100) + '%')
power = 50.826480387188255%

If we so wished, we could solve for any of the other parameters instead.
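For example, here is a sketch of solving for the effect size that could be detected with \(n=100\) pairs, 80% power and \(\alpha=0.05\) (the figure in the comment is approximate):

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for the detectable effect size
effectsize = analysis.solve_power(effect_size=None, alpha=0.05, nobs=100, power=0.8, alternative='two-sided')
print('d = ' + str(effectsize))
# with n=100 pairs and 80% power we can detect an effect of roughly d = 0.28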

4.8.4. Independent samples \(t\)-test#

For an independent samples \(t\)-test, we need to use a different model in statsmodels: TTestIndPower instead of TTestPower.

Additionally, the \(n\) for the two groups may be different. We specify the sample sizes as follows:

  • size of group 1 is nobs1

  • ratio tells us the ratio n2/n1

Here it is in action for the question

  • “What is the power to detect an effect of size \(d=0.5\) at the \(\alpha=0.05\) (two-tailed) level, with samples of 20 and 25 in my two groups?”

# import required modules
from statsmodels.stats.power import TTestIndPower

# create an analysis of the correct form
analysis = TTestIndPower()

# solve for power
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20, ratio=25/20, power=None, alternative='two-sided')
print('power = ' + str(power))
power = 0.3706770336488768

So the power is 37%

As with the paired samples test, you can solve for nobs1, effect_size, or alpha instead as required.
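For example, here is a sketch of solving for the group sizes needed for 80% power with \(d=0.5\), assuming equal-sized groups (the figure in the comment is approximate):

# import required modules
from statsmodels.stats.power import TTestIndPower

# create an analysis of the correct form
analysis = TTestIndPower()

# solve for the size of group 1, with equally sized groups (ratio=1)
n1 = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=None, ratio=1, power=0.8, alternative='two-sided')
print('n1 = ' + str(n1))
# roughly 64 participants per group are needed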

4.8.5. One sample \(t\)-test#

The one sample \(t\)-test is just like a paired samples \(t\)-test in which one of the samples is your actual data (say, the heights of 8 rowers) and the other ‘sample’ is just the reference value (such as the mean height of adult men in the general population, 175 cm) repeated 8 times.

So whilst a paired samples \(t\)-test might ask “is each brother taller than his own sister?”, a one sample test asks “is each rower taller than 175 cm?”

The effect size is then given by

\[d = \frac{\bar{x}-175}{s}\]

where \(\bar{x}\) is the sample mean and \(s\) is the sample standard deviation.

Let’s try it with the heights of 8 rowers (from the notebook on the one-sample \(t\)-test)

heights = pd.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/data/rowersHeights.csv')

First we work out the effect size:

d = (heights.height.mean()-175)/heights.height.std()
d
0.6004636222643069

The effect size here is \(d = 0.60\). We also know that \(n\)=8 and let’s assume we are working with \(\alpha=0.05\)

Now we can use TTestPower from statsmodels to work out the power. Thinking back to our rowers example, the alternative hypothesis was one-tailed (rowers are taller than average) so we set the argument alternative='larger' accordingly.

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for power
power = analysis.solve_power(effect_size=0.60, alpha=0.05, nobs=8, power=None, alternative='larger')
print('power = ' + str(power*100) + '%')
power = 45.488366024594875%

Apparently with a sample size of \(n=8\), we had 45% power, i.e. a 45% chance of detecting a significant difference in height between the rowers and the reference value.
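If 45% power seems low, we could also ask statsmodels how many rowers we would have needed to measure to reach 80% power at this effect size. Here is a sketch (the figure in the comment is approximate):

# import required modules
from statsmodels.stats.power import TTestPower

# create an analysis of the correct form
analysis = TTestPower()

# solve for the sample size needed for 80% power (one-tailed, as before)
samplesize = analysis.solve_power(effect_size=0.60, alpha=0.05, nobs=None, power=0.8, alternative='larger')
print('n = ' + str(samplesize))
# roughly 18-19, so we would need heights from about 19 rowers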

4.8.6. Correlation#

It would be handy if statsmodels had a built-in power analysis function for correlations (Pearson’s \(r\)), but it does not.

However, by the power of maths, we can actually convert our \(r\) value to a \(t\) value using the equation:

\[ t=\frac{r\sqrt{n-2}}{\sqrt{(1-r^2)}} \]

Thereafter the procedure is the same as for the one sample \(t\)-test, with a reference value of zero (because when we are testing ‘if there is a correlation’, we are actually testing if \(r\) is different from zero).

Thinking back to our Maths/French exam scores example, we had \(r\)=0.25, \(n\)=50 and \(\alpha=0.05\)

We convert \(r\) to \(t\):

r = 0.25
n = 50
t = (r*(n-2)**0.5)/((1-r**2)**0.5) # note: **2 means 'squared', **0.5 means 'square root'
t
1.7888543819998315

Then we convert \(t\) to \(d\) using the formula

\[d = \frac{t}{\sqrt{n}}\]

NOTE - this formula is explained on the page Determining Effect Size

d = t/(n**0.5)
d
0.2529822128134703

Now we can run the power analysis (using the rounded value \(d=0.25\)):

# import required modules
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# solve for power
power = analysis.solve_power(effect_size=0.25, alpha=0.05, nobs=50, power=None, alternative='two-sided')
print('power = ' + str(power*100) + '%')
power = 41.038064179110286%

Looking back at our home-made code (in a previous notebook), we can see a good match to the power given by the built-in function.
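As a rough check, here is a sketch of such a simulation: we repeatedly draw samples of \(n=50\) from a population with a true correlation of 0.25 and count how often the correlation comes out significant at the two-tailed 0.05 level (the result is subject to simulation noise):

# import required modules
import numpy as np
import scipy.stats as stats

nReps = 10000
n = 50
rho = 0.25

nSig = 0
for i in range(nReps):
    # draw a sample of size n from a bivariate normal population with correlation rho
    sample = np.random.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    r, p = stats.pearsonr(sample[:, 0], sample[:, 1])
    if p < 0.05:
        nSig = nSig + 1

print('simulated power = ' + str(100 * nSig / nReps) + '%')
# should come out close to the 41% from statsmodels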

4.8.7. Recap#

The syntax for the power analysis is a bit fiddly so let’s recap it:

There is a four way relationship between:

  • sample size(s) (nobs)

  • alpha values (alpha)

  • effect size (effect_size)

  • power

If we know three of them, we can work out the fourth (see the sketch at the end of this section).

… note that for the two-sample test we specify the sample sizes thus:

  • size of group 1 is nobs1

  • ratio tells us the ratio n2/n1
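To pull the syntax together, here is a sketch using TTestIndPower in which each of the four quantities is solved for in turn; in each call, exactly one argument is set to None and the other values are just illustrative:

# import required modules
from statsmodels.stats.power import TTestIndPower

# create an analysis of the correct form
analysis = TTestIndPower()

# solve for power, given effect size, sample sizes and alpha
power = analysis.solve_power(effect_size=0.5, nobs1=20, ratio=1, alpha=0.05, power=None, alternative='two-sided')
print('power = ' + str(power))

# solve for the size of group 1, given effect size, alpha and power (ratio=1 means equal groups)
n1 = analysis.solve_power(effect_size=0.5, nobs1=None, ratio=1, alpha=0.05, power=0.8, alternative='two-sided')
print('n1 = ' + str(n1))

# solve for the detectable effect size, given sample sizes, alpha and power
d = analysis.solve_power(effect_size=None, nobs1=20, ratio=1, alpha=0.05, power=0.8, alternative='two-sided')
print('d = ' + str(d))

# solve for alpha, given effect size, sample sizes and power
alpha = analysis.solve_power(effect_size=0.5, nobs1=20, ratio=1, alpha=None, power=0.8, alternative='two-sided')
print('alpha = ' + str(alpha))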