4.9. Knowing the effect size#
Some sleight of hand has been at play in this chapter.
I said that to do power analysis we assume a specific effect size under the alternative hypothesis.
So I went from this:
I collect data on end-of-year exam scores in Maths and French for 50 high school students. Then I calculate the correlation coefficient, Pearson's $r$, between Maths and French scores across my sample of 50 participants.
Under the null hypothesis, there is no correlation between Maths scores and French scores. Under the alternative hypothesis, there is a correlation.
to this:
If $\mathcal{H}_a$ is true, the population correlation $\rho$ takes a specific, assumed value.
How did I actually decide what effect size (value of $\rho$) to assume under the alternative hypothesis?
4.9.1. Post hoc power analysis#
In the example given, I took the value of $r$ observed in my sample of 50 students and used it as the assumed population effect size for the power analysis.
This is sometimes called a post-hoc power analysis.
When I ran the power analysis after the fact, it told me I should have had a sample of 128 people, rather than 50, to detect that correlation with 80% power.
This isn’t quite the intended purpose of power analysis, although it is how power analysis is often used in reality - to evaluate post hoc, or after the fact, whether a study was sufficiently well powered.
Ideally, we are supposed to do a power analysis when planning the experiment, to decide in advance what sample size to collect.
Power calculations in advance of the study are now required by almost all funders, ethical review boards and pre-registration repositories, as well as many scientific journals.
This is important because underpowered studies are a waste of money (for funders) and less likely to produce reproducible results.
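As a sketch of what such an advance calculation involves, here is a minimal Python function for the required sample size to detect a correlation, using the standard Fisher z-transformation approximation. The function name and the illustrative value $r = 0.25$ are my own choices, not the exact numbers used in the example above:

```python
import numpy as np
from scipy import stats

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a population
    correlation r with the given power (two-tailed test),
    using the Fisher z-transformation approximation."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # critical value, two-tailed
    z_beta = stats.norm.ppf(power)            # quantile for desired power
    z_r = np.arctanh(r)                       # Fisher z-transform of r
    n = ((z_alpha + z_beta) / z_r) ** 2 + 3   # standard approximation
    return int(np.ceil(n))

print(n_for_correlation(0.25))   # sample size for an assumed r of 0.25
```

Note that the required sample size grows rapidly as the assumed effect gets smaller, which is why the choice of effect size matters so much.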
But if we want to do the power calculation before the study, how can we know the effect size?
4.9.2. Video#
Here is a video about how we can decide on the effect size for a power analysis:
4.9.3. Estimating the effect size from the literature#
To get an idea of the effect size we expect in a planned study, we can look at other similar studies in the literature. For example, if I want to know whether a new literacy intervention improves reading scores in primary school children, I can look at the effect sizes in previous studies of reading interventions.
4.9.4. Recovering $d$ from $t$ and $n$#
Although it is not common practice to report effect sizes in journal articles, they can be recovered from the $t$-statistic and sample size $n$, which are routinely reported.
Paired sample $t$-test#
Remember that

$$ t = \frac{\bar{x}_d}{s_d / \sqrt{n}} $$

where $\bar{x}_d$ is the mean of the pairwise differences, $s_d$ is the standard deviation of the pairwise differences and $n$ is the number of pairs.

Now Cohen's $d$ is given by

$$ d = \frac{\bar{x}_d}{s_d} $$

Rearranging, we see that

$$ d = \frac{t}{\sqrt{n}} $$
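As a quick sketch (using made-up simulated data), we can check that Cohen's $d$ recovered as $t/\sqrt{n}$ matches $d$ computed directly from the pairwise differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(50, 10, size=40)           # e.g. scores before an intervention
after = before + rng.normal(3, 5, size=40)     # paired scores with a true effect

t, p = stats.ttest_rel(after, before)          # paired-samples t-test
n = len(before)

diffs = after - before
d_direct = diffs.mean() / diffs.std(ddof=1)    # Cohen's d computed directly
d_from_t = t / np.sqrt(n)                      # Cohen's d recovered from t and n
```

The two values agree exactly, because the second expression is just an algebraic rearrangement of the first.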
One sample $t$-test#
This is very similar to the paired sample t-test.
We have

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

where $\bar{x}$ is the sample mean, $\mu_0$ is the mean under the null hypothesis, $s$ is the sample standard deviation and $n$ is the sample size.

Again we have

$$ d = \frac{t}{\sqrt{n}} $$
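The same check works for a one-sample test. In this sketch (again with made-up data), the hypothetical null-hypothesis mean of 50 is my own illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(52, 10, size=30)              # sample; assume the null mean is 50

t, p = stats.ttest_1samp(x, popmean=50)      # one-sample t-test
n = len(x)

d_direct = (x.mean() - 50) / x.std(ddof=1)   # Cohen's d computed directly
d_from_t = t / np.sqrt(n)                    # Cohen's d recovered from t and n
```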
Correlation#
Power analysis for a correlation can be run directly on the effect size $r$ (for example with statsmodels). If a study reports only the $t$-statistic for a correlation, we can convert $t$ back to $r$.

Again we have a simple relationship between the test statistic, the sample size and the effect size:

$$ r = \frac{t}{\sqrt{t^2 + n - 2}} $$
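As a sketch with simulated exam scores (the numbers are made up), we can confirm that converting the correlation's $t$-statistic back via $t/\sqrt{t^2 + n - 2}$ recovers $r$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
maths = rng.normal(60, 12, size=50)
french = 0.5 * maths + rng.normal(30, 10, size=50)   # correlated scores

r, p = stats.pearsonr(maths, french)
n = len(maths)

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # t-statistic for the correlation
r_from_t = t / np.sqrt(t**2 + n - 2)         # recover r from t and n
```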
Independent samples $t$-test#
For the independent samples $t$-test, things are a little more complicated.

Yikes!

The formula for $t$ is

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

where $\bar{x}_1$ and $\bar{x}_2$ are the two group means, $n_1$ and $n_2$ are the two group sizes, and $s_p$ is the pooled standard deviation:

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

This all means that to recover Cohen's $d$, defined as $d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$, we rearrange to get

$$ d = t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

Phew!
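Despite the messier algebra, the check is the same as before. In this sketch (simulated groups of unequal size, with made-up means), $d$ recovered from $t$, $n_1$ and $n_2$ matches $d$ computed from the pooled standard deviation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(55, 10, size=35)
group2 = rng.normal(50, 10, size=45)

t, p = stats.ttest_ind(group1, group2)    # equal-variance test (pooled sd)
n1, n2 = len(group1), len(group2)

# pooled standard deviation
sp = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1))
             / (n1 + n2 - 2))
d_direct = (group1.mean() - group2.mean()) / sp   # Cohen's d computed directly
d_from_t = t * np.sqrt(1 / n1 + 1 / n2)           # recovered from t, n1 and n2
```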
4.9.5. Practical effect size#
One context in which power can definitely be meaningfully defined is when we know how big an effect would be useful, even if we don't know what the underlying effect size in the population is.
Say, for example, we are testing a new analgesic drug. We may not know how much the drug will reduce pain scores (the true effect size), but we can certainly define a minimum effect size that would be clinically meaningful. You could say that you would only consider the effect of the drug clinically significant if there is a 10% change in pain scores (otherwise, the drug won't be worth taking). That is different from statistical significance: if you test enough patients, you could detect a statistically significant result even for a very small change in clinical outcome, but it still wouldn't mean your drug is an effective painkiller.
If we conduct a power analysis assuming that the effect size in the population is the minimum clinically significant effect, this will tell us how many participants we need to detect such a clinically significant effect with (say) 80% power. By definition, a smaller effect would need more participants to detect it (but we wouldn't be interested in such a small effect from a clinical perspective, so that doesn't matter). Any effect larger than the minimum clinically significant effect would have more than 80% power, as larger effects are easier to detect.
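As a sketch of how such a calculation might look in code, suppose (hypothetically) that the minimum clinically meaningful reduction in pain scores corresponds to Cohen's $d = 0.5$; that value is an illustrative assumption, not taken from any real drug trial. We can then ask statsmodels for the required group size:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical minimum clinically meaningful effect: Cohen's d = 0.5
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)   # participants needed in each group
```

Any true effect larger than $d = 0.5$ would then be detected with more than 80% power in a study of this size.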