An effect size is a quantitative measure of the magnitude of a result: how large the difference between two groups is, or how strong the relationship between two variables is. Unlike a p-value, which only indicates whether an observed effect is likely to be more than sampling noise, an effect size tells you how big that effect is, which is what you need to judge its practical importance. The most widely reported measure is Cohen’s d, where by convention 0.2 counts as a small effect, 0.5 as medium and 0.8 as large.
Whenever changes are made to a curriculum or teaching method in a chemistry or physics class, we usually want to evaluate those changes to judge their effectiveness. Although the term normalised gain is used in the sciences to compare pre- and post-test performance, in the social sciences it is far more common to report an effect size. This single number lets readers compare findings across studies, conduct a meta-analysis and judge real-world importance.
So what does an effect size actually represent? Let’s break it down.
What is an Effect Size?
An effect size in statistics measures how large the difference between group means is, or how strong the relationship between variables is. While analysts often focus on statistical significance using p-values, the effect size captures the practical significance of a result.
It helps to separate two ideas that are easy to confuse:
- Statistical significance tells you how surprising your data would be if the null hypothesis were true. A small p-value suggests the observed effect is unlikely to be just random noise, but it is not the probability that the null hypothesis is true.
- Practical significance tells you whether the finding is large enough to matter in the real world. This is what the effect size describes.
A small effect size indicates that the difference between groups (or the strength of a relationship) is minor, even if it is real. A large effect size indicates a substantial, meaningful difference. For a standardised mean-difference measure such as Cohen’s d, the effect size expresses the gap between two group means in units of standard deviation — so a d of 0.5 means the two means differ by half a standard deviation.
“The effect size is the main finding of a quantitative study. While a P value can inform the reader whether an effect exists, the P value will not reveal the size of the effect.” — Sullivan & Feinn, Journal of Graduate Medical Education (2012)
So why do we need an effect size when we already have a significance test? Let’s find out.
Why is an Effect Size Important?
When the p-value is larger than the chosen alpha level, the observed difference is small enough to be plausibly explained by sampling variability. The crucial problem is the opposite case: when a sample is very large, a statistical test will return a “significant” result for almost any non-zero difference, however trivial the real-world effect is. The effect size does not behave this way. Its expected value does not grow with sample size (a larger sample only makes the estimate more precise), so reporting only a p-value is not enough for readers to understand what a study really found.
Here is the core issue, stated plainly:
- With a sample of, say, 20,000, a tiny and unimportant difference can still produce a very small p-value and be flagged as “statistically significant”.
- A p-value confounds the size of an effect with the size of the sample — it reflects both at once, so it cannot tell you which is driving the result.
- An effect size isolates the magnitude of the effect itself, so it estimates the same quantity whether your sample has 30 participants or 30,000; only the precision of the estimate changes.
This is why journals, the APA and most dissertation supervisors now expect an effect size alongside every p-value. The two are complementary: the p-value addresses “Is this effect likely to be real?” and the effect size addresses “Is this effect big enough to care about?” Effect size is also the input you need to run a power analysis and decide how large a sample your study requires.
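The sample-size point can be checked numerically. The sketch below is an illustration under stated assumptions, not a full analysis: it uses a large-sample z-test approximation for two equal-sized groups, and the effect size `d = 0.05` and the group sizes are hypothetical values chosen for the demonstration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardised mean difference d
    between two equal-sized groups (z-test approximation, an assumption)."""
    # For equal groups the test statistic is d * sqrt(n / 2),
    # so it grows with the square root of the sample size.
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


d = 0.05  # hypothetical effect, far below Cohen's "small" benchmark of 0.2
for n in (15, 10_000):  # totals of 30 and 20,000 participants
    p = two_sided_p(d, n)
    print(f"n per group = {n:>6}: d = {d}, p = {p:.3g}")
```

The effect size is identical in both runs; only the p-value changes, crossing conventional significance thresholds once the sample is large enough.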
Now that we know how important an effect size is, let’s see how to calculate one.
How to Calculate an Effect Size?
There are several effect-size measures, and the right one depends on the type of analysis you are running. The three you will meet most often are Cohen’s d (for the difference between two means), Pearson’s r (for the strength of a correlation) and eta-squared, η² (for the variance explained in ANOVA). We’ll cover each below.
1. Cohen’s d — difference between two means
Cohen’s d is designed to express the standardised difference between two means. It is the natural companion to a t-test and is the headline effect size used in meta-analysis. To compute it, subtract one group mean from the other and divide by the (pooled) standard deviation:
d = (M₁ − M₂) ÷ SDpooled
where M₁ and M₂ are the two group means and SDpooled is the pooled standard deviation of the two groups. The pooled standard deviation is calculated as:
SDpooled = √[ ((n₁−1)s₁² + (n₂−1)s₂²) ÷ (n₁ + n₂ − 2) ]
where n is each group’s sample size and s² is each group’s variance. Because the result is expressed in standard-deviation units, it is unit-free: a d of 1 means the two groups differ by exactly one standard deviation, a d of 2 means they differ by two standard deviations, and so on. (Standard-deviation units are equivalent to z-scores, so 1 z-score = 1 standard deviation.)
A teacher tests a new revision method. The treatment class scores a mean of M₁ = 78 and the control class a mean of M₂ = 72. Both classes have the same standard deviation of 10, so the pooled SD is also 10.
- Find the difference between means: 78 − 72 = 6.
- Divide by the pooled standard deviation: 6 ÷ 10 = 0.6.
- Interpret: d = 0.6 sits between Cohen’s “medium” (0.5) and “large” (0.8) benchmarks, so the new method produced a moderate-to-large improvement of about 0.6 standard deviations. Whether that makes it worth adopting still depends on context and on how precisely the effect was estimated.
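The same calculation can be sketched in a few lines of Python. The function names `pooled_sd` and `cohens_d` are illustrative, and the class size of 30 per group is an assumption (the example above does not state it); with equal standard deviations the pooled SD is 10 whatever the class sizes are.

```python
from math import sqrt


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard deviation of two groups (s1, s2 are group SDs)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, m2: float, n1: int, s1: float, n2: int, s2: float) -> float:
    """Standardised mean difference: (M1 - M2) / SDpooled."""
    return (m1 - m2) / pooled_sd(n1, s1, n2, s2)


# Means and SDs from the worked example; n = 30 per class is assumed.
d = cohens_d(m1=78, m2=72, n1=30, s1=10, n2=30, s2=10)
print(round(d, 2))
```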
Cohen proposed that d = 0.2 is a “small” effect, d = 0.5 a “medium” effect and d = 0.8 a “large” effect. So if two group means differ by less than 0.2 standard deviations, the difference falls below even the “small” benchmark and is often negligible in practice, even if it is statistically significant.
“A medium effect of .5 is visible to the naked eye of a careful observer. A small effect of .2 is noticeably smaller than medium but not so small as to be trivial … a large effect of .8 is the same distance above the medium as small is below it.” — Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. (1988)
2. Pearson’s r — strength of a correlation
Pearson’s r, the correlation coefficient, measures the strength and direction of the linear relationship between two variables. Its value always lies between −1 and +1: a value near 0 means little or no linear relationship, while values near −1 or +1 indicate a strong relationship. The formula is:
rxy = [ N∑XY − (∑X)(∑Y) ] ÷ √{ [N∑X² − (∑X)²][N∑Y² − (∑Y)²] }
where:
- rxy is the strength of the correlation between variables x and y
- N is the sample size
- ∑ means “the sum of”
- X and Y are the individual values of the x- and y-variables
- XY is the product of each x value and its corresponding y value
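For readers who want to see the formula at work, here is a minimal sketch that follows it term by term. The function name `pearson_r` and the two small data lists are hypothetical, made up purely for illustration.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r from the computational (raw-sums) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator


# Hypothetical data: hours of revision and quiz scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 3))
```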
In practice you rarely compute this by hand — SPSS, R, Excel or a free online calculator will return r from raw data instantly. Like Cohen’s d, Pearson’s r is a unit-free, standardised measure, so correlations from different studies can be compared directly. Both r and d are intended for interval or ratio variables; for ordinal or nominal data you should use other measures (for example, Spearman’s rho or Cramér’s V).
3. Eta-squared (η²) — variance explained in ANOVA
When you compare three or more group means with ANOVA, the common effect size is eta-squared. It expresses the proportion of the total variance in the outcome that is explained by your grouping variable:
η² = SSeffect ÷ SStotal
where SSeffect is the sum of squares for the factor of interest and SStotal is the total sum of squares. An η² of 0.10, for instance, means the grouping variable accounts for 10% of the variance in the outcome. Cohen’s benchmarks for η² are roughly 0.01 (small), 0.06 (medium) and 0.14 (large).
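As a rough sketch, η² for a one-way design can be computed directly from the raw group scores. This assumes a single grouping factor, so SSeffect is the between-groups sum of squares; the function name `eta_squared` and the three small groups are hypothetical.

```python
from statistics import fmean


def eta_squared(groups: list[list[float]]) -> float:
    """Eta-squared for a one-way design: SS_effect / SS_total."""
    all_values = [v for group in groups for v in group]
    grand_mean = fmean(all_values)
    # Total variability of every score around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        raise ValueError("eta-squared is undefined when all scores are equal")
    # Variability of the group means around the grand mean, weighted by group size.
    ss_effect = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_effect / ss_total


# Hypothetical scores for three teaching methods.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(eta_squared(groups))
```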
Interpreting Effect Size: The Threshold Table
Cohen’s benchmarks are conventions, not laws — a “small” effect can still be hugely important in fields such as medicine. Still, the thresholds below are the standard reference point for reporting and interpreting results.
| Effect size measure | Small | Medium | Large | Used for |
|---|---|---|---|---|
| Cohen’s d | 0.2 | 0.5 | 0.8 | Difference between two means (t-test) |
| Pearson’s r | 0.1 | 0.3 | 0.5 | Strength of a linear correlation |
| Eta-squared (η²) | 0.01 | 0.06 | 0.14 | Variance explained in ANOVA |
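If you are labelling many results, the table can be encoded as a small lookup. This is only a convenience sketch: the names `BENCHMARKS` and `label_effect` are hypothetical, the "below small" label is a choice made here rather than one of Cohen's terms, and the labels remain conventions that should be read in context.

```python
# Thresholds copied from the table above: (small, medium, large).
BENCHMARKS = {
    "d": (0.2, 0.5, 0.8),
    "r": (0.1, 0.3, 0.5),
    "eta2": (0.01, 0.06, 0.14),
}


def label_effect(measure: str, value: float) -> str:
    """Return the conventional label for an effect size of the given measure."""
    small, medium, large = BENCHMARKS[measure]
    size = abs(value)  # the sign of d or r shows direction, not magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


print(label_effect("d", 0.6))
print(label_effect("r", -0.55))
print(label_effect("eta2", 0.005))
```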
A few practical pointers when reporting an effect size:
- Always report it alongside the p-value, not instead of it — together they tell readers both whether an effect is real and how large it is.
- Give a confidence interval for the effect size where possible, so readers can see the precision of the estimate.
- Judge importance in context. In safety-critical or clinical research, even a d of 0.2 may be meaningful; in exploratory work it may not be.
- Use the effect size to plan future studies. It is the key input for a statistical power calculation and sample-size estimate.
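To illustrate the last pointer, the sketch below estimates the sample size per group needed to detect a given Cohen’s d in a two-group comparison. It relies on a normal approximation with equal group sizes and a two-sided test, which is an assumption; dedicated power software based on the t distribution will typically return a slightly larger number. The function name `n_per_group` and the defaults (alpha = 0.05, power = 0.80) are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group to detect effect size d
    (two-sided test, equal groups, normal approximation)."""
    if d == 0:
        raise ValueError("d must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):  # Cohen's small, medium and large benchmarks
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the required sample size scales with 1/d², halving the expected effect size roughly quadruples the number of participants needed.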