ANOVA (Analysis of Variance) is a statistical test that compares the means of three or more groups at the same time to decide whether at least one of them is significantly different from the others. It does this by comparing the variation between the groups with the variation within the groups, producing a single number called the F-statistic. If the between-group variation is large relative to the within-group variation, ANOVA concludes that the group means are not all equal.
ANOVA is one of the most widely used techniques in inferential statistics because a single test handles many groups at once, avoiding the inflated error rate you would get from running multiple t-tests. This guide explains what ANOVA is, how the F-statistic works, the difference between one-way and two-way ANOVA, the assumptions you must check, post-hoc tests, and a fully worked example with a complete ANOVA summary table.
“The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.” — Sir Ronald A. Fisher, who developed ANOVA in the 1920s
What Is ANOVA? A Clear Definition
ANOVA stands for Analysis of Variance. It is a parametric statistical test, first developed by the British statistician Sir Ronald A. Fisher in the 1920s and popularised in his 1925 book Statistical Methods for Research Workers. Despite its name, ANOVA is really a test about means: it uses the variances in the data to work out whether the population means of several groups are likely to be equal.
The logic is simple. If several groups are drawn from populations with the same mean, the spread of the group averages around the overall (grand) average should be no bigger than you would expect from random sampling noise. If the group averages are spread much further apart than the noise alone can explain, that is evidence the groups really do differ. ANOVA formalises this comparison and tells you, through a p-value, how likely differences at least as large as those observed would be if every group mean were actually the same.
A key strength of ANOVA is efficiency. To compare three groups with t-tests you would need three separate comparisons; with four groups, six. Each test carries its own risk of a false positive, so the chance of wrongly declaring a difference (a Type I error) climbs quickly. ANOVA tests all the groups together in one shot, keeping the overall error rate under control.
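The growth in false-positive risk can be sketched numerically. The snippet below is a rough illustration, assuming each t-test uses α = 0.05 and that the tests are independent (only an approximation, since pairwise tests share groups); the function names are illustrative.

```python
from math import comb


def n_pairwise_tests(k: int) -> int:
    """Number of distinct pairs among k groups (k choose 2)."""
    return comb(k, 2)


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across m tests.

    ASSUMPTION: the m tests are independent, which is only an
    approximation for pairwise t-tests that share groups.
    """
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    m = n_pairwise_tests(k)
    rate = familywise_error_rate(m)
    print(f"{k} groups -> {m} t-tests, familywise error rate ~ {rate:.3f}")
```

Under these assumptions the chance of at least one false positive is roughly 14% for three groups and 26% for four, well above the nominal 5%.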
Key Terms You Need First
- Dependent variable: the continuous outcome you measure (e.g. exam score, blood pressure, yield). ANOVA requires this to be measured on an interval or ratio scale.
- Independent variable (factor): the categorical grouping variable that splits the data into groups, such as teaching method or drug type.
- Levels: the categories within a factor. A “teaching method” factor with three methods has three levels.
- Between-group variance: how far the individual group means sit from the grand mean. Large values suggest a real effect.
- Within-group variance: the natural scatter of observations inside each group. This is the “error” or noise against which the effect is judged.
- Null hypothesis (H0): all group means are equal (μ₁ = μ₂ = μ₃ = …). The alternative hypothesis (H1) states that at least one mean differs — not that they all differ.
How Does ANOVA Work? The F-Statistic
ANOVA partitions the total variation in the data into two parts: variation between groups and variation within groups. It then forms a ratio of the two, called the F-statistic:
$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
where a mean square is a sum of squares divided by its degrees of freedom.
The building blocks are:
- Sum of squares between ($SS_{\text{between}}$): $\sum_{j} n_j(\bar{x}_j - \bar{x}_{\text{grand}})^2$, how far each group mean sits from the grand mean.
- Sum of squares within ($SS_{\text{within}}$): $\sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)^2$, the spread of observations around their own group mean.
- Degrees of freedom: $df_{\text{between}} = k - 1$ and $df_{\text{within}} = N - k$, where $k$ is the number of groups and $N$ the total sample size.
- Mean squares: $MS_{\text{between}} = SS_{\text{between}} / df_{\text{between}}$; $MS_{\text{within}} = SS_{\text{within}} / df_{\text{within}}$.
Interpreting F is intuitive. If the null hypothesis is true, the between-group and within-group mean squares both estimate the same underlying variance, so F tends to be close to 1. The larger the F-statistic, the stronger the evidence that the group means differ. You compare your F-value against a critical value from the F-distribution (set by your two degrees of freedom and the significance level, usually α = 0.05), or simply read the p-value. If F exceeds the critical value (equivalently, if p < α), you reject the null hypothesis and conclude that at least one group mean is different. For more on this decision rule, see our guide to statistical significance.
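To see why an F near 1 is unremarkable and a large F is not, one option is to simulate the null hypothesis directly. The sketch below is an illustration only: it assumes three groups of five observations drawn from the same standard normal population, uses a fixed seed, and compares the simulated F-values with 3.89, the critical value for F(2, 12) at α = 0.05. The function names and the number of simulations are arbitrary choices.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: mean square between / mean square within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_null(n_sims=5000, k=3, n=5, seed=0):
    """F-values for k groups of size n that all share ONE population."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)])
        for _ in range(n_sims)
    ]


f_values = simulate_null()
mean_f = sum(f_values) / len(f_values)
share_above = sum(f > 3.89 for f in f_values) / len(f_values)

print(f"average F when the null is true: {mean_f:.2f}")
print(f"share of null F-values above 3.89: {share_above:.3f}")
```

With the null true by construction, the average F should sit a little above 1 and only about 5% of simulated values should exceed the critical value, which is exactly what the significance level promises.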
Figure: ANOVA partitions total variance into variation due to the treatment (factor) and random error inside each group.
ANOVA Worked Example (One-Way)
Suppose a researcher wants to know whether three study methods produce different exam scores. Five students are randomly assigned to each method, giving N = 15 and k = 3 groups.
| Group | Scores | Group mean (x̄j) |
|---|---|---|
| Method A | 80, 85, 90, 75, 70 | 80 |
| Method B | 90, 95, 85, 80, 100 | 90 |
| Method C | 70, 75, 65, 80, 60 | 70 |
- Grand mean: all 15 scores sum to 1,200, so x̄grand = 1200 ÷ 15 = 80.
- SSbetween: 5[(80−80)² + (90−80)² + (70−80)²] = 5[0 + 100 + 100] = 1,000.
- SSwithin: summing squared deviations inside each group gives 250 + 250 + 250 = 750.
- Degrees of freedom: dfbetween = k − 1 = 2; dfwithin = N − k = 12.
- Mean squares: MSbetween = 1000 ÷ 2 = 500; MSwithin = 750 ÷ 12 = 62.5.
- F-statistic: F = 500 ÷ 62.5 = 8.0.
- Decision: the critical value F(2, 12) at α = 0.05 is 3.89. Because 8.0 > 3.89 (p < 0.05), we reject the null hypothesis — the study methods do not all produce the same mean score.
The results are usually presented in a standard ANOVA summary table:
| Source of variation | Sum of squares (SS) | df | Mean square (MS) | F |
|---|---|---|---|---|
| Between groups | 1,000 | 2 | 500.0 | 8.0 |
| Within groups (error) | 750 | 12 | 62.5 | |
| Total | 1,750 | 14 | | |
Note that the between and within sums of squares add up to the total sum of squares (1,000 + 750 = 1,750), and the degrees of freedom add up too (2 + 12 = 14). This partitioning is the heart of analysis of variance.
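The arithmetic in the worked example can be checked with a few lines of standard-library Python. This is a minimal sketch with illustrative variable names, not a general ANOVA routine. The p-value line relies on a shortcut that holds only when the between-groups degrees of freedom equal 2, as they do here; other designs need a statistics library.

```python
from statistics import mean

scores = {
    "Method A": [80, 85, 90, 75, 70],
    "Method B": [90, 95, 85, 80, 100],
    "Method C": [70, 75, 65, 80, 60],
}

groups = list(scores.values())
k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total sample size N
grand_mean = mean(x for g in groups for x in g)

# Partition the total variation into between-group and within-group parts.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# SPECIAL CASE: with exactly 2 numerator df, the upper-tail probability
# of the F-distribution is (1 + 2F/df_within) ** (-df_within / 2).
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"{'Source':<10}{'SS':>8}{'df':>5}{'MS':>8}{'F':>7}")
print(f"{'Between':<10}{ss_between:>8.0f}{df_between:>5}{ms_between:>8.1f}{f_stat:>7.1f}")
print(f"{'Within':<10}{ss_within:>8.0f}{df_within:>5}{ms_within:>8.1f}")
print(f"{'Total':<10}{ss_total:>8.0f}{df_between + df_within:>5}")
print(f"p-value: {p_value:.4f}")
```

The script reproduces the table above (SS of 1,000 and 750, F = 8.0) and gives a p-value of about 0.006. If SciPy is installed, `scipy.stats.f_oneway(*groups)` should report the same F and p-value.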
Assumptions of ANOVA
ANOVA is a parametric test, so its conclusions are only trustworthy when several assumptions hold. Always check these before relying on the result:
- Normality: the dependent variable should be approximately normally distributed within each group. ANOVA is fairly robust to mild departures, especially with larger samples.
- Homogeneity of variance (homoscedasticity): the groups should have roughly equal variances. Levene’s test is commonly used to check this; if it indicates unequal variances, Welch’s ANOVA is a safer alternative.
- Independence of observations: each data point must be independent of the others, which is normally guaranteed by random sampling and random assignment. This is the most important assumption and the hardest to fix after the fact.
- Continuous dependent variable & categorical factor: the outcome must be measured on an interval or ratio scale, while the grouping variable must be categorical. See levels of measurement if you are unsure.
If the normality or equal-variance assumptions are seriously violated, consider a non-parametric alternative such as the Kruskal–Wallis test, or transform the data. Choosing correctly between these options is covered in our guide on which statistical test you should use.
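As a rough illustration of the equal-variance check, Levene's test in its original mean-centred form is simply a one-way ANOVA run on the absolute deviations of each score from its own group mean. The sketch below applies that idea to the exam scores from the worked example using only the standard library. The helper names are illustrative, and in practice a statistics package would also supply the p-value (for example `scipy.stats.levene`, which centres on the median by default).

```python
from statistics import variance


def one_way_f(groups):
    """One-way ANOVA F: mean square between / mean square within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups):
    """Mean-centred Levene statistic: ANOVA F on |x - group mean|."""
    abs_dev = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    return one_way_f(abs_dev)


groups = [
    [80, 85, 90, 75, 70],   # Method A
    [90, 95, 85, 80, 100],  # Method B
    [70, 75, 65, 80, 60],   # Method C
]

sample_variances = [variance(g) for g in groups]
w = levene_w(groups)

print("sample variances:", sample_variances)
print(f"Levene W = {w:.2f} (compare with the F critical value for 2 and 12 df)")
```

In this dataset every group has the same sample variance (62.5), so W is 0 and there is no sign of unequal variances. Real data will rarely be this tidy.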
Post-Hoc Tests: Finding Out Which Groups Differ
A significant ANOVA tells you that at least one group mean is different, but not which one. To pinpoint the specific differences, you run a post-hoc (after-the-fact) test that compares pairs of groups while controlling the overall error rate. Common choices include:
- Tukey’s HSD (Honestly Significant Difference): the most popular option when comparing all possible pairs with equal group sizes.
- Bonferroni correction: a conservative, general-purpose adjustment that divides the significance level by the number of comparisons.
- Scheffé’s test: flexible and conservative, useful for complex comparisons.
In our worked example, Tukey’s HSD would let us check whether Method B truly outperforms Methods A and C, and whether A and C differ from one another.
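A minimal sketch of that Tukey comparison follows, assuming equal group sizes and using the within-groups mean square from the summary table. The critical value of the studentized range (about 3.77 for 3 groups and 12 error degrees of freedom at α = 0.05) is hard-coded from a standard q table and should be treated as an assumption. A statistics package would compute it exactly and report adjusted p-values.

```python
from itertools import combinations
from math import sqrt

group_means = {"Method A": 80.0, "Method B": 90.0, "Method C": 70.0}
n_per_group = 5
ms_within = 62.5  # within-groups mean square from the ANOVA table (df = 12)

# ASSUMPTION: studentized-range critical value for 3 groups, 12 error df,
# alpha = 0.05, read from a standard q table.
Q_CRIT = 3.77

# Smallest difference between two group means that counts as significant.
hsd = Q_CRIT * sqrt(ms_within / n_per_group)

significant = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(group_means.items(), 2):
    diff = abs(mean_a - mean_b)
    significant[(name_a, name_b)] = diff > hsd
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{name_a} vs {name_b}: difference {diff:.0f}, HSD {hsd:.2f} -> {verdict}")
```

Under the assumed q value, the HSD is about 13.3 points, so only the 20-point gap between Methods B and C clears it. The 10-point gaps involving Method A do not, which shows how a significant overall F can coexist with some non-significant pairs.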
One-Way vs Two-Way ANOVA
The two most common forms of ANOVA differ in how many factors (independent variables) they involve.
1. One-Way ANOVA
A one-way ANOVA examines the effect of a single categorical factor (typically with three or more levels) on a continuous outcome. It is also called one-factor ANOVA, or one-way between-subjects ANOVA when each participant belongs to only one group. Our exam-score example above is a one-way ANOVA: one factor (study method) with three levels.
2. Two-Way ANOVA
A two-way ANOVA examines the effect of two categorical factors on a continuous outcome at the same time. Crucially, it tests three things: the main effect of each factor, plus the interaction effect — whether the influence of one factor depends on the level of the other. This interaction is something a one-way ANOVA cannot detect.
When an analysis extends to two or more factors it falls under the umbrella of factorial ANOVA, of which two-way ANOVA is the simplest case. The table below summarises the differences.
| Feature | One-way ANOVA | Two-way ANOVA |
|---|---|---|
| Number of factors | One | Two |
| Tests main effects? | Yes (one) | Yes (two) |
| Tests interaction? | No | Yes |
| Typical question | Do three+ groups differ? | Do two factors and their combination affect the outcome? |
| Example | Sleep by social-media use | IQ by gender and region |
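To make the interaction idea concrete, the sketch below decomposes a small balanced two-way design by hand. The data are hypothetical (two factors with two levels each and two observations per cell) and were chosen only so the arithmetic is easy to follow. The formulas assume equal cell sizes; unbalanced designs need a statistics package.

```python
from statistics import mean

# HYPOTHETICAL balanced 2 x 2 data: (level of factor A, level of factor B) -> scores
cells = {
    ("A1", "B1"): [4, 6],
    ("A1", "B2"): [6, 8],
    ("A2", "B1"): [6, 8],
    ("A2", "B2"): [14, 16],
}
levels_a = sorted({a for a, _ in cells})
levels_b = sorted({b for _, b in cells})
n = 2  # observations per cell (balanced design)

grand = mean(x for obs in cells.values() for x in obs)
mean_a = {a: mean(x for (ca, _), obs in cells.items() if ca == a for x in obs)
          for a in levels_a}
mean_b = {b: mean(x for (_, cb), obs in cells.items() if cb == b for x in obs)
          for b in levels_b}
cell_mean = {key: mean(obs) for key, obs in cells.items()}

# Main effects, interaction and error sums of squares.
ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
ss_ab = n * sum((cell_mean[a, b] - mean_a[a] - mean_b[b] + grand) ** 2
                for a in levels_a for b in levels_b)
ss_error = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)

df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
df_ab = df_a * df_b
df_error = len(levels_a) * len(levels_b) * (n - 1)
ms_error = ss_error / df_error

f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
f_ab = (ss_ab / df_ab) / ms_error

print(f"Factor A:    SS = {ss_a:.0f}, F = {f_a:.1f}")
print(f"Factor B:    SS = {ss_b:.0f}, F = {f_b:.1f}")
print(f"Interaction: SS = {ss_ab:.0f}, F = {f_ab:.1f}")
print(f"Error:       SS = {ss_error:.0f}, df = {df_error}")
```

In these made-up numbers, moving from B1 to B2 adds 2 points at level A1 but 8 points at level A2. That dependence of one factor's effect on the level of the other is what the interaction term captures, and it gets its own F-ratio alongside the two main effects.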
Where ANOVA Is Used
- Business & marketing: comparing the sales generated by several advertising campaigns to find the most effective one.
- Medicine: testing whether three or more drugs produce different mean recovery times in a clinical trial.
- Education: comparing exam performance across several teaching methods or schools, as in the worked example above.
- Agriculture & manufacturing: comparing crop yields under different fertiliser treatments (the setting in which Fisher first applied ANOVA) or product quality across production lines.