ANOVA Explained: Definition, Types & Example

Published on September 2nd, 2021. Revised on June 16, 2026.

ANOVA (Analysis of Variance) is a statistical test that compares the means of three or more groups at the same time to decide whether at least one of them is significantly different from the others. It does this by comparing the variation between the groups with the variation within the groups, producing a single number called the F-statistic. If the between-group variation is large relative to the within-group variation, ANOVA concludes that the group means are not all equal.

ANOVA is one of the most widely used techniques in inferential statistics because a single test handles many groups at once, avoiding the inflated error rate you would get from running multiple t-tests. This guide explains what ANOVA is, how the F-statistic works, the difference between one-way and two-way ANOVA, the assumptions you must check, post-hoc tests, and a fully worked example with a complete ANOVA summary table.

“The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.” — Sir Ronald A. Fisher, who developed ANOVA in the 1920s

What Is ANOVA? A Clear Definition

ANOVA stands for Analysis of Variance. It is a parametric statistical test, first developed by the British statistician Sir Ronald A. Fisher in the 1920s and popularised in his 1925 book Statistical Methods for Research Workers. Despite its name, ANOVA is really a test about means: it uses the variances in the data to work out whether the population means of several groups are likely to be equal.

The logic is simple. If several groups are drawn from populations with the same mean, the spread of the group averages around the overall (grand) average should be no bigger than you would expect from random sampling noise. If the group averages are spread much further apart than the noise alone can explain, that is evidence the groups really do differ. ANOVA formalises this comparison and tells you, through a p-value, how likely it would be to see differences at least as large as those observed if every group mean were actually the same.

A key strength of ANOVA is efficiency. To compare three groups with t-tests you would need three separate comparisons; with four groups, six. Each test carries its own risk of a false positive, so the chance of wrongly declaring a difference (a Type I error) climbs quickly. ANOVA tests all the groups together in one shot, keeping the overall error rate under control.
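The growth in comparisons and error risk described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: each t-test is run at α = 0.05 and, as a simplification, the tests are treated as independent, which pairwise tests on shared groups are not exactly. The function name `familywise_error_rate` is made up for this example.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false positive).

    ASSUMPTION: the tests are independent. Pairwise t-tests that share
    groups are correlated, so treat the result as an approximation.
    """
    n_tests = comb(n_groups, 2)        # one t-test for every pair of groups
    fwer = 1 - (1 - alpha) ** n_tests  # P(at least one Type I error)
    return n_tests, fwer


for k in (3, 4, 5):
    n_tests, fwer = familywise_error_rate(k)
    print(f"{k} groups: {n_tests} t-tests, familywise error rate ~ {fwer:.3f}")
```

Under these assumptions, three groups already push the chance of at least one false positive to roughly 14%, and four groups to roughly 26%.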

Key Terms You Need First

  • Dependent variable: the continuous outcome you measure (e.g. exam score, blood pressure, yield). ANOVA requires this to be measured on an interval or ratio scale.
  • Independent variable (factor): the categorical grouping variable that splits the data into groups, such as teaching method or drug type.
  • Levels: the categories within a factor. A “teaching method” factor with three methods has three levels.
  • Between-group variance: how far the individual group means sit from the grand mean. Large values suggest a real effect.
  • Within-group variance: the natural scatter of observations inside each group. This is the “error” or noise against which the effect is judged.
  • Null hypothesis (H0): all group means are equal (μ₁ = μ₂ = μ₃ = …). The alternative hypothesis (H1) states that at least one mean differs — not that they all differ.

How Does ANOVA Work? The F-Statistic

ANOVA partitions the total variation in the data into two parts: variation between groups and variation within groups. It then forms a ratio of the two, called the F-statistic:

F = Mean Square Between (MSbetween) ÷ Mean Square Within (MSwithin)
where a mean square is a sum of squares divided by its degrees of freedom.

The building blocks are:

  • Sum of squares between (SSbetween): ∑ nj(x̄j − x̄grand)² — how far each group mean sits from the grand mean.
  • Sum of squares within (SSwithin): ∑∑(xij − x̄j)² — the spread of observations around their own group mean.
  • Degrees of freedom: dfbetween = k − 1 and dfwithin = N − k, where k is the number of groups and N the total sample size.
  • Mean squares: MSbetween = SSbetween ÷ dfbetween; MSwithin = SSwithin ÷ dfwithin.
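The building blocks above translate almost line for line into code. The following is a minimal sketch using only the Python standard library. The function name `one_way_anova` and the tiny data set are invented for illustration and are not taken from any real study.

```python
def one_way_anova(groups):
    """Compute the sums of squares, degrees of freedom, mean squares and F
    for a list of groups (each group is a list of numbers)."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # N, the total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # SSbetween: weighted squared distance of each group mean from the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # SSwithin: squared distance of each observation from its own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Made-up data: three groups of three observations each
result = one_way_anova([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
print(result)
```

The sketch returns only the F-statistic and its ingredients. Turning F into a p-value requires the F-distribution, which is usually taken from a statistics library or a printed table.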

Interpreting F is intuitive. If the null hypothesis is true, MSbetween and MSwithin both estimate the same underlying variance, so F should be close to 1. The larger the F-statistic, the stronger the evidence that the group means differ. You compare your F-value against a critical value from the F-distribution (set by your two degrees of freedom and your significance level, usually α = 0.05), or simply read the p-value. If F exceeds the critical value (equivalently, if p < α), you reject the null hypothesis and conclude that at least one group mean is different. For more on this decision rule, see our guide to statistical significance.

Figure: ANOVA partitions total variance into between-groups variation (due to the treatment or factor) and within-groups variation (random error inside each group).

ANOVA Worked Example (One-Way)

Suppose a researcher wants to know whether three study methods produce different exam scores. Five students are randomly assigned to each method, giving N = 15 and k = 3 groups.

Group    | Scores              | Group mean (x̄j)
Method A | 80, 85, 90, 75, 70  | 80
Method B | 90, 95, 85, 80, 100 | 90
Method C | 70, 75, 65, 80, 60  | 70

Example: Step-by-step ANOVA calculation

  1. Grand mean: all 15 scores sum to 1,200, so x̄grand = 1200 ÷ 15 = 80.
  2. SSbetween: 5[(80−80)² + (90−80)² + (70−80)²] = 5[0 + 100 + 100] = 1,000.
  3. SSwithin: summing squared deviations inside each group gives 250 + 250 + 250 = 750.
  4. Degrees of freedom: dfbetween = k − 1 = 2; dfwithin = N − k = 12.
  5. Mean squares: MSbetween = 1000 ÷ 2 = 500; MSwithin = 750 ÷ 12 = 62.5.
  6. F-statistic: F = 500 ÷ 62.5 = 8.0.
  7. Decision: the critical value F(2, 12) at α = 0.05 is 3.89. Because 8.0 > 3.89 (p < 0.05), we reject the null hypothesis — the study methods do not all produce the same mean score.
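The decision in step 7 can be double-checked without a statistical table. When the between-groups degrees of freedom equal 2 (that is, exactly three groups), the upper tail of the F-distribution has a simple closed form, P(F ≥ f) = (1 + 2f/df_within)^(−df_within/2). The sketch below relies on that special case. The function names are invented for this example, and for any other number of groups a library routine such as `scipy.stats.f.sf` would be the usual route.

```python
def f_upper_tail_df1_is_2(f_value, df_within):
    """P(F >= f_value) for an F-distribution with (2, df_within) degrees of freedom.

    Valid ONLY when the between-groups degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_is_2(alpha, df_within):
    """Critical value obtained by inverting the closed form above."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


f_value = 500 / 62.5   # MSbetween / MSwithin from step 6
df_within = 12
alpha = 0.05

p_value = f_upper_tail_df1_is_2(f_value, df_within)
f_crit = f_critical_df1_is_2(alpha, df_within)

print(f"F = {f_value:.1f}, critical value = {f_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here the p-value works out to about 0.006, comfortably below 0.05, and the recovered critical value agrees with the 3.89 quoted in step 7.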

The results are usually presented in a standard ANOVA summary table:

Source of variation   | Sum of squares (SS) | df | Mean square (MS) | F
Between groups        | 1,000               | 2  | 500.0            | 8.0
Within groups (error) | 750                 | 12 | 62.5             |
Total                 | 1,750               | 14 |                  |

Note that the between and within sums of squares add up to the total sum of squares (1,000 + 750 = 1,750), and the degrees of freedom add up too (2 + 12 = 14). This partitioning is the heart of analysis of variance.
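The partition can be verified directly from the raw scores by computing the total sum of squares on its own and comparing it with the two components. This is a small check written for illustration, using the fifteen exam scores from the worked example. The variable names are arbitrary.

```python
scores = {
    "Method A": [80, 85, 90, 75, 70],
    "Method B": [90, 95, 85, 80, 100],
    "Method C": [70, 75, 65, 80, 60],
}

all_scores = [x for group in scores.values() for x in group]
grand_mean = sum(all_scores) / len(all_scores)

# Total: every observation measured against the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Between: each group mean measured against the grand mean, weighted by group size
ss_between = sum(
    len(group) * (sum(group) / len(group) - grand_mean) ** 2
    for group in scores.values()
)

# Within: every observation measured against its own group mean
ss_within = sum(
    (x - sum(group) / len(group)) ** 2
    for group in scores.values()
    for x in group
)

print(ss_between, ss_within, ss_total)
print("Partition holds:", abs(ss_between + ss_within - ss_total) < 1e-9)
```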

Assumptions of ANOVA

ANOVA is a parametric test, so its conclusions are only trustworthy when several assumptions hold. Always check these before relying on the result:

  1. Normality: the dependent variable should be approximately normally distributed within each group. ANOVA is fairly robust to mild departures, especially with larger samples.
  2. Homogeneity of variance (homoscedasticity): the groups should have roughly equal variances. Levene’s test is commonly used to check this; if it indicates that the variances differ, Welch’s ANOVA is a safer alternative.
  3. Independence of observations: each data point must be independent of the others, which is normally guaranteed by random sampling and random assignment. This is the most important assumption and the hardest to fix after the fact.
  4. Continuous dependent variable & categorical factor: the outcome must be measured on an interval or ratio scale, while the grouping variable must be categorical. See levels of measurement if you are unsure.

If the normality or equal-variance assumptions are seriously violated, consider a non-parametric alternative such as the Kruskal–Wallis test, or transform the data. Choosing correctly between these options is covered in our guide on which statistical test you should use.
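Before reaching for a formal test, it can help to simply look at the group variances. The sketch below does only that, using the exam scores from the worked example and Python's standard `statistics` module. It is a descriptive first look, not a substitute for Levene's test, and it deliberately applies no cut-off to the ratio it reports.

```python
from statistics import variance

scores = {
    "Method A": [80, 85, 90, 75, 70],
    "Method B": [90, 95, 85, 80, 100],
    "Method C": [70, 75, 65, 80, 60],
}

# Sample variance (n - 1 in the denominator) for each group
group_variances = {name: variance(values) for name, values in scores.items()}

# Ratio of the largest to the smallest variance: 1.0 means identical spread
variance_ratio = max(group_variances.values()) / min(group_variances.values())

for name, var in group_variances.items():
    print(f"{name}: sample variance = {var:.1f}")
print(f"Largest / smallest variance = {variance_ratio:.2f}")
```

In this constructed example every group has the same sample variance, so the ratio is exactly 1. Real data will rarely be this tidy.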

Post-Hoc Tests: Finding Out Which Groups Differ

A significant ANOVA tells you that at least one group mean is different, but not which one. To pinpoint the specific differences, you run a post-hoc (after-the-fact) test that compares pairs of groups while controlling the overall error rate. Common choices include:

  • Tukey’s HSD (Honestly Significant Difference): the most popular option when comparing all possible pairs with equal group sizes.
  • Bonferroni correction: a conservative, general-purpose adjustment that divides the significance level by the number of comparisons.
  • Scheffé’s test: flexible and conservative, useful for complex comparisons.

In our worked example, a Tukey’s HSD would let us check whether Method B truly outperforms Methods A and C, and whether A and C differ from one another.
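As a rough sketch of how that check might look, the code below applies the usual Tukey HSD threshold for equal group sizes, HSD = q × √(MSwithin ÷ n), to the worked example. The critical value q = 3.77 for three groups and 12 error degrees of freedom at α = 0.05 is an assumption read from a standard studentized range table, so verify it against your own table or software before relying on the result.

```python
from itertools import combinations
from math import sqrt

group_means = {"Method A": 80.0, "Method B": 90.0, "Method C": 70.0}
n_per_group = 5      # equal group sizes, as in the worked example
ms_within = 62.5     # error mean square from the ANOVA table

# ASSUMPTION: studentized range critical value q(0.05; k = 3, df = 12),
# taken from a printed table. Check it before using it in real work.
q_critical = 3.77

# Smallest difference between two means that counts as significant
hsd = q_critical * sqrt(ms_within / n_per_group)

significant_pairs = []
for (name_a, mean_a), (name_b, mean_b) in combinations(group_means.items(), 2):
    difference = abs(mean_a - mean_b)
    is_significant = difference > hsd
    if is_significant:
        significant_pairs.append((name_a, name_b))
    print(f"{name_a} vs {name_b}: |difference| = {difference:.1f}, "
          f"HSD = {hsd:.2f}, significant = {is_significant}")
```

Under that assumed critical value the threshold is roughly 13.3 points, so only the 20-point gap between Method B and Method C exceeds it. The 10-point gaps involving Method A do not, which shows why a significant overall F does not mean every pair differs.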

One-Way vs Two-Way ANOVA

The two most common forms of ANOVA differ in how many factors (independent variables) they involve.

1. One-Way ANOVA

A one-way ANOVA examines the effect of a single categorical factor (with three or more levels) on a continuous outcome. It is also called one-factor ANOVA or between-subjects ANOVA. Our exam-score example above is a one-way ANOVA: one factor (study method) with three levels.

Example: Use a one-way ANOVA to compare the average hours of sleep of low, medium and high social-media users. There is one independent variable (social-media usage, with three levels) and one continuous dependent variable (hours of sleep).

2. Two-Way ANOVA

A two-way ANOVA examines the effect of two categorical factors on a continuous outcome at the same time. Crucially, it tests three things: the main effect of each factor, plus the interaction effect — whether the influence of one factor depends on the level of the other. This interaction is something a one-way ANOVA cannot detect.

Example: To study IQ scores by both gender and region, use a two-way ANOVA. Gender is one factor and region is the other, while IQ is the continuous dependent variable. The interaction term reveals whether any gender difference in IQ varies from region to region.
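The idea of an interaction can be made concrete with a 2 × 2 table of cell means. In the sketch below the factor labels (`a1`, `a2`, `b1`, `b2`) and all of the numbers are invented purely for illustration. The code only describes the pattern in the cell means and does not perform the F-test that a two-way ANOVA would use to judge whether the pattern exceeds random noise.

```python
from statistics import mean

# Made-up observations for a 2 x 2 design: (level of factor A, level of factor B)
cells = {
    ("a1", "b1"): [10.0, 12.0, 11.0],
    ("a1", "b2"): [14.0, 15.0, 16.0],
    ("a2", "b1"): [11.0, 10.0, 12.0],
    ("a2", "b2"): [20.0, 21.0, 22.0],
}

cell_means = {key: mean(values) for key, values in cells.items()}

# Effect of moving from b1 to b2, computed separately at each level of factor A
effect_of_b_at_a1 = cell_means[("a1", "b2")] - cell_means[("a1", "b1")]
effect_of_b_at_a2 = cell_means[("a2", "b2")] - cell_means[("a2", "b1")]

# If factor B had the same effect at both levels of A, this would be zero
interaction_contrast = effect_of_b_at_a2 - effect_of_b_at_a1

print(f"Effect of B at a1: {effect_of_b_at_a1:.1f}")
print(f"Effect of B at a2: {effect_of_b_at_a2:.1f}")
print(f"Interaction contrast: {interaction_contrast:.1f}")
```

A contrast far from zero means the effect of one factor changes with the level of the other, which is exactly the pattern the interaction term in a two-way ANOVA is designed to test.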

When an analysis extends to two or more factors it falls under the umbrella of factorial ANOVA, of which two-way ANOVA is the simplest case. The table below summarises the differences.

Feature             | One-way ANOVA             | Two-way ANOVA
Number of factors   | One                       | Two
Tests main effects? | Yes (one)                 | Yes (two)
Tests interaction?  | No                        | Yes
Typical question    | Do three+ groups differ?  | Do two factors and their combination affect the outcome?
Example             | Sleep by social-media use | IQ by gender and region

Where ANOVA Is Used

  • Business & marketing: comparing the sales generated by several advertising campaigns to find the most effective one.
  • Medicine: testing whether three or more drugs produce different mean recovery times in a clinical trial.
  • Education: comparing exam performance across several teaching methods or schools, as in the worked example above.
  • Agriculture & manufacturing: the field where Fisher first applied it — comparing crop yields under different fertiliser treatments or product quality across production lines.


Frequently Asked Questions

What is ANOVA used for?

ANOVA (Analysis of Variance) is used to compare the means of three or more groups simultaneously and decide whether at least one group mean differs significantly from the others. It is widely used in research to test the effect of a categorical factor on a continuous outcome.

What is the difference between one-way and two-way ANOVA?

A one-way ANOVA tests the effect of a single categorical factor on the outcome. A two-way ANOVA tests two factors at once and, importantly, their interaction: that is, whether the effect of one factor depends on the level of the other.

What is the alternative hypothesis in ANOVA?

The alternative hypothesis for ANOVA states that at least one group mean is different from the others. It does not claim that all the means differ, only that they are not all equal. The null hypothesis is that every group mean is the same.

What is the F-statistic in ANOVA?

The F-statistic is the ratio of between-group variance to within-group variance. A value near 1 suggests the group means are similar, while a large F suggests they differ. You reject the null hypothesis when F exceeds the critical value (equivalently, when p < 0.05).

What are the assumptions of ANOVA?

ANOVA assumes the dependent variable is approximately normally distributed within each group, the group variances are roughly equal (homogeneity of variance), and the observations are independent. The outcome must be continuous and the grouping variable categorical.

Why use ANOVA instead of multiple t-tests?

Running many t-tests inflates the chance of a false positive (Type I error) because each test carries its own risk. ANOVA compares all groups in a single test, keeping the overall error rate controlled, and is therefore the correct choice for three or more groups.

About Owen Ingram

Owen Ingram is a dissertation specialist. He has a master's degree in data sciences. His research work aims to compare the various types of research methods used among academicians and researchers.
