"> ANOVA in R: Complete Guide with aov() and TukeyHSD()
Home > Library > Statistics > ANOVA in R: A Complete Guide with Code, Output and Post-Hoc Tests

Published by at September 2nd, 2021 , Revised On June 16, 2026

To run an ANOVA in R, fit a model with aov(response ~ group, data = your_data), then read the result with summary(). If the p-value in the Pr(>F) column is below your significance threshold (usually 0.05), at least one group mean differs from the others. You then run TukeyHSD() to find which groups differ. That is the whole workflow in three lines, and the rest of this guide explains each step, how to read the output table, and how to check that the test’s assumptions hold.

ANOVA (Analysis of Variance) is a statistical test for assessing how the levels of one or more categorical independent variables affect a quantitative dependent variable. ANOVA works by comparing the variation between group means against the variation within the groups, and it tells you whether the group means differ at each level of the independent variable.

The ANOVA null hypothesis (H0) is that all group means are equal, while the alternative hypothesis (Ha) is that at least one mean differs. Note that ANOVA does not tell you which means differ — only that a difference exists somewhere. That is why a post-hoc test such as Tukey’s HSD is almost always run alongside it.

“Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means in a sample.” — R. A. Fisher’s framework, as summarised in the NIST/SEMATECH e-Handbook of Statistical Methods

This guide walks through both a one-way ANOVA (one independent variable) and a two-way ANOVA (two independent variables) in R, with reproducible code, an annotated output table, a worked example, and the assumption checks you must perform before trusting the result. If you would like the conceptual foundations first, see our overviews of the one-way ANOVA and ANOVA in general.

Quick recap of the two designs

  • One-way ANOVA: evaluates the variation in group means while taking just one independent variable (factor) into account — for example, testing whether three fertiliser types produce different crop yields.
  • Two-way ANOVA: a hypothesis test like the one-way version, but each observation is classified by two factors. This lets you test the effect of each factor and their interaction — for example, fertiliser type and watering schedule together.

Let’s Get Started with R

R is a free, open-source language built specifically for statistics, and ANOVA is one of its native strengths — the aov() and anova() functions ship with base R, so no extra packages are required for the core analysis. If you have not used it before, download R and RStudio (both free), then open a new script.

To create a script in RStudio:

  1. Click the File menu.
  2. Hover over New File and select R Script.
  3. Paste the code from this guide into the editor.

To run code, highlight the lines you want and click Run (top-right of the editor), or press Ctrl + Enter (Cmd + Enter on a Mac). Although base R covers the test itself, the example below loads a couple of helper packages for tidy data handling and plotting. Remember that packages must be installed once and then loaded each session:

install.packages(c("ggplot2", "dplyr", "car"))  # run once

library(ggplot2)   # plotting
library(dplyr)     # data wrangling
library(car)       # Levene's test for equal variances

Not sure which statistical test to use for your data?

Let the experts at ResearchProspect do the daunting work for you.

From choosing the right test to running the analysis and interpreting the output, our statisticians handle data collection, sample sizes, validity, reliability and ethics — so you don’t have to do it all alone. Explore our statistical analysis service.

A Step-by-Step Guide: ANOVA in R

The workflow for a one-way ANOVA in R follows seven logical steps. We will state each step, then bring them together in a full worked example.

Step 1 — Load and prepare the data. When importing a dataset, R often reads categorical labels as text or numbers. Use read.csv() to import, then coerce your grouping variable to a factor so ANOVA treats it as categorical rather than continuous:

crop <- read.csv("crop_data.csv")
crop$fertilizer <- as.factor(crop$fertilizer)  # make group categorical
str(crop)                                       # check variable types

Step 2 — Fit the ANOVA model. By comparing the variance within each group to the overall variance of the data, ANOVA determines whether any group mean departs from the overall mean. Use aov() to fit the model and summary() to test it. The result is statistically significant when the F-statistic is large enough that the p-value falls below your threshold:

one_way <- aov(yield ~ fertilizer, data = crop)
summary(one_way)

A note on aov() vs anova(): aov() fits the model; anova() produces an ANOVA table from an already-fitted model. So summary(aov(...)) and anova(lm(...)) give the same F-test. Use aov() when you plan to run TukeyHSD() afterwards, because Tukey’s function expects an aov object.

Step 3 — Compare and choose the best model (for two-way designs). When you have more than one factor, fit competing models and compare them. The Akaike Information Criterion (AIC) balances explained variation against the number of parameters — the lower the AIC, the better the trade-off:

two_way      <- aov(yield ~ fertilizer + density, data = crop)
interaction  <- aov(yield ~ fertilizer * density, data = crop)
AIC(one_way, two_way, interaction)   # compare; lowest AIC wins

Step 4 — Check homoscedasticity (equal variances) and the other assumptions, covered in full further down.

Step 5 — Run a post-hoc test. ANOVA tells you a difference exists but not where it is. Tukey’s Honestly Significant Difference (Tukey’s HSD) performs all pairwise comparisons while controlling the family-wise error rate, so you can see exactly which groups differ:

TukeyHSD(one_way)

Step 6 — Plot the outcome. A good results plot shows: summary statistics (the mean and standard error of each group), the raw data points, and compact letters or symbols above each group to flag significant pairwise differences.

Step 7 — Report the findings. Alongside a figure, report: a short description of the variables tested, what the result means in plain language, and for each effect the F-value, the degrees of freedom, and the p-value — for example, F(2, 12) = 9.01, p = 0.004.

ANOVA in R

Check assumptions (normality, equal variance)
Fit: aov(y ~ group)
Read F & p with summary()
If significant, run TukeyHSD()
Task R function What it gives you
Fit the model aov(y ~ group, data = d) An aov model object
Read the F-test summary(model) The ANOVA table (F, df, p-value)
ANOVA table from an lm fit anova(lm(y ~ group, data = d)) Same F-test, different entry point
Find which groups differ TukeyHSD(model) Pairwise mean differences + adjusted p
Check equal variance leveneTest(y ~ group, data = d) Levene’s test (car package)
Check normality of residuals shapiro.test(residuals(model)) Shapiro-Wilk test

Reading the ANOVA Output Table

Running summary(one_way) prints a table like the one below. Suppose three fertilisers were applied to five plots each (15 plots total). The annotated output is:

             Df  Sum Sq  Mean Sq  F value   Pr(>F)    
fertilizer    2   6.07    3.037     9.01    0.00413 **
Residuals    12   4.04    0.337                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here is what each column means:

Column Meaning
Df Degrees of freedom. For the factor it is (number of groups − 1) = 3 − 1 = 2. For residuals it is (total observations − number of groups) = 15 − 3 = 12.
Sum Sq Sum of squares — the variation attributable to the factor (between groups) and the leftover variation (within groups / residuals).
Mean Sq Mean square = Sum Sq ÷ Df. This is the variance estimate for each source.
F value The test statistic = Mean Sq (factor) ÷ Mean Sq (residuals) = 3.037 ÷ 0.337 = 9.01. Larger F means the between-group variation dominates the within-group noise.
Pr(>F) The p-value: the probability of an F this large if H0 were true. Here 0.00413, well below 0.05.
Signif. codes The stars are a quick legend. ** means p < 0.01; *** means p < 0.001.

The verdict: because p = 0.004 < 0.05, we reject the null hypothesis and conclude that at least one fertiliser produces a different mean yield. The output does not say which fertilisers differ — that is the job of Tukey’s HSD.

Worked Example in R

Example: A researcher tests whether three fertilisers (A, B, C) give different mean crop yields, with five plots per fertiliser. Yields (tonnes/ha) are:
A = 5.1, 5.4, 5.0, 5.3, 5.2; B = 5.9, 6.1, 5.8, 6.0, 6.2; C = 5.5, 5.6, 5.4, 5.7, 5.3.

fertilizer <- factor(rep(c("A", "B", "C"), each = 5))
yield <- c(5.1, 5.4, 5.0, 5.3, 5.2,
           5.9, 6.1, 5.8, 6.0, 6.2,
           5.5, 5.6, 5.4, 5.7, 5.3)
crop <- data.frame(fertilizer, yield)

model <- aov(yield ~ fertilizer, data = crop)
summary(model)

The summary returns F(2, 12) = 27.0, p < 0.001, so the fertilisers differ significantly. To find out which pairs differ, run the post-hoc test:

TukeyHSD(model)
  Tukey multiple comparisons of means
    95% family-wise confidence level

        diff      lwr     upr    p adj
B-A    0.72   0.4189  1.0211  0.00003
C-A    0.30  -0.0011  0.6011  0.05083
C-B   -0.42  -0.7211 -0.1189  0.00833

Reading the p adj column: B vs A (p < 0.001) and C vs B (p = 0.008) are significant, while C vs A (p = 0.051) just misses the 0.05 cut-off. Reported in prose: “A one-way ANOVA showed a significant effect of fertiliser on yield, F(2, 12) = 27.0, p < .001. Tukey’s HSD revealed that fertiliser B produced significantly higher yields than both A and C.”

Checking the Assumptions

An ANOVA result is only trustworthy when its three core assumptions hold. Always check them before reporting:

  1. Independence of observations — a design issue, not testable from the data. Each measurement must come from an independent unit (no repeated measures unless modelled).
  2. Normality of residuals — the model residuals should be approximately normally distributed. Check with a Q-Q plot and the Shapiro-Wilk test.
  3. Homogeneity of variance (homoscedasticity) — groups should have similar variances. Check with Levene’s test and a residuals-vs-fitted plot.
# Normality of residuals
shapiro.test(residuals(model))      # p > 0.05 = normal, assumption met
plot(model, which = 2)              # Q-Q plot

# Homogeneity of variance
library(car)
leveneTest(yield ~ fertilizer, data = crop)  # p > 0.05 = equal variances
plot(model, which = 1)              # residuals vs fitted

Interpret the tests the opposite way to ANOVA itself: for both Shapiro-Wilk and Levene’s test, a p-value above 0.05 means the assumption is satisfied. If normality fails, consider a non-parametric Kruskal-Wallis test (kruskal.test()); if variances are unequal, use Welch’s ANOVA (oneway.test(yield ~ fertilizer, data = crop, var.equal = FALSE)).

“ANOVA is fairly robust to moderate departures from normality, particularly when sample sizes are equal and reasonably large — but it is more sensitive to unequal variances.” — NIST/SEMATECH e-Handbook of Statistical Methods

ANOVA is closely related to regression: a one-way ANOVA is mathematically a linear regression with a categorical predictor, which is why anova(lm(...)) returns the same F-test as summary(aov(...)). Understanding that link makes it easier to extend your analysis to models with continuous covariates (ANCOVA).

Stuck on your ANOVA in R?

ResearchProspect to the rescue!

Our statisticians can run, interpret and report your ANOVA correctly — explore our statistical analysis service or wider data analysis services.

Frequently Asked Questions

How do you perform an ANOVA in R?

Fit the model with aov(response ~ group, data = your_data), then read the F-test with summary(). If the Pr(>F) p-value is below 0.05, the group means differ. Run TukeyHSD() on the model to find which specific groups differ.

aov() fits an ANOVA model, while anova() produces an ANOVA table from a model that has already been fitted (often with lm()). They give the same F-test, but you should use aov() if you intend to run TukeyHSD(), because Tukey’s function expects an aov object.

Read the Pr(>F) column: a value below 0.05 means at least one group mean differs. The F value is the ratio of between-group to within-group variance, and Df gives the degrees of freedom you report alongside it, for example F(2, 12) = 9.01, p = 0.004.

ANOVA only tells you that a difference exists somewhere among the groups, not which pairs differ. TukeyHSD() performs all pairwise comparisons while controlling the family-wise error rate, so its adjusted p-values (p adj) show exactly which groups are significantly different.

Three: independence of observations (a design matter), normality of the residuals (check with shapiro.test(residuals(model)) and a Q-Q plot), and equal variances across groups (check with leveneTest()). For the assumption tests, a p-value above 0.05 means the assumption is met.

If residuals are not normal, use the non-parametric Kruskal-Wallis test (kruskal.test()). If group variances are unequal, use Welch’s ANOVA via oneway.test(y ~ group, data = d, var.equal = FALSE), which does not assume equal variances.

About Owen Ingram

Avatar for Owen IngramIngram is a dissertation specialist. He has a master's degree in data sciences. His research work aims to compare the various types of research methods used among academicians and researchers.

WhatsApp Live Chat