A test statistic is a single number calculated from your sample data that summarises how far your results sit from what the null hypothesis predicts. You compute it during a hypothesis test, then compare it with a known sampling distribution (such as the normal, t, F or chi-square distribution) to obtain a p-value and decide whether to reject the null hypothesis. Common test statistics include the z, t, F and chi-square (χ²) values.
If statistics is not your background, the idea can feel abstract at first — but it is genuinely simple once you see what the number represents. This guide explains what a test statistic is, how it is calculated, the four most common types and exactly how it connects to the p-value, with worked examples throughout.
What Is a Test Statistic?
A test statistic is a value derived from sample data and used in a hypothesis test. Its job is to measure how strongly your data disagree with the null hypothesis — the default claim that there is no effect, no difference or no relationship.
In practice, the test statistic compares the pattern you actually observed with the pattern you would expect if the null hypothesis were true. The further the statistic is from the value expected under the null (0 for z and t statistics, about 1 for F, and about the degrees of freedom for χ²), the stronger the evidence against the null.
“A test statistic is a quantity derived from the sample for statistical hypothesis testing. A hypothesis test is typically specified in terms of a test statistic… chosen or defined in such a way as to quantify, within observed data, behaviours that would distinguish the null from the alternative hypothesis.” — Encyclopaedic definition of a test statistic (Wikipedia, citing standard statistical references)
Because it is computed from a random sample, a test statistic is itself a random variable: its value changes from one sample to the next. The probability distribution it follows when the null hypothesis is true is called the sampling distribution under the null (sometimes the “null distribution”). When your data provide strong evidence against the null, the test statistic falls far into the tail of this distribution, the corresponding p-value becomes small, and you reject the null hypothesis.
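To see that a test statistic really is a random variable, the sketch below simulates many samples from a population in which H₀ is exactly true and records the z statistic for each one. The population values (mean 70, known σ = 10, n = 25), the seed and the number of repetitions are illustrative assumptions, not data from this guide. If the null distribution is the standard normal, roughly 5% of the simulated statistics should land beyond ±1.96.

```python
import random

# Illustrative assumptions: H0 is true (population mean really is 70), sigma is known.
MU0, SIGMA, N, REPS = 70.0, 10.0, 25, 20_000

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: (sample mean - mu0) / (sigma / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - mu0) / (sigma / n ** 0.5)

rng = random.Random(42)  # fixed seed so the run is reproducible
z_values = []
for _ in range(REPS):
    # Draw one sample from a population where H0 holds exactly
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    z_values.append(z_statistic(sample, MU0, SIGMA))

# The statistic changes from sample to sample; its spread is the null distribution.
share_extreme = sum(abs(z) > 1.96 for z in z_values) / REPS
print(f"smallest z = {min(z_values):.2f}, largest z = {max(z_values):.2f}")
print(f"share of |z| > 1.96 under H0: {share_extreme:.3f}")
```

Even though H₀ is true in every simulated sample, a small share of samples still produce extreme statistics; that share is what the significance level α controls.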
For a quick recap, the null hypothesis (denoted H₀) states that there is no difference or relationship in the population, for example that two group means are equal, or that two variables are independent. The alternative hypothesis (H₁ or Hₐ) is the claim that there is such a difference or relationship.
How the Test Statistic Relates to the P-Value
The test statistic and the p-value are two views of the same evidence. The test statistic is the standardised distance between your data and the null hypothesis; the p-value translates that distance into a probability.
Formally, the p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. You find it by locating your test statistic on its sampling distribution and measuring the tail area beyond it.
- A test statistic close to the value expected under H₀ (0 for z and t, about 1 for F) sits in the bulk of the distribution → large tail area → large p-value → weak evidence against H₀.
- A test statistic far from the null value sits deep in the tail → small tail area → small p-value → strong evidence against H₀.
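For a z statistic, the tail-area calculation just described can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8+). The function name `two_tailed_p_value` is hypothetical and the z values are arbitrary; the point is only that p shrinks as the statistic moves away from 0.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """P(|Z| >= |z|) when Z follows the standard normal distribution under H0."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2 * upper_tail                       # add the matching area in the other tail

for z in (0.2, 1.0, 1.96, 3.0):
    print(f"z = {z:4.2f}  ->  p = {two_tailed_p_value(z):.4f}")
```

For t, F or χ² statistics the logic is the same, but the tail area comes from the corresponding distribution instead of the normal.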
You can reach a decision in two equivalent ways. The critical value method compares the test statistic with a cut-off; the p-value method compares the p-value with your significance level α (commonly 0.05). If the p-value is less than α, the result is statistically significant and you reject H₀.
What Is a Critical Value?
A critical value is a cut-off point on the test statistic’s scale that marks the boundary of the rejection region. It is determined by the significance level α (and, for t, F and χ² tests, the degrees of freedom). If your test statistic is more extreme than the critical value, you reject the null hypothesis.
For example, in a two-tailed z-test at α = 0.05, the critical values are ±1.96. This comes directly from the normal distribution: about 95% of the area under the standard normal curve lies within 1.96 standard deviations of the mean, leaving 2.5% in each tail. So any z below −1.96 or above +1.96 falls in the rejection region.
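The ±1.96 cut-offs can be recovered from α with the inverse of the standard normal CDF. A minimal sketch using `statistics.NormalDist`, assuming a two-tailed test at α = 0.05:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: split alpha equally between the two tails (alpha/2 in each).
z_crit = std_normal.inv_cdf(1 - alpha / 2)
print(f"two-tailed critical values: ±{z_crit:.3f}")

# Check: the area between -z_crit and +z_crit is 1 - alpha (about 95%).
central_area = std_normal.cdf(z_crit) - std_normal.cdf(-z_crit)
print(f"central area: {central_area:.4f}")
```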
Using critical values for hypothesis testing comes down to three steps:
- Calculate the test statistic from your sample.
- Find the critical value(s) for your chosen significance level α and degrees of freedom.
- Reject H₀ if the test statistic is more extreme than the critical value; otherwise, fail to reject it.
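The three steps can be sketched for a two-tailed z-test as below. The helper `z_test_decision` is a hypothetical name, step 1 (computing z) is assumed to be done already, and the two example z values are arbitrary. The function also applies the p-value method so you can see that both routes reach the same decision.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05):
    """Apply the critical-value steps to an already-computed z statistic
    (two-tailed), and cross-check against the p-value method."""
    std_normal = NormalDist()
    # Step 2: critical value for the chosen alpha (alpha/2 in each tail)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 3: reject H0 if the statistic is more extreme than the cut-off
    reject_by_critical = abs(z) > z_crit
    # Equivalent p-value method: reject H0 if p < alpha
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha
    return reject_by_critical, reject_by_p, p_value

for z in (1.5, -2.4):
    by_crit, by_p, p = z_test_decision(z)
    print(f"z = {z:+.1f}: reject by critical value = {by_crit}, "
          f"reject by p-value = {by_p} (p = {p:.4f})")
```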
With the groundwork in place, let’s look at the four most common types of test statistic and when to use each.
How Many Types of Test Statistics Are There? Which One Should You Use?
There are many test statistics, but most fall into four families, each tied to its own sampling distribution. The right one depends on your data type, how many groups you are comparing and what you know about the population. The flowchart below summarises the choice; for a fuller walk-through see our guide on which statistical test you should use.
Choosing an inferential test
Note: different tests use different formulas, but the underlying logic is identical — standardise the gap between observed and expected, then compare it with the relevant distribution. We will cover the four common types: the t-test, z-test, ANOVA (F-test) and chi-square test.
T-Test (t-statistic)
A t-test compares means and is used when the population standard deviation is unknown and is estimated from the sample — which is almost always the case in real research, especially with smaller samples. It assumes the data are approximately normally distributed. The resulting test statistic is the t-value, and it follows Student’s t-distribution.
There are three common versions:
- One-sample t-test — compares one group’s mean with a known or hypothesised value.
- Independent-samples t-test — compares the means of two separate groups.
- Paired-samples t-test — compares two measurements taken on the same group (e.g. before vs after).
For a one-sample t-test, the t-statistic is:
t = (x̄ − μ₀) / (s / √n), where
- x̄ is the sample mean
- μ₀ is the hypothesised population mean under H₀
- s is the sample standard deviation
- n is the sample size (so s / √n is the standard error of the mean)
This test uses n − 1 degrees of freedom.
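Example: suppose a sample of n = 25 scores has mean x̄ = 74 and standard deviation s = 10, and you test H₀: μ = 70 against H₁: μ ≠ 70.
Standard error = s / √n = 10 / √25 = 10 / 5 = 2.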
t = (74 − 70) / 2 = 4 / 2 = 2.0, with n − 1 = 24 degrees of freedom.
The two-tailed critical value at α = 0.05 with 24 df is about ±2.064. Since 2.0 < 2.064, you fail to reject H₀ — the 4-mark difference is not quite statistically significant at the 5% level (p ≈ 0.057).
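The same one-sample t calculation can be sketched in a few lines of Python, working from summary statistics. The function `one_sample_t` is a hypothetical helper, and the critical value 2.064 is simply the table value quoted above, hard-coded for illustration.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic from summary statistics: (mean - mu0) / (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    df = n - 1
    return t, df

# Worked example from the text: mean = 74, s = 10, n = 25, mu0 = 70
t, df = one_sample_t(mean=74, sd=10, n=25, mu0=70)

T_CRIT = 2.064  # two-tailed critical value for alpha = 0.05 with 24 df (t table)
print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```

If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` should give the two-tailed p-value (about 0.057 here) and `scipy.stats.t.ppf(0.975, df)` the critical value; with raw data instead of summaries, `scipy.stats.ttest_1samp` runs the whole test.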
Null hypothesis (H₀): the population mean equals the hypothesised value (μ = μ₀).
Alternative hypothesis (H₁): the population mean differs from it (μ ≠ μ₀).
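The paired-samples version listed above is, in practice, a one-sample t-test applied to the within-subject differences, with H₀ stating that the mean difference is 0. A sketch with made-up before/after scores (purely illustrative, not data from this guide):

```python
import math
import statistics

# Hypothetical before/after measurements on the same four participants
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]

# Paired design: analyse the per-participant differences
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)               # sample SD (n - 1 denominator)
t = (mean_diff - 0) / (sd_diff / math.sqrt(n))  # H0: mean difference = 0
df = n - 1
print(f"mean difference = {mean_diff}, t = {t:.3f}, df = {df}")
```

SciPy's `scipy.stats.ttest_rel(after, before)` computes the same paired statistic and also reports a p-value.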
Z-Test (z-statistic)
A z-test also compares means, but it is used when the population standard deviation (σ) is known, or when the sample is large enough (typically n ≥ 30) for the normal approximation to hold. The test statistic is the z-value, which follows the standard normal distribution.
The formula for a one-sample z-test is:
z = (x̄ − μ₀) / (σ / √n), where
- x̄ is the sample mean
- μ₀ is the hypothesised population mean under H₀
- σ is the population standard deviation
- σ / √n is the standard error of the mean
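Example: suppose a sample of n = 100 observations has mean x̄ = 73, the population standard deviation is known to be σ = 10, and you test H₀: μ = 70 against H₁: μ ≠ 70.
Standard error = σ / √n = 10 / √100 = 10 / 10 = 1.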
z = (73 − 70) / 1 = 3.0.
Because 3.0 > 1.96 (the two-tailed critical value at α = 0.05), you reject H₀. The corresponding p-value is about 0.0027 — strong evidence the true mean is not 70.
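A minimal sketch of this z-test in Python, using only the standard library. The helper name `one_sample_z` is hypothetical; the inputs are the worked-example values above.

```python
import math
from statistics import NormalDist

def one_sample_z(mean, sigma, n, mu0):
    """One-sample z statistic: (sample mean - mu0) / (sigma / sqrt(n))."""
    return (mean - mu0) / (sigma / math.sqrt(n))

# Worked example from the text: mean = 73, sigma = 10, n = 100, mu0 = 70
z = one_sample_z(mean=73, sigma=10, n=100, mu0=70)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0 at 0.05: {p_value < 0.05}")
```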
Null hypothesis (H₀): the population mean equals the hypothesised value (μ = μ₀).
Alternative hypothesis (H₁): the population mean differs from it (μ ≠ μ₀).
ANOVA / F-Test (F-statistic)
Analysis of Variance (ANOVA) compares the means of three or more groups in a single test, avoiding the inflated Type I (false-positive) error rate that comes from running many separate t-tests. Its test statistic is the F-statistic, which follows the F-distribution.
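To put a rough number on that inflation: if you ran m tests at α = 0.05 and they were independent, the chance of at least one false positive would be 1 − (1 − α)^m. Pairwise t-tests between groups are not fully independent, so treat the sketch below (with the hypothetical helper `familywise_error_rate`) as an approximation that only illustrates the trend.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** num_tests

# Comparing k groups pairwise needs k * (k - 1) / 2 separate t-tests
for k in (3, 4, 5):
    pairs = k * (k - 1) // 2
    rate = familywise_error_rate(pairs)
    print(f"{k} groups -> {pairs} t-tests -> familywise error about {rate:.3f}")
```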
The F-statistic is the ratio of two estimates of variance:
F = (variance between groups) / (variance within groups)
If the group means are truly equal, both quantities estimate the same underlying variance and F is close to 1. If the means differ, the between-group variance grows and F increases. ANOVA uses two degrees-of-freedom values — one for the numerator (k − 1, where k is the number of groups) and one for the denominator (N − k, where N is the total sample size).
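The ratio described above can be sketched from scratch for a one-way ANOVA: each variance estimate is a sum of squares divided by its degrees of freedom (a "mean square"). The helper `one_way_anova_f` is a hypothetical name and the three small groups are made-up numbers chosen only to keep the arithmetic easy to check.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: mean square between / mean square within."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # N, total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data for three small groups (illustration only)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

Here the third group's mean sits well away from the others, so F lands far above 1. SciPy's `scipy.stats.f_oneway(*groups)` should return the same F together with its p-value.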
Common forms include:
- One-way ANOVA — one categorical factor with three or more levels.
- Two-way ANOVA — two factors and their interaction.
- MANOVA — extends ANOVA to two or more dependent variables at once.
Null hypothesis (H₀): all group means are equal (μ₁ = μ₂ = … = μₖ).
Alternative hypothesis (H₁): at least one group mean differs from the others.
Chi-Square Test (χ²-statistic)
The chi-square (χ²) test is used for categorical data — it compares observed frequency counts with the counts expected under the null hypothesis. The test statistic is the chi-square statistic, which follows the χ² distribution.
The formula is:
χ² = Σ [ (O − E)² / E ], where
- O is the observed frequency count in each category (or cell)
- E is the expected frequency count under H₀
- the sum runs over all categories or table cells
The larger the gap between observed and expected counts, the larger χ² becomes — and the stronger the evidence against the null.
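Example: a six-sided die is rolled 60 times, giving counts of 8, 12, 9, 11, 13 and 7 for faces 1 to 6. If the die is fair (H₀), each face is expected 60 / 6 = 10 times, so:
χ² = (8−10)²/10 + (12−10)²/10 + (9−10)²/10 + (11−10)²/10 + (13−10)²/10 + (7−10)²/10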
= 0.4 + 0.4 + 0.1 + 0.1 + 0.9 + 0.9 = 2.8, with 6 − 1 = 5 degrees of freedom.
The critical value at α = 0.05 with 5 df is about 11.07. Since 2.8 < 11.07, you fail to reject H₀: the observed counts are consistent with a fair die (although failing to reject H₀ does not prove the die is fair).
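The goodness-of-fit calculation for the die can be sketched as follows. The helper `chi_square_statistic` is a hypothetical name, and the critical value 11.07 is the table value quoted above, hard-coded for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Die example: 60 rolls, fair-die expectation of 10 per face
observed = [8, 12, 9, 11, 13, 7]
expected = [sum(observed) / len(observed)] * len(observed)  # six values of 10.0

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness-of-fit: number of categories - 1
CHI2_CRIT = 11.07       # alpha = 0.05 with 5 df (chi-square table)
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > CHI2_CRIT}")
```

With SciPy, `scipy.stats.chisquare(observed)` assumes equal expected frequencies by default and also returns a p-value.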
The two main versions are:
- Goodness-of-fit test — checks whether a single categorical variable matches an expected distribution.
- Test of independence — checks whether two categorical variables are related, using a contingency table.
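For a test of independence, the expected count in each cell under H₀ is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). A sketch with a made-up 2 × 3 table of counts; the helper `chi_square_independence` is a hypothetical name.

```python
def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c contingency table.
    Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) x (columns - 1)
    return chi2, df

# Hypothetical 2 x 3 table of counts (illustration only)
table = [[10, 20, 30],
         [20, 20, 20]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

SciPy's `scipy.stats.chi2_contingency(table)` performs the same calculation and also returns the p-value and expected counts; note that it applies Yates' continuity correction by default when there is only 1 degree of freedom.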
Null hypothesis (H₀): the variables are independent (or the data fit the expected distribution).
Alternative hypothesis (H₁): the variables are associated (or the data do not fit).
Summary: Comparing the Four Test Statistics
The table below summarises when to use each test statistic and what it tells you.
| Test statistic | Distribution | Data / use case | Key condition |
|---|---|---|---|
| z | Standard normal | Compare a mean (or proportion); large samples | Population σ known, or n ≥ 30 |
| t | Student’s t | Compare one or two means; small samples | Population σ unknown (estimated by s) |
| F | F-distribution | Compare three or more means (ANOVA), or two variances | Three+ groups for ANOVA; normality, equal variances |
| χ² | Chi-square | Categorical data: fit or association | Counts, adequate expected frequencies (≥ 5) |