Variance is a measure of the amount of variation that exists in the data with respect to its mean. In plain terms, it tells you how spread out a set of numbers is by averaging the squared distance of every value from the mean. A small variance means the values cluster tightly around the average; a large variance means they are widely scattered. Variance is denoted by σ² for a population and s² for a sample, and its square root is the more familiar standard deviation.
This guide covers everything you need to know about variance: what it means, the difference between population and sample variance, the exact formulas, a fully worked example you can follow step by step, and how variance relates to the standard deviation. Let us start with the definition.
Definition of Variance
Variance is a statistical measurement of the dispersion between the values in a data collection. It expresses how far each number in the set deviates from the mean, and therefore how far the numbers lie from one another. The symbol most often used to represent variance is σ² (sigma squared) for a whole population and s² for a sample drawn from that population.
Because variance squares every deviation, it can never be negative. A variance of zero indicates that every value in the set is identical, while any spread at all produces a positive variance. Variance sits alongside the range and the interquartile range within the family of measures of variability, but unlike those, it uses every single data point in its calculation.
“The variance is the average of the squared deviations from the mean. It is a measure of how spread out a distribution is.” — National Institute of Standards and Technology (NIST/SEMATECH e-Handbook of Statistical Methods)
Population Variance vs Sample Variance
There are two distinct variance formulas, and choosing the wrong one is a common source of error. Which formula applies depends on whether your data represents an entire population or just a sample drawn from a larger population, so settle the population vs sample question before you calculate anything.
Population variance (σ²) is used when your data set includes every member of the group you are studying. You divide the sum of squared deviations by N, the total number of values:
σ² = Σ(xᵢ − μ)² / N
Here μ (mu) is the population mean, xᵢ is each individual value, and N is the number of values in the population.
Sample variance (s²) is used when your data is only a sample of a larger population — which is the case in most real research. You divide the sum of squared deviations by n − 1 rather than n:
s² = Σ(xᵢ − x̄)² / (n − 1)
Here x̄ (x-bar) is the sample mean, xᵢ is each value, and n is the sample size.
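The two formulas can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `population_variance` and `sample_variance` and the `data` values are made up for this example, and the final line cross-checks the results against Python's standard `statistics` module.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Illustrative values only; they do not come from the article.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # divides the sum of squares (32) by N = 8
print(sample_variance(data))      # divides the same sum of squares by n - 1 = 7

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

The only difference between the two functions is the denominator, which mirrors the two formulas above. The sample version rejects fewer than two values because n − 1 would be zero.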
Why divide by n − 1? Bessel’s Correction
Dividing by n − 1 instead of n is known as Bessel’s correction. When you calculate variance from a sample, you use the sample mean (x̄) rather than the true population mean (μ), which is unknown. The data values are, on average, slightly closer to their own sample mean than to the true population mean, so dividing by n would systematically underestimate the real variance. Reducing the denominator to n − 1 inflates the result just enough to make the sample variance an unbiased estimator of the population variance. The term n − 1 is also called the degrees of freedom, because once the mean is fixed, only n − 1 of the values are free to vary.
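The bias can be checked exactly on a small case instead of being taken on trust. The sketch below makes two assumptions that are not in the text above: the five values are treated as a complete population, and samples of size 3 are drawn with replacement so that the draws are independent. It enumerates every possible sample and averages both versions of the estimate, using `Fraction` to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [4, 8, 6, 5, 12]  # assumed to be the whole population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true population variance


def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)


n = 3
# Every ordered sample of size n drawn with replacement; all are equally likely.
samples = list(product(population, repeat=n))

avg_dividing_by_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_dividing_by_n_minus_1 = (
    sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
)

print("true population variance:      ", sigma2)
print("average when dividing by n:    ", avg_dividing_by_n)
print("average when dividing by n - 1:", avg_dividing_by_n_minus_1)
```

Averaged over all 125 samples, dividing by n gives 16/3, which falls short of the true variance of 8 by the factor (n − 1)/n, while dividing by n − 1 recovers 8 exactly. Under sampling without replacement from a small finite population the result would differ slightly, so this is an illustration of the independent-draws case only.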
| Feature | Population Variance | Sample Variance |
|---|---|---|
| Symbol | σ² | s² |
| Mean symbol | μ (mu) | x̄ (x-bar) |
| Denominator | N (all values) | n − 1 (Bessel’s correction) |
| Formula | Σ(xᵢ − μ)² / N | Σ(xᵢ − x̄)² / (n − 1) |
| Use when | You have every member of the group | You have a sample of a larger group |
How to Calculate Variance: Step by Step
Whether you are computing population or sample variance, the procedure is the same until the final division. Follow these steps:
- Find the mean of your data set (add all values and divide by how many there are).
- Subtract the mean from each value to get the deviation of each point.
- Square each deviation (this removes negative signs and weights larger deviations more heavily).
- Add up all the squared deviations to get the sum of squares.
- Divide the sum of squares by N (for a population) or by n − 1 (for a sample).
Worked example: take the five values 4, 8, 6, 5 and 12.
Step 1. Find the mean (x̄):
(4 + 8 + 6 + 5 + 12) ÷ 5 = 35 ÷ 5 = 7
Step 2. Find each deviation (xᵢ − x̄):
4 − 7 = −3 | 8 − 7 = 1 | 6 − 7 = −1 | 5 − 7 = −2 | 12 − 7 = 5
Step 3. Square each deviation:
(−3)² = 9 | (1)² = 1 | (−1)² = 1 | (−2)² = 4 | (5)² = 25
Step 4. Sum the squared deviations:
9 + 1 + 1 + 4 + 25 = 40
Step 5. Divide:
Sample variance (divide by n − 1 = 4): s² = 40 ÷ 4 = 10
Population variance (if these five were the whole group, divide by N = 5): σ² = 40 ÷ 5 = 8
Notice the sample variance (10) is larger than the population variance (8) for the same numbers — that is Bessel’s correction at work.
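The same five steps can be traced in Python, which is a convenient way to check hand calculations. This is a small sketch using the worked-example values; the variable names are illustrative, and the last two lines compare the result with the standard `statistics` module.

```python
import statistics

data = [4, 8, 6, 5, 12]

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: subtract the mean from each value.
deviations = [x - mean for x in data]

# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]

# Step 4: add up the squared deviations.
sum_of_squares = sum(squared)

# Step 5: divide by n - 1 (sample) or N (population).
sample_var = sum_of_squares / (len(data) - 1)
population_var = sum_of_squares / len(data)

print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared)
print("sum of squares:", sum_of_squares)
print("sample variance:", sample_var)
print("population variance:", population_var)

# The standard library should agree with the manual calculation.
assert statistics.variance(data) == sample_var
assert statistics.pvariance(data) == population_var
```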
[Figure: each value, the mean line at 7, and the squared deviation contributed by each data point.]
The further a value sits from the mean, the larger its squared deviation, which is exactly why outliers have such a strong influence on variance.
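To put a number on that influence, the sketch below computes each value's share of the sum of squares in the worked example, then recomputes the sample variance with the most distant value (12) left out. Dropping a value is done here purely for illustration and is not a recommendation for handling outliers.

```python
import statistics

data = [4, 8, 6, 5, 12]
mean = sum(data) / len(data)

squared = [(x - mean) ** 2 for x in data]
total = sum(squared)
shares = [sq / total for sq in squared]

for x, sq, share in zip(data, squared, shares):
    print(f"value {x:>2}: squared deviation {sq:>4.0f}, share of total {share:.1%}")

# Compare the sample variance with and without the most distant value.
with_12 = statistics.variance(data)
without_12 = statistics.variance([x for x in data if x != 12])
print(f"sample variance with 12: {with_12}, without 12: {without_12:.2f}")
```

The value 12 alone accounts for 25 of the 40 units in the sum of squares, or 62.5%, and removing it cuts the sample variance from 10 to roughly 2.92.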
Standard Deviation and Variance
Variance and standard deviation measure the same thing, spread, but in different units. The standard deviation is simply the square root of the variance, and it can be read as the typical distance of a value from the mean (strictly, it is the root-mean-square deviation, not the plain average distance).
σ = √σ² and s = √s²
Using our worked example, the sample standard deviation is √10 ≈ 3.16, and the population standard deviation is √8 ≈ 2.83.
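These figures are easy to confirm in Python. The sketch below takes the square root of each variance from the worked example and compares it with the `stdev` and `pstdev` functions in the standard `statistics` module.

```python
import math
import statistics

data = [4, 8, 6, 5, 12]

# Standard deviation as the square root of the corresponding variance.
sample_sd = math.sqrt(statistics.variance(data))
population_sd = math.sqrt(statistics.pvariance(data))

print(round(sample_sd, 2))      # 3.16
print(round(population_sd, 2))  # 2.83

# The library's own standard deviation functions give the same values.
assert math.isclose(statistics.stdev(data), sample_sd)
assert math.isclose(statistics.pstdev(data), population_sd)
```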
The crucial practical difference is units. If the original data is measured in metres, the variance is in metres squared, whereas the standard deviation is back in plain metres — the same unit as the data itself. This makes the standard deviation far easier to interpret intuitively, which is why it is usually reported as the headline measure of spread. Variance, however, carries more convenient mathematical properties (variances of independent variables can be added together), so it is the quantity that underpins many inferential techniques, including the standard error and analysis of variance (ANOVA).
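The additivity property can be verified exactly on a toy case. The sketch below assumes two independent variables chosen only for illustration, a fair six-sided die and a fair coin scored 0 or 1, and enumerates all equally likely pairs. The helper name `pvar` is hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def pvar(values):
    """Population variance of equally likely integer outcomes, as an exact fraction."""
    values = list(values)
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / len(values)


die = range(1, 7)  # fair six-sided die
coin = [0, 1]      # fair coin scored 0 or 1

var_die = pvar(die)
var_coin = pvar(coin)

# Independence: every (die, coin) pair is equally likely.
sums = [d + c for d, c in product(die, coin)]
var_sum = pvar(sums)

print("Var(die) + Var(coin):", var_die + var_coin)
print("Var(die + coin):     ", var_sum)

# Standard deviations, by contrast, do not simply add.
print(math.sqrt(var_die) + math.sqrt(var_coin), "vs", math.sqrt(var_sum))
```

The two variances (35/12 and 1/4) add up to 19/6, which is exactly the variance of the sum, whereas the sum of the two standard deviations overshoots the standard deviation of the sum. This is the sense in which variance is the more convenient quantity to work with mathematically.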
Advantages and Disadvantages of Variance
Variance is a cornerstone of statistics, but like any measure it has strengths and limitations.
Advantages:
- It uses every value in the data set, not just a few summary points.
- It treats deviations above and below the mean symmetrically: squaring removes the sign, so direction (positive or negative) does not matter.
- Variances are additive for independent variables, which makes variance indispensable for inferential statistics and ANOVA.
Disadvantages:
- Because deviations are squared, variance gives outliers a disproportionately large influence, which can distort the picture.
- Its squared units make it hard to interpret directly — hence the standard deviation is usually reported instead.
- It is sensitive to the scale of measurement, so variances across differently-scaled variables are not directly comparable.
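The scale sensitivity in the last point follows from a simple rule: multiplying every value by a constant a multiplies the variance by a². The sketch below shows this with made-up lengths, recorded first in metres and then in centimetres.

```python
import math
import statistics

# Hypothetical measurements, chosen only to illustrate the effect of units.
lengths_m = [2, 3, 7]
lengths_cm = [100 * x for x in lengths_m]  # same lengths in centimetres

var_m = statistics.variance(lengths_m)
var_cm = statistics.variance(lengths_cm)
sd_m = statistics.stdev(lengths_m)
sd_cm = statistics.stdev(lengths_cm)

print("variance ratio (cm / m):", var_cm / var_m)
print("standard deviation ratio (cm / m):", sd_cm / sd_m)

# Rescaling by 100 multiplies the variance by 100 squared.
assert var_cm == 100 ** 2 * var_m
assert math.isclose(sd_cm, 100 * sd_m)
```

The same spread looks 10,000 times larger when expressed as a variance in centimetres, even though nothing about the data has changed. The standard deviation grows only by the factor of 100 that the unit change implies.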