Confounding Variables: Examples & How to Control

Published by Owen Ingram on August 26th, 2021, Revised on June 17, 2026

A confounding variable is a third variable that is associated with both the independent variable and the dependent variable, distorting the apparent relationship between them and creating a misleading or spurious result. Because a confounder offers an alternative explanation for what you observe, it is one of the most serious threats to the internal validity of any study that aims to establish cause and effect. Confounding variables are also called confounders, confounding factors, or lurking variables.

This guide gives you a precise definition, the two criteria a variable must meet to qualify as a confounder, classic worked examples, and the practical techniques researchers use to control confounding — randomisation, restriction, matching, statistical control, and randomised controlled trial (RCT) design. It also clearly separates a confounder from a mediator and a moderator, three concepts students routinely confuse.

What is a confounding variable?

A confounding variable is an extraneous variable that influences both the independent variable (the presumed cause) and the dependent variable (the presumed effect), so that the relationship you measure between them is partly or wholly due to the confounder rather than to the causal mechanism you are interested in. When a confounder is present and uncontrolled, you may conclude that X causes Y when in fact a third factor, Z, is driving both. This is the statistical engine behind the warning every methods tutor repeats: correlation does not imply causation.

To see why this matters, recall the difference between an extraneous variable and a confounding one. Every study has dozens of extraneous variables — anything outside your independent variable that could in principle affect the outcome. An extraneous variable only becomes a confounder when it is systematically tied to your independent variable. Random noise that is evenly distributed across your groups weakens your power but does not bias your estimate; a confounder, by contrast, biases the estimate itself, pulling it away from the true value in a specific and often predictable direction. Understanding the wider family of types of variables — independent, dependent, control, extraneous, and confounding — is the foundation for spotting confounders before they wreck your conclusions.
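The contrast between noise and bias can be sketched with a small simulation. This is an illustration only: the true effect of 1.0, the noise levels, the sample size, and the seed are assumptions chosen for the demonstration, and `slope` is a hypothetical helper that computes a simple least-squares slope.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20000
true_effect = 1.0  # assumed causal effect of x on y

# Case 1: an extraneous variable adds noise to y but is unrelated to x
x1 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [true_effect * a + rng.gauss(0, 3) for a in x1]

# Case 2: a confounder z drives both x and y
z = [rng.gauss(0, 1) for _ in range(n)]
x2 = [zi + rng.gauss(0, 1) for zi in z]
y2 = [true_effect * a + 2.0 * zi + rng.gauss(0, 1) for a, zi in zip(x2, z)]

slope_noise = slope(x1, y1)        # noisy but centred on the true effect
slope_confounded = slope(x2, y2)   # pulled away from the true effect

print(f"noise only: {slope_noise:.2f}, confounded: {slope_confounded:.2f}")
```

Under these assumptions the first slope should land close to 1.0 despite the noisy outcome, while the second should sit near 2.0, because the confounder's contribution is absorbed into the estimate.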

The two criteria a confounder must meet

Not every variable that worries you is a confounder. To qualify, a variable must satisfy two conditions, and a third condition rules out a common look-alike (the mediator):

It must be associated with the independent variable. The candidate confounder is distributed unevenly across the levels or groups of your IV. For example, if smokers drink more coffee than non-smokers, smoking is associated with the “coffee” exposure.
It must independently affect the dependent variable. Quite apart from the IV, the confounder is itself a cause of the outcome. Smoking independently raises the risk of heart disease, regardless of coffee intake.
It must NOT lie on the causal pathway between the IV and the DV. A variable that sits between cause and effect (one that the IV produces and that in turn produces the outcome) is a mediator, not a confounder. Controlling for a mediator is a mistake if you want the total effect, because it removes part of the very effect you are trying to measure. A genuine confounder is a common cause of both variables, sitting “upstream” of the relationship, not inside it.

A simple mental test: draw the variables and ask, “Does this third variable cause both my IV and my DV?” If yes, and it is not produced by the IV, it is a confounder you should control. If the IV causes it and it then causes the DV, it is a mediator you should usually leave alone.

Example: A health researcher finds that towns with more hospitals record more deaths. The tempting (and absurd) conclusion is that hospitals cause death. The confounder is population size: bigger towns have both more hospitals (association with the IV) and more deaths in absolute terms (independent effect on the DV). Population size is a common cause of both, sitting upstream. Adjust for population (e.g., compare hospitals and deaths per capita) and the spurious link largely disappears.

Figure: The confounder (Z) is a common cause of both the independent variable (X) and the dependent variable (Y). Because Z drives both, X and Y appear related (the dashed blue arrow in the diagram) even when X does not cause Y. The association is spurious and is produced entirely by the confounder.
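The common-cause structure can be sketched as a simulation in which Z drives both X and Y and X has no effect on Y at all. Everything here is an assumption made for illustration: the unit coefficients, the Gaussian noise, the sample size, and the seed. The helpers `pearson` and `residuals` are hypothetical names. Correlating the residuals after removing Z gives the partial correlation of X and Y given Z.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(y, z):
    """What is left of y after a least-squares regression on z."""
    my, mz = mean(y), mean(z)
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / sum((zi - mz) ** 2 for zi in z)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(42)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # caused by z only
y = [zi + rng.gauss(0, 1) for zi in z]    # caused by z only, not by x

raw_r = pearson(x, y)                                    # spurious association
partial_r = pearson(residuals(x, z), residuals(y, z))    # association with z held constant

print(f"raw r = {raw_r:.2f}, partial r given z = {partial_r:.2f}")
```

With these assumed coefficients the raw correlation should come out near 0.5, and the partial correlation near zero, which mirrors the dashed arrow vanishing once Z is accounted for.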

Classic examples of confounding

The two examples below are textbook because the confounder is intuitive once named — which is exactly the point. In real research the confounder is rarely so obvious.

Example — ice-cream sales and drowning: Across the year, ice-cream sales (IV) and drowning deaths (DV) rise and fall together, producing a strong positive correlation. Eating ice cream does not cause drowning. The confounder is hot weather / temperature: warm days drive both ice-cream sales (association with the IV) and the number of people swimming, which raises drownings (independent effect on the DV). Temperature is a common cause of both. Hold temperature constant — compare drownings on hot days with high versus low ice-cream sales — and the correlation collapses.

Example — coffee and heart disease: Early observational studies suggested coffee drinkers (IV) had more heart disease (DV). The confounder is smoking: historically, coffee drinkers were also more likely to smoke (association with the IV), and smoking independently causes heart disease (effect on the DV). Once researchers statistically adjusted for smoking, much of coffee’s apparent risk vanished. Smoking, not coffee, was doing the damage.

Worked calculation — a confounder reverses the result (Simpson’s paradox):

A hospital compares two treatments, Drug A and Drug B, on 350 patients each, recording how many recover. Looking at the overall recovery rates, Drug B appears better. But disease severity is a confounder: it affects both which drug a patient received (the IV) and whether they recover (the DV). Watch what happens when we stratify by it.

Step 1 — Overall (confounder ignored):

Drug A: 273 recover / 350 treated = 78.0%
Drug B: 289 recover / 350 treated = 82.6% ← looks better

Step 2 — Stratify by severity (the confounder):

Severity stratum	Drug A (recovered / treated)	Drug A rate	Drug B (recovered / treated)	Drug B rate	Better drug
Mild cases	81 / 87	81 ÷ 87 = 93.1%	234 / 270	234 ÷ 270 = 86.7%	Drug A
Severe cases	192 / 263	192 ÷ 263 = 73.0%	55 / 80	55 ÷ 80 = 68.8%	Drug A
Total	273 / 350	78.0%	289 / 350	82.6%	Drug B (misleading)

Step 3 — Interpret: Within both the mild and severe groups, Drug A actually has the higher recovery rate (93.1% vs 86.7%, and 73.0% vs 68.8%). The overall figures reverse only because Drug A was given mostly to severe patients (263 of 350), who recover less often regardless of treatment, while Drug B was given mostly to mild patients (270 of 350). Severity is associated with the drug given and independently affects recovery — the two confounder criteria — so the unadjusted 78.0% vs 82.6% comparison is spurious. Stratifying by the confounder reveals the true, opposite conclusion: Drug A is the better treatment.
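The three steps can be checked with a few lines of Python. The counts are copied from the table above. The final step, direct standardisation to the pooled severity mix, is an extra illustration that the worked example does not itself perform, and the names `counts`, `overall`, `by_stratum`, and `standardised` are hypothetical.

```python
# (recovered, treated) counts copied from the stratified table
counts = {
    "Drug A": {"mild": (81, 87), "severe": (192, 263)},
    "Drug B": {"mild": (234, 270), "severe": (55, 80)},
}

def pct(recovered, treated):
    return 100 * recovered / treated

# Step 1: overall recovery rate, ignoring severity
overall = {
    drug: pct(sum(r for r, _ in strata.values()),
              sum(t for _, t in strata.values()))
    for drug, strata in counts.items()
}

# Step 2: recovery rate within each severity stratum
by_stratum = {
    drug: {sev: pct(r, t) for sev, (r, t) in strata.items()}
    for drug, strata in counts.items()
}

# Extra step (not in the text): apply each drug's stratum rates
# to the same pooled severity mix, so the comparison is like for like
pooled = {sev: sum(counts[d][sev][1] for d in counts) for sev in ("mild", "severe")}
total = sum(pooled.values())
standardised = {
    drug: sum(by_stratum[drug][sev] * pooled[sev] for sev in pooled) / total
    for drug in counts
}

for drug in counts:
    print(drug,
          "overall:", round(overall[drug], 1),
          "by stratum:", {s: round(v, 1) for s, v in by_stratum[drug].items()},
          "standardised:", round(standardised[drug], 1))
```

Once both drugs are evaluated on the same severity mix, Drug A comes out ahead (roughly 83% versus 78%), which agrees with the within-stratum comparison and not with the crude totals.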

These cases illustrate why an unadjusted association can be a spurious correlation — a relationship that is real in the data but does not reflect a causal link, because a hidden common cause generates both variables. Recognising the possibility of confounding is the single most important habit when interpreting any observational or correlational research finding.

Why confounders threaten internal validity

Internal validity is the degree to which you can be confident that changes in the dependent variable were caused by the independent variable and not by something else. Confounding is the classic enemy of internal validity precisely because it provides a credible alternative explanation for your result. Campbell and Stanley’s foundational work on experimental design identified “history”, “selection”, and “maturation” effects — all of which are, at heart, confounding mechanisms that bias the comparison between conditions.

“A confounding variable, also known as a third variable or a lurking variable, influences both the independent variable and dependent variable… failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.” (Source: Thomas, Scribbr, 2020)

The practical consequence is that a study with strong internal validity is one in which plausible confounders have been anticipated and controlled by design or by analysis. A correlational design with no control for confounders may have excellent external validity (it generalises) yet weak internal validity (you cannot say the relationship is causal). This trade-off is why the gold standard for causal inference is the experiment, where confounding is controlled by design rather than hoped away.

How to control confounding variables

There are five core strategies. The first three are design-stage controls (built into how you collect data); the fourth is an analysis-stage control (applied after data collection); and the fifth, the RCT, is the design that combines randomisation with rigorous control. The table summarises the trade-offs.

Technique	How it controls confounding	Pros	Cons
Randomisation	Randomly assigning participants to groups distributes all confounders, known and unknown, evenly across conditions on average.	Controls even confounders you never thought of; the strongest single method.	Only possible in experiments; needs adequate sample size; impractical/unethical for many exposures.
Restriction	Include only participants at one level of the confounder (e.g., only non-smokers).	Simple; fully removes the restricted confounder.	Limits generalisability; cannot study the restricted variable; controls only chosen confounders.
Matching	Pair participants across groups so they share key confounder values (e.g., same age and sex).	Balances specified confounders even in small samples.	Hard with many confounders; matched cases can be lost; only controls matched variables.
Statistical control	Adjust for confounders during analysis using multivariable regression or ANCOVA, holding them constant statistically.	Flexible; controls several confounders at once; uses all data.	Only works for measured confounders; relies on model assumptions; residual confounding remains.
RCT design	Combines randomisation with controlled conditions, blinding, and a comparison group.	Gold standard for causal inference; controls known and unknown confounders.	Expensive, time-consuming, often impractical or unethical outside clinical settings.

Design-stage controls

Randomisation is uniquely powerful because it is the only method that controls unknown and unmeasured confounders. By allocating participants to conditions by chance, you make the groups equivalent in expectation on every background variable, so any post-treatment difference beyond what chance alone would produce can be attributed to the IV. This is the logic at the heart of true experimental research.
Restriction eliminates a confounder by studying only one of its levels — for instance, recruiting only women to remove sex as a confounder. It is clean but buys control at the cost of generalisability.
Matching ensures comparison groups are balanced on chosen confounders by deliberately pairing participants (common in case-control studies). It is effective for a small number of confounders but becomes unwieldy as the list grows.
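A rough sketch of why these design choices work, using simulated data. The confounder (`motivation`), the 80% versus 20% opt-in probabilities, the restriction band, the sample size, and the seed are all assumptions made for illustration. The quantity of interest is the gap in the confounder between treated and control groups: a large gap means the groups are not comparable.

```python
import random

def mean(values):
    return sum(values) / len(values)

def gap(confounder, treated_flags):
    """Difference in mean confounder value between treated and control groups."""
    treated = [c for c, t in zip(confounder, treated_flags) if t]
    control = [c for c, t in zip(confounder, treated_flags) if not t]
    return mean(treated) - mean(control)

rng = random.Random(7)
n = 10000
motivation = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical confounder

# Self-selection: motivated people opt in more often (assumed 80% vs 20%)
self_selected = [rng.random() < (0.8 if m > 0 else 0.2) for m in motivation]

# Randomisation: a fair coin decides, regardless of motivation
randomised = [rng.random() < 0.5 for _ in motivation]

gap_self = gap(motivation, self_selected)
gap_random = gap(motivation, randomised)

# Restriction: keep only participants within a narrow band of the confounder
band = [(m, t) for m, t in zip(motivation, self_selected) if 0.0 < m < 0.2]
gap_restricted = gap([m for m, _ in band], [t for _, t in band])

print(f"self-selected: {gap_self:.2f}, randomised: {gap_random:.2f}, "
      f"restricted: {gap_restricted:.2f}")
```

In this setup the self-selected groups differ by close to one standard deviation of motivation, whereas the randomised groups and the restricted sample are nearly balanced. Restriction achieves that balance only by discarding most of the sample, which is the generalisability cost described above.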

Analysis-stage controls

Multivariable regression includes the confounder as a covariate, estimating the IV’s effect on the DV while holding the confounder constant. The adjusted coefficient is your confounder-controlled estimate.
ANCOVA (analysis of covariance) extends ANOVA by removing variance attributable to a continuous covariate (e.g., baseline score), sharpening the comparison between groups.
Stratification and propensity scores are further options for grouping or weighting observations to balance confounders.
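Regression adjustment can be sketched without any statistics library by solving the normal equations for two predictors. This is a minimal sketch under stated assumptions: the data are simulated, the true effect of 0.5 and the other coefficients are invented for the demonstration, and `simple_slope` and `adjusted_slope` are hypothetical helpers. In practice a statistics package would be used instead.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Least-squares slope of y on x alone (the unadjusted estimate)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def adjusted_slope(x, z, y):
    """Coefficient on x from a least-squares fit of y on x and z (with intercept)."""
    mx, mz, my = mean(x), mean(z), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    # Solve the 2x2 normal equations for the coefficient on x
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(1)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]                  # measured confounder
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]             # IV, partly driven by z
y = [0.5 * xi + 1.5 * zi + rng.gauss(0, 1)               # DV: assumed true effect of x is 0.5
     for xi, zi in zip(x, z)]

unadjusted = simple_slope(x, y)
adjusted = adjusted_slope(x, z, y)

print(f"unadjusted: {unadjusted:.2f}, adjusted for z: {adjusted:.2f}")
```

The unadjusted slope overstates the effect (about 1.2 under these assumptions), while the coefficient from the model that includes the confounder should recover a value close to the assumed 0.5. The adjustment only works here because `z` was measured and the model is correctly specified.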

The crucial limitation of all statistical control is that you can only adjust for confounders you measured. Confounding from anything unmeasured (“residual confounding”) remains. This is precisely why randomisation, which neutralises even the confounders you never imagined, is so prized. When your design is observational and you must lean on statistical adjustment, getting the modelling right is essential.

Confounder vs mediator vs moderator

These three terms describe different roles a third variable can play, and conflating them leads to serious analytical errors — especially the mistake of “controlling for” a mediator, which biases your estimate by removing part of the true causal effect.

Concept	Causal role	Question it answers	Should you control it?
Confounder	A common cause of both the IV and the DV; sits upstream, outside the causal path.	“Is the X–Y link real, or driven by a hidden third cause?”	Yes — control it to remove bias.
Mediator	Sits on the causal path: IV → mediator → DV. The IV produces it; it then produces the DV.	“How / by what mechanism does X affect Y?”	No (not if you want the total effect) — controlling it removes part of the effect.
Moderator	Changes the strength or direction of the X–Y relationship (an interaction).	“For whom / under what conditions is X→Y stronger or weaker?”	Test it as an interaction; do not simply “adjust it out”.

In short: a confounder distorts a relationship and must be removed; a mediator explains a relationship and should be preserved (and studied); a moderator conditions a relationship and should be tested as an interaction.
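The mediator and moderator roles can be illustrated with simulated data. This is a sketch under invented assumptions: the path coefficients (0.7 and 0.6), the two moderator slopes (0.2 and 1.0), the sample size, and the seed are chosen only for the demonstration, and the helper names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Least-squares slope of y on x alone."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def adjusted_slope(x, c, y):
    """Coefficient on x from a least-squares fit of y on x and c (with intercept)."""
    mx, mc, my = mean(x), mean(c), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    scc = sum((k - mc) ** 2 for k in c)
    sxc = sum((a - mx) * (k - mc) for a, k in zip(x, c))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scy = sum((k - mc) * (b - my) for k, b in zip(c, y))
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]

# Mediator: x -> m -> y, with no direct path from x to y
m = [0.7 * xi + rng.gauss(0, 1) for xi in x]
y_med = [0.6 * mi + rng.gauss(0, 1) for mi in m]
total_effect = simple_slope(x, y_med)           # about 0.7 * 0.6 = 0.42
effect_given_m = adjusted_slope(x, m, y_med)    # the effect is "adjusted away"

# Moderator: a binary w changes how strongly x affects y
w = [rng.random() < 0.5 for _ in range(n)]
y_mod = [(1.0 if wi else 0.2) * xi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
slope_w0 = simple_slope([a for a, wi in zip(x, w) if not wi],
                        [b for b, wi in zip(y_mod, w) if not wi])
slope_w1 = simple_slope([a for a, wi in zip(x, w) if wi],
                        [b for b, wi in zip(y_mod, w) if wi])

print(f"total: {total_effect:.2f}, given mediator: {effect_given_m:.2f}")
print(f"slope when w is False: {slope_w0:.2f}, when w is True: {slope_w1:.2f}")
```

Adjusting for the mediator drives the estimated effect of X towards zero even though X genuinely affects Y, which is the error the table warns against. The moderator, by contrast, shows up as two different slopes, which is why it is tested as an interaction.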

Worked example: a dissertation scenario

Suppose an education student hypothesises that using a new revision app (IV) improves exam scores (DV). She surveys 200 students, finds that app users score on average 9 marks higher, and is ready to claim the app works.

Identify candidate confounders. Who chooses to use a revision app? Likely the more motivated, higher-attaining students. Prior achievement and study motivation are plausible confounders — each is associated with app use (the IV) and each independently predicts exam scores (the DV).
Check the two criteria. Prior achievement is correlated with app use (motivated, able students adopt it) and independently affects scores — and it is not caused by using the app (it predates it), so it is a genuine confounder, not a mediator.
Control by design or analysis. The strongest fix is a randomised experiment: randomly assign students to use the app or not, so motivation and ability balance out. If randomisation is impossible, she measures prior grades and motivation and adjusts for them using multivariable regression (or ANCOVA with prior grade as a covariate).
Re-estimate. After adjusting for prior achievement and motivation, the app’s apparent 9-mark advantage shrinks to 2 marks and is no longer statistically significant. The honest conclusion: most of the raw difference was confounding, not the app.
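The re-estimation step can be sketched as an ANCOVA-style adjustment with prior grade as the covariate. The data below are simulated, not the student's survey: the true app effect of 2 marks is an assumption chosen to echo the scenario, the selection probabilities and grade distribution are invented, and the sample is far larger than 200 so that the pattern is stable from run to run.

```python
import random

def mean(values):
    return sum(values) / len(values)

def adjusted_slope(x, c, y):
    """Coefficient on x from a least-squares fit of y on x and c (with intercept)."""
    mx, mc, my = mean(x), mean(c), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    scc = sum((k - mc) ** 2 for k in c)
    sxc = sum((a - mx) * (k - mc) for a, k in zip(x, c))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scy = sum((k - mc) * (b - my) for k, b in zip(c, y))
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

rng = random.Random(11)
n = 20000
true_app_effect = 2.0  # assumed, for illustration only

# Prior achievement (the confounder), measured before app use
prior = [rng.gauss(60, 10) for _ in range(n)]

# Higher-attaining students are assumed to adopt the app more often
uses_app = [1.0 if rng.random() < (0.8 if p > 60 else 0.2) else 0.0 for p in prior]

# Exam score depends on the app (a little) and on prior achievement (a lot)
score = [55 + true_app_effect * a + 0.6 * (p - 60) + rng.gauss(0, 5)
         for a, p in zip(uses_app, prior)]

app_scores = [s for s, a in zip(score, uses_app) if a == 1.0]
other_scores = [s for s, a in zip(score, uses_app) if a == 0.0]
raw_gap = mean(app_scores) - mean(other_scores)

# Adjusted difference between users and non-users, holding prior grade constant
adjusted_gap = adjusted_slope(uses_app, prior, score)

print(f"raw gap: {raw_gap:.1f} marks, adjusted gap: {adjusted_gap:.1f} marks")
```

The raw gap is several marks larger than the assumed effect because app users started from higher prior grades. The adjusted coefficient should fall back towards the 2 marks built into the simulation.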

This is the everyday reality of quantitative research: the raw association is the easy part; defending it against confounding is what separates a credible dissertation from a misleading one. Building this thinking into your design from the outset — and choosing the right control technique — is far more effective than trying to rescue a confounded study after the fact.

Common mistakes to avoid

Mistaking correlation for causation. A significant association is never, by itself, evidence of a causal effect — always ask what common cause could generate both variables.
Controlling for a mediator. Adjusting for a variable on the causal path removes part of the true effect and underestimates it. Map your causal assumptions before choosing covariates.
Relying on statistical control alone. It only handles measured confounders; unmeasured confounding remains. Where feasible, design out confounding with randomisation.
Over-adjusting (“the kitchen-sink model”). Throwing every available variable into a regression can introduce bias (e.g., adjusting for colliders or mediators). Be deliberate, not exhaustive.
Ignoring confounders in small or exploratory quantitative studies. Even small survey studies need a confounding plan, not just a tidy correlation.
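The over-adjustment mistake can be illustrated with a collider, a variable that is caused by both the IV and the DV. In this sketch X and Y are simulated as independent, and the collider is their sum plus noise. All coefficients, the sample size, and the seed are assumptions made for the demonstration, and the helper names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Least-squares slope of y on x alone."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def adjusted_slope(x, c, y):
    """Coefficient on x from a least-squares fit of y on x and c (with intercept)."""
    mx, mc, my = mean(x), mean(c), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    scc = sum((k - mc) ** 2 for k in c)
    sxc = sum((a - mx) * (k - mc) for a, k in zip(x, c))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scy = sum((k - mc) * (b - my) for k, b in zip(c, y))
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

rng = random.Random(5)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]   # independent of x by construction

# Collider: caused by both x and y
collider = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

unadjusted = simple_slope(x, y)                    # correctly near zero
over_adjusted = adjusted_slope(x, collider, y)     # bias introduced by adjusting

print(f"unadjusted: {unadjusted:.2f}, adjusted for collider: {over_adjusted:.2f}")
```

Here the unadjusted estimate is correct (no relationship), and adding the collider to the model manufactures a negative association of about -0.5. Adding covariates is not automatically safer than leaving them out.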

Summary

A confounding variable is a common cause of both your independent and dependent variable that distorts their apparent relationship and produces spurious correlations. It must be associated with the IV, independently affect the DV, and not lie on the causal pathway (which would make it a mediator). Confounding is the principal threat to internal validity, and it is controlled by randomisation, restriction, matching, statistical adjustment (multivariable regression, ANCOVA), or — best of all — a randomised controlled trial. Anticipate confounders at the design stage, distinguish them carefully from mediators and moderators, and your causal conclusions will rest on solid ground.

Frequently Asked Questions

What is a confounding variable in simple terms?

A confounding variable is a third factor that influences both the cause (independent variable) and the effect (dependent variable) you are studying, making it look as though they are related when the real driver is the third factor. For example, hot weather makes both ice-cream sales and drownings rise, so the apparent link between ice cream and drowning is confounded by temperature.

What are the two criteria for a confounding variable?

A variable is a confounder if (1) it is associated with the independent variable, and (2) it independently affects the dependent variable. A third condition rules out look-alikes: it must NOT lie on the causal pathway between the IV and DV — if it does, it is a mediator, not a confounder. A true confounder is a common cause sitting upstream of the relationship.

What is the difference between a confounder and a mediator?

A confounder is a common cause of both the independent and dependent variable and sits outside the causal path; you control it to remove bias. A mediator sits on the causal path (IV → mediator → DV) and explains how the IV affects the DV; you should NOT control for it if you want the total effect, because doing so removes part of the genuine causal effect.

How do you control for confounding variables?

Use design-stage methods — randomisation, restriction, or matching — and analysis-stage methods such as multivariable regression or ANCOVA. Randomisation is the most powerful because it balances even unknown confounders across groups. Statistical control only handles confounders you have actually measured, which is why randomised controlled trials are the gold standard for causal inference.

Why are confounding variables a threat to internal validity?

Internal validity is your confidence that the independent variable, and not something else, caused the change in the dependent variable. A confounder provides a credible alternative explanation for your result, so an uncontrolled confounder means you cannot be sure the relationship is causal. This is why confounding produces spurious correlations and why correlation does not imply causation.

Is a confounding variable the same as a moderator?

No. A confounder distorts the relationship between two variables and must be removed. A moderator changes the strength or direction of that relationship (an interaction) and tells you for whom or under what conditions the effect is stronger or weaker. You test a moderator as an interaction term rather than simply adjusting it out as you would a confounder.
