Selection Bias: Definition, Causes & How to Avoid It - ResearchProspect

Published on July 31st, 2023, Revised on June 22, 2026

Selection bias is a systematic error that occurs when the people, units or data included in a study are not representative of the wider population the research aims to describe, so the sample is skewed before any analysis begins. Because the selection process itself (not chance) distorts who is in or out, the results can be misleading no matter how carefully the data are later analysed. This guide covers what selection bias is, what causes it, the main types, worked and everyday examples, how it differs from related biases, and a practical checklist for avoiding and reducing it in your own dissertation or paper.

What Is Selection Bias?

Selection bias is a distortion in research that arises when the participants, cases or observations chosen for a study are not truly representative of the entire population the researcher is trying to describe. It happens when the process used to select the sample is non-random or flawed, so the result is “skewed” in a way that does not hold up in the real world. Imagine trying to determine the average height of people in a city, but you only collect data at a professional basketball tryout. You would gather plenty of measurements, but your conclusion would be miles from the truth. That is the core of selection bias: the way participants are chosen, whether by accident or by design, can guarantee a particular result before the research even begins.

Crucially, selection bias is not the same as a small or noisy sample. A small random sample is simply imprecise; you can fix that by collecting more data. A selection-biased sample is systematically wrong: gathering more of the same skewed data only makes you more confident in the wrong answer. Selection bias therefore threatens both the internal validity of a study (whether the observed effect is real for the people studied) and its external validity, or generalisability, to the population you actually care about. It sits within the broader family of pitfalls discussed in our guide to research bias, alongside measurement and analysis errors.

The word “systematic” is what makes selection bias so dangerous. Random error scatters your estimates evenly around the true value, so it averages out as your sample grows. Selection bias pulls every estimate in the same direction, so the error has a sign: your figure is consistently too high or too low. This is why a confident, tightly bunched result drawn from a biased sample can be far more misleading than a wide, uncertain result drawn from a fair one. Precision without representativeness is a false comfort, and examiners are trained to look past a neat number to the sampling story behind it.
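
The contrast between random error and systematic error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the city's heights, the 185 cm cut-off for who attends the tryout, and the function names are all hypothetical and chosen for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical city: adult heights in cm, roughly normal around 170 (assumed).
population = [random.gauss(170, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def fair_sample(n):
    """Simple random sample: every resident is equally likely to be picked."""
    return random.sample(population, n)

def tryout_sample(n):
    """Biased sample: only people taller than 185 cm turn up (assumed cut-off)."""
    tall = [h for h in population if h > 185]
    return random.sample(tall, n)

def summarise(sample):
    """Return (mean, standard error of the mean) for a list of heights."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, se

print(f"true mean: {true_mean:.1f}")
for n in (50, 500, 5000):
    fair_mean, fair_se = summarise(fair_sample(n))
    biased_mean, biased_se = summarise(tryout_sample(n))
    print(f"n={n:>5}  fair: {fair_mean:6.1f} ± {fair_se:.2f}   "
          f"tryout: {biased_mean:6.1f} ± {biased_se:.2f}")
```

As the sample grows, both standard errors shrink, but only the fair sample closes in on the true mean. The tryout estimate becomes more precise while staying far too high, which is the "false comfort" described above.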

Example: A study testing a new medication for a chronic health condition recruits participants only from a single specialist clinic. Patients at that clinic have access to better facilities and resources than the general population, and many have more severe or complex cases. When the trial reports a strong recovery rate, the figure overstates how well the drug works for the average patient: the sample was selected from a group that was never representative of everyone living with the condition. The same drug, tested on a randomly drawn sample, might show a far more modest effect. The findings are therefore biased by selection, not by the medication itself.

What Causes Selection Bias?

Selection bias does not stem from a single source. It creeps in wherever a decision is made about who or what enters the dataset, from the sampling frame right through to who actually replies. Understanding these causes is the first step to designing them out. The most common drivers are:

  • Non-random sampling. Convenience samples (friends, your own social network, a single class or clinic) over-represent whoever is easiest to reach.
  • Self-selection. People decide for themselves whether to take part, so those with strong opinions or specific characteristics dominate the sample.
  • Differential non-response. Invited participants who ignore the study differ systematically from those who reply, quietly reshaping the sample.
  • Flawed inclusion or exclusion criteria. Rules that look neutral can inadvertently screen out whole subgroups (for example, requiring an email address or smartphone).
  • Attrition (loss to follow-up). In longitudinal work, participants who drop out are rarely a random subset, so the survivors are no longer representative.
  • Sampling-frame errors. If the list you draw from omits part of the population, no amount of randomisation can recover the missing people, a problem closely linked to undercoverage bias.

Many of these causes overlap. A poorly designed survey can suffer from self-selection, non-response and undercoverage at the same time. The practical lesson is that selection bias is a property of the recruitment and sampling process, so it must be tackled at the design stage rather than patched after the data are in.

Types Of Selection Bias

Several distinct forms of selection bias can occur in research and data analysis. The table below summarises the main types, how each one arises and the direction in which it typically distorts results.

| Type of selection bias | How it arises | Typical effect on results |
| --- | --- | --- |
| Self-selection bias | Individuals choose whether to join the study; those who feel strongly about the topic opt in. | Over-represents motivated or opinionated participants. |
| Non-response bias | Selected participants decline or fail to respond, and non-responders differ from responders. | Skews estimates toward the traits of those who reply. |
| Volunteer bias | Samples rely on volunteers, who tend to be healthier, wealthier or more engaged. | Overstates benefits of an intervention or behaviour. |
| Berkson’s bias | Cases are drawn from hospital or clinic populations with a higher prevalence of certain conditions. | Creates spurious associations between exposures and outcomes. |
| Healthy-user bias | The sample includes people who are unusually health-conscious or proactive. | Inflates the apparent benefit of treatments or screening. |
| Attrition bias | Participants drop out of a longitudinal study in a non-random way. | Distorts change-over-time estimates among survivors. |
| Overmatching bias | Controls are matched on factors influenced by the exposure or outcome. | Artificially weakens or hides a true association. |
| Diagnostic-access bias | The chance of being diagnosed depends on exposure status or access to testing. | Distorts the exposure–outcome relationship. |

Two of these deserve a closer look because students meet them constantly. Self-selection bias occurs when individuals self-select into a study or sample, producing a non-random group; in surveys, people who feel strongly about a topic are far more likely to participate. Non-response bias occurs when those invited do not reply and differ systematically from those who do. If a questionnaire on income is mostly completed by higher earners, for instance, it will overestimate average income. Volunteer bias is a related trap in clinical trials, where volunteers may be more motivated or in better health than the wider population.
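
The income questionnaire can be mimicked with a short simulation. This is a rough sketch, not a model of any real survey: the income distribution, the response probabilities (60% for above-median earners, 20% for everyone else) and the `responds` helper are all assumptions made for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: right-skewed annual incomes (assumed distribution).
incomes = [random.lognormvariate(10.3, 0.5) for _ in range(50_000)]
true_mean = statistics.fmean(incomes)
median_income = statistics.median(incomes)

def responds(income):
    """Assumed response model: above-median earners reply more often."""
    p = 0.6 if income > median_income else 0.2
    return random.random() < p

# The invitation list is a fair random sample; the bias enters afterwards.
invited = random.sample(incomes, 5_000)
respondents = [x for x in invited if responds(x)]

response_rate = len(respondents) / len(invited)
estimate = statistics.fmean(respondents)

print(f"response rate:        {response_rate:.0%}")
print(f"true mean income:     {true_mean:,.0f}")
print(f"mean of respondents:  {estimate:,.0f}")
```

Even though the invitations were drawn at random, the respondents' mean overstates the population mean because replying depends on the very thing being measured.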

Berkson’s bias is common in hospital-based research: drawing a study population from patients who are already in hospital can inflate the apparent link between two variables simply because hospitalised people often have multiple conditions at once. Healthy-user and diagnostic-access biases similarly arise when access to care or self-care behaviour, rather than the exposure of interest, drives who ends up in the sample.
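
A simulation makes the mechanism behind Berkson’s bias easier to see. The sketch below assumes two conditions that are completely independent in the population, each affecting 10% of people, and a hypothetical admission model in which having both conditions sharply raises the chance of being in hospital (`both_bonus`). All probabilities and names are illustrative assumptions.

```python
import random
from collections import Counter

random.seed(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: (has_a, has_b, u). The two conditions are drawn
# independently; u is a uniform draw reused to decide hospital admission.
people = [(random.random() < 0.10, random.random() < 0.10, random.random())
          for _ in range(200_000)]

def admitted(a, b, u, both_bonus):
    """Assumed admission model: each condition adds risk, both add extra."""
    p = 0.02 + 0.05 * a + 0.05 * b + both_bonus * (a and b)
    return u < p

def odds_ratio(pairs):
    """Odds ratio between the two conditions from (has_a, has_b) pairs."""
    c = Counter(pairs)
    return (c[(True, True)] * c[(False, False)]) / (
        c[(True, False)] * c[(False, True)])

def hospital_sample(both_bonus):
    """Keep only the people who end up in hospital under the assumed model."""
    return [(a, b) for a, b, u in people if admitted(a, b, u, both_bonus)]

everyone = [(a, b) for a, b, _ in people]

print(f"whole population:            OR = {odds_ratio(everyone):.2f}")
print(f"hospital, comorbidity bonus: OR = {odds_ratio(hospital_sample(0.5)):.2f}")
print(f"hospital, no bonus:          OR = {odds_ratio(hospital_sample(0.0)):.2f}")
```

In the whole population the odds ratio sits close to 1, as it should for independent conditions. Among hospital patients it moves well away from 1, and the direction depends on the admission model assumed: with a comorbidity bonus the link looks inflated, without it the link looks negative. Either way, the association is created by who was selected, not by the conditions themselves.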

Selection Bias In Research

Selection bias in research refers to the systematic error that occurs when the selection of participants or cases for a study is not random or representative of the target population. It can enter at several stages, and recognising where it strikes helps you guard against it:

  • Sampling frame: the list or source you draw from omits part of the population.
  • Recruitment: the channels you use reach some groups far more easily than others.
  • Consent and enrolment: who agrees to take part is shaped by interest, trust or incentives.
  • Data collection and follow-up: who completes the study, and who is lost along the way, reshapes the final sample.

Because these errors compound, a study can look methodologically tidy and still rest on a biased sample. That is why transparent reporting of sampling and response rates is treated as a marker of quality, and why selection bias is a standard limitation to address in any statistical analysis chapter.

[Figure: How Selection Bias Distorts a Sample. Target population → biased selection → skewed sample, with Group A over-represented and Group B excluded.]
A biased selection process lets one group through while filtering another out, so the sample no longer mirrors the population.

Examples Of Selection Bias In Everyday Life

Selection bias is not confined to laboratories and clinical trials; it shapes the information we see every day. Recognising it in familiar settings makes it easier to spot in your own work.

  • Online product reviews: people tend to review only what they love or hate, so the average rating misrepresents typical satisfaction.
  • Restaurant ratings: diners with extreme experiences review most often, while neutral experiences go unrecorded.
  • Social media feeds: personalisation algorithms show content matched to past behaviour, narrowing what you see. Researchers studying social media must account for this when sampling posts or users.
  • Political surveys: polls run by campaigns may target supporters, producing a sample that does not reflect the electorate.
  • Job applications: hiring managers may favour candidates from familiar schools or backgrounds, overlooking talent from other sources.
  • Media coverage: outlets foreground sensational stories, so the news you read is a biased slice of everything that happened.

The common thread is that some voices are systematically more likely to be heard than others. Whenever a sample is built from whoever turns up rather than whoever was meant to be included, selection bias is probably at work. A useful habit is to ask, for any dataset you encounter, “who is missing from this picture, and why?” The answer usually reveals the selection mechanism quietly shaping the result.

This everyday version of selection bias also explains a classic statistical trap known as survivorship bias. During the Second World War, analysts examined returning aircraft to decide where to add armour, focusing on the areas riddled with bullet holes. The statistician Abraham Wald pointed out the error: the planes that were hit in those areas had still made it home, so the armour belonged precisely where the survivors showed no damage, because aircraft hit there never returned to be measured. The sample of surviving planes was selected by survival itself, and reading it at face value would have reinforced exactly the wrong conclusion.
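
Wald's reasoning can be reproduced with a toy simulation. The figures below are invented purely for illustration and are not historical data: the sketch assumes each aircraft takes one hit, that hits land uniformly across four sections, and that an engine hit is far more likely to bring the aircraft down (`LOSS_PROB`).

```python
import random
from collections import Counter

random.seed(5)  # fixed seed so the sketch is repeatable

SECTIONS = ["wings", "fuselage", "tail", "engine"]
# Assumed chance that a single hit in each section brings the aircraft down.
LOSS_PROB = {"wings": 0.05, "fuselage": 0.05, "tail": 0.10, "engine": 0.60}

def fly_mission():
    """Return (section_hit, returned) for one aircraft that takes one hit."""
    section = random.choice(SECTIONS)  # hits land uniformly, by assumption
    returned = random.random() >= LOSS_PROB[section]
    return section, returned

missions = [fly_mission() for _ in range(20_000)]
all_hits = Counter(section for section, _ in missions)
survivor_hits = Counter(section for section, returned in missions if returned)

def share(counter, section):
    """Fraction of recorded hits that fall in the given section."""
    return counter[section] / sum(counter.values())

for section in SECTIONS:
    print(f"{section:<9} all aircraft: {share(all_hits, section):.1%}   "
          f"survivors only: {share(survivor_hits, section):.1%}")
```

Engines take about a quarter of all hits in this setup, yet they account for a much smaller share of the damage seen on returning aircraft. Anyone inspecting only the survivors would conclude that engines are rarely hit, which is the opposite of what the armour decision needs.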

“The first principle is that you must not fool yourself—and you are the easiest person to fool.” — Richard P. Feynman

How Selection Bias Differs From Related Biases

Students often confuse selection bias with other systematic errors. The distinction matters because each calls for a different fix. Selection bias is about who is in the sample; the biases below operate elsewhere in the research process.

| Bias | Where it operates | How it differs from selection bias |
| --- | --- | --- |
| Selection bias | Sampling / recruitment | The reference point: the sample is unrepresentative of the population. |
| Undercoverage bias | Sampling frame | A specific cause of selection bias: part of the population is missing from the list you sample from. |
| Cognitive bias | Researcher judgement | Mental shortcuts distort how the researcher decides, interprets or reports, not who is sampled. |
| Status-quo bias | Decision-making | A preference for keeping things as they are; affects choices and responses rather than sampling. |

Because undercoverage bias stems from an incomplete sampling frame, it is best seen as one route into selection bias rather than a separate problem. By contrast, cognitive bias and status-quo bias arise from human judgement, so they can affect even a perfectly drawn random sample. Treating selection bias as a sampling issue, and these others as judgement issues, keeps your limitations section precise.

Example: A student researching part-time work among undergraduates posts an online questionnaire link in a campus money-saving group and on her own social feeds. She receives 280 responses and reports that 84% of students work part-time. The figure is almost certainly inflated by self-selection: a money-saving group attracts students who are already budgeting and likely to work, while classmates who do not earn have little reason to click. A defensible alternative would be to draw a random sample from the university’s full enrolment list, invite every selected student directly, and chase non-responders, so the sample reflects all undergraduates rather than the self-selected few who saw the link.

How To Avoid And Reduce Selection Bias

Selection bias is largely preventable if you plan for it. The strategies below tackle it at the design stage, where it is cheapest to fix, and at the analysis stage, where some residual bias can still be addressed.

Design-Stage Safeguards

  • Use random sampling: give every unit in the population an equal chance of selection, the single most effective defence against selection bias.
  • Use stratified sampling: when key subgroups matter, sample within each stratum to guarantee representation and avoid under- or over-representing groups.
  • Define inclusion and exclusion criteria carefully: base them on the research objectives, not on convenience, and check they do not silently exclude a subgroup.
  • Build a complete sampling frame: start from a list that covers the whole population to head off undercoverage.
  • Minimise self-selection: actively recruit a defined sample rather than relying on volunteers, and use multiple channels to reach a diverse pool.
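
The first two safeguards above, random and stratified sampling, can be sketched in a few lines. This is a minimal example under stated assumptions: the enrolment list, the student IDs, the year groups and the function names are hypothetical, and the stratified version uses proportional allocation, which is only one of several possible allocation rules.

```python
import random
from collections import Counter, defaultdict

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: (student_id, year_group) for every student.
frame = ([(f"S{i:05d}", "first-year") for i in range(0, 4000)]
         + [(f"S{i:05d}", "second-year") for i in range(4000, 7000)]
         + [(f"S{i:05d}", "final-year") for i in range(7000, 8000)])

def simple_random_sample(frame, n):
    """Every unit in the frame has the same chance of being selected."""
    return random.sample(frame, n)

def stratified_sample(frame, n, key):
    """Proportional allocation: each stratum supplies its share of n.

    Rounding means the total can differ slightly from n when the shares
    do not divide evenly.
    """
    strata = defaultdict(list)
    for unit in frame:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, k))
    return sample

srs = simple_random_sample(frame, 400)
strat = stratified_sample(frame, 400, key=lambda unit: unit[1])

print("simple random:", dict(Counter(year for _, year in srs)))
print("stratified:   ", dict(Counter(year for _, year in strat)))
```

The simple random sample reproduces the year-group mix only approximately, while the stratified sample fixes it at 200, 150 and 50 students. Both still depend on the frame being complete: neither can reach students who are missing from the list.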

Fieldwork And Analysis Safeguards

  • Increase response rates: follow up non-responders, offer modest incentives and explain why participation matters, to limit non-response bias.
  • Limit attrition: keep follow-up simple and stay in contact so longitudinal samples do not erode unevenly.
  • Consider blinding: where appropriate, blind researchers to participant characteristics or group assignments to curb cognitive bias during selection and analysis.
  • Validate against external sources: compare your sample’s demographics with census or registry data to check representativeness, and weight the data if a group is over- or under-sampled.
  • Report transparently: describe the sampling method, response rate and any limitations so readers can judge the risk of bias for themselves.
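
Validating against external sources and then weighting can be sketched as follows. This is a simplified post-stratification example with made-up numbers: the age groups, the benchmark shares and the survey responses are hypothetical, and the approach assumes respondents within each group resemble the non-respondents in that group, which real data may not satisfy.

```python
from collections import Counter

# Hypothetical benchmark shares, e.g. taken from census tables.
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Hypothetical survey: (age_group, answer), where 1 = yes and 0 = no.
# Younger respondents are over-sampled relative to the benchmark.
sample = ([("18-29", 1)] * 90 + [("18-29", 0)] * 60
          + [("30-49", 1)] * 40 + [("30-49", 0)] * 60
          + [("50+", 1)] * 10 + [("50+", 0)] * 40)

def poststratification_weights(sample, population_share):
    """Weight = population share / sample share, for each group."""
    n = len(sample)
    counts = Counter(group for group, _ in sample)
    return {group: population_share[group] / (counts[group] / n)
            for group in counts}

def weighted_mean(sample, weights):
    """Weighted proportion answering yes."""
    total_weight = sum(weights[group] for group, _ in sample)
    return sum(weights[group] * answer for group, answer in sample) / total_weight

weights = poststratification_weights(sample, population_share)
unweighted = sum(answer for _, answer in sample) / len(sample)
weighted = weighted_mean(sample, weights)

print(f"unweighted: {unweighted:.3f}")  # prints 0.467
print(f"weighted:   {weighted:.3f}")    # prints 0.350
```

Weighting pulls the estimate from about 47% down to 35% because the over-sampled younger group answers yes more often. It only corrects for the characteristics you measured and benchmarked, so it reduces selection bias rather than removing it.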

These steps protect the reliability and validity of your study. No single technique removes selection bias entirely, but combining a sound sampling design with honest reporting will keep it small and, just as importantly, visible to your examiners.

When you write up your dissertation, treat selection bias as something to confront rather than hide. A strong limitations section names the specific risk (for example, “the convenience sample over-represented final-year students”), explains the likely direction of the distortion, and states what a future study could do differently. Acknowledging a known weakness honestly earns more credit than pretending a sample was perfectly representative when it was not, and it signals the kind of methodological awareness that markers reward.

Quick Checklist Before You Collect Data

  • Have I defined the target population precisely, and does my sampling frame cover all of it?
  • Is my selection method random or stratified, rather than convenience-based?
  • Could my recruitment channel attract an unusual subgroup?
  • Do my inclusion and exclusion criteria silently exclude anyone I care about?
  • What is my plan for chasing non-responders and reducing drop-out?
  • How will I check the final sample against external benchmarks?

Frequently Asked Questions

What is selection bias in simple terms?

Selection bias is a systematic error that happens when the people or cases in a study are not representative of the wider population, because the way they were chosen was non-random or flawed. The sample is skewed before any analysis begins, so the findings can be misleading even if the data are analysed correctly.

What causes selection bias?

Selection bias is caused by anything that makes some members of the population more likely to be included than others: non-random or convenience sampling, self-selection by participants, differential non-response, faulty inclusion or exclusion criteria, attrition during follow-up, and incomplete sampling frames (undercoverage). These often occur together, so selection bias must be addressed in the study design.

What are the main types of selection bias?

Common types include self-selection bias, non-response bias, volunteer bias, attrition bias, Berkson’s bias, healthy-user bias, overmatching bias and diagnostic-access bias. Each arises at a different point in recruitment or follow-up, but all share the same result: a sample that does not mirror the target population.

What is an example of selection bias?

Estimating average height from people at a basketball tryout, or testing a drug only on patients from one specialist clinic, are classic examples. In both cases the sample is systematically unusual, so the result overstates the true value for the whole population. Everyday examples include online reviews, which capture mainly very positive or very negative opinions.

How can you avoid selection bias?

Use random or stratified sampling, build a complete sampling frame, define inclusion criteria carefully and recruit actively rather than relying on volunteers. During fieldwork, raise response rates, limit drop-out, validate the sample against external data such as census figures, and report your sampling method and response rate transparently so the risk of bias is visible.

What is the difference between selection bias and cognitive bias?

Selection bias is about who ends up in the sample: it stems from the sampling and recruitment process. Cognitive bias is about how the researcher thinks (interpreting, judging or reporting) and can affect even a perfectly drawn random sample. The two are tackled differently, so it is worth distinguishing them in your limitations section.

About Owen Ingram

Owen Ingram is a dissertation specialist. He has a master's degree in data sciences. His research work aims to compare the various types of research methods used among academicians and researchers.
