Meta-Analysis: Methods, Steps & Examples - ResearchProspect

Published by Owen Ingram on April 26th, 2023, Revised on June 17, 2026

A meta-analysis is a quantitative method that statistically combines the results of multiple independent studies on the same question to produce a single, more precise pooled estimate of an effect. Rather than narrating what the literature “seems to say”, it converts each study’s finding into a common effect size, weights every study by how much information it contributes, and averages them. Use a meta-analysis when several comparable studies measure the same relationship (for example, the effect of a teaching intervention on exam scores) and you want to know the overall direction, magnitude and reliability of that effect across the evidence base.

Meta-analysis sits at the top of the evidence hierarchy because it borrows statistical strength from the whole body of research: small, underpowered studies that individually report “no significant effect” can, when pooled, reveal a clear and trustworthy signal. This guide explains what a meta-analysis is, how it differs from a systematic and narrative review, the step-by-step process from research question to forest plot, the main effect-size measures, and a fully worked calculation you can follow by hand.

What is a meta-analysis?

A meta-analysis is a statistical procedure for integrating the quantitative findings of two or more studies that test the same hypothesis. The term was coined by Gene Glass in 1976, who defined it as “the analysis of analyses” — a second-order analysis in which the unit of observation is no longer the participant but the study. Each primary study contributes a single number, its effect size, expressed on a common scale so that results measured with different instruments can be compared and combined.

Two ideas make meta-analysis powerful. First, standardisation: by converting raw results (mean differences, proportions, correlations) into a unitless effect size, a depression trial scored on the Beck inventory can be pooled with one scored on the Hamilton scale. Second, weighting: larger and more precise studies are given more influence, usually in inverse proportion to their variance, so the pooled estimate leans towards the most informative evidence rather than treating every study as equal.

It helps to see where meta-analysis fits across disciplines. In psychology, a researcher might pool dozens of randomised trials to settle whether mindfulness training measurably reduces anxiety. In business and management, a meta-analysis could combine survey studies to estimate the true correlation between job satisfaction and employee turnover. In education, it might synthesise classroom experiments on whether spaced practice improves retention. In health, it underpins the Cochrane reviews that shape clinical guidelines, and in sociology it can aggregate findings on, say, the relationship between neighbourhood deprivation and educational attainment. In every case the logic is identical: standardise, weight, pool, and quantify how confident we can be in the combined answer.

“Meta-analysis refers to the analysis of analyses … the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings.” (Source: Glass, 1976)

Meta-analysis vs systematic review vs narrative review

These three terms are routinely confused. A narrative (traditional) review summarises the literature in prose, selected at the author’s discretion. A systematic review uses a pre-registered, reproducible protocol to find, screen and appraise all relevant studies, but does not necessarily combine them numerically. A meta-analysis is the statistical step that can sit inside a systematic review: it pools the extracted data into one estimate. In short, every robust meta-analysis should be built on a systematic review, but a systematic review need not contain a meta-analysis (for example, when studies are too heterogeneous to pool sensibly).

| Feature | Narrative review | Systematic review | Meta-analysis |
|---|---|---|---|
| Search strategy | Selective, author’s choice | Exhaustive, pre-specified protocol | Exhaustive (inherits the systematic search) |
| Reproducible? | No | Yes | Yes |
| Synthesis | Qualitative, in prose | Qualitative (narrative or tabular) | Quantitative (pooled effect size) |
| Handles bias | Weakly | Explicit risk-of-bias appraisal | Appraisal + funnel plot for publication bias |
| Output | A summary argument | A structured evidence map | A single estimate + confidence interval |
| Main risk | Cherry-picking | Heterogeneity left unquantified | Pooling apples with oranges |

The steps of a meta-analysis

A meta-analysis is a procedural method, so it is best learned as an ordered sequence. The workflow below mirrors the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting standard, which most journals and dissertation examiners now expect.

  1. Define the research question (PICO). Frame a precise, answerable question. The PICO scaffold — Population, Intervention, Comparator, Outcome — forces clarity. For example: in adolescents (P), does cognitive behavioural therapy (I) versus waitlist control (C) reduce anxiety symptoms (O)?
  2. Write and register a protocol. Specify eligibility criteria, databases, search terms and the planned analysis before looking at results, ideally registering it (e.g. on PROSPERO). This pre-commitment is what separates meta-analysis from data-dredging.
  3. Search the literature systematically. Search multiple databases (PsycINFO, Scopus, Web of Science, PubMed), plus grey literature and reference lists, recording every search string so the process is reproducible.
  4. Screen and apply eligibility criteria. Two reviewers independently screen titles, abstracts and then full texts against the inclusion/exclusion criteria, resolving disagreements by discussion. The PRISMA flow diagram documents how many records were identified, screened, excluded and finally included.
  5. Extract the data. From each included study, extract the statistics needed to compute an effect size — means, standard deviations, sample sizes, correlations, or 2×2 event counts — along with study characteristics (moderators) such as design, country and dose.
  6. Compute a common effect size. Convert each study to the same metric (see below): a standardised mean difference, an odds ratio, or a correlation coefficient.
  7. Pool the effect sizes (fixed vs random effects). Combine the effect sizes using inverse-variance weighting. A fixed-effect model assumes every study estimates one identical true effect and only sampling error differs. A random-effects model assumes the true effect itself varies between studies and adds a between-study variance component (τ²); it is the safer default whenever studies differ in population or method.
  8. Assess heterogeneity. Quantify how much the study results genuinely disagree. Cochran’s Q tests whether variation exceeds chance, and the I² statistic expresses the percentage of total variation due to real between-study differences rather than sampling error — roughly 25% low, 50% moderate, 75% high.
  9. Visualise with a forest plot. Display each study as a point estimate with a confidence interval and the pooled result as a diamond, so the reader can see consistency at a glance.
  10. Check for publication bias. Use a funnel plot — effect size against precision — to look for the asymmetry that suggests small, non-significant studies were never published. Egger’s regression test formalises this.
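Steps 7 and 8 can be sketched in a few lines of Python. The block below is a minimal illustration, not a replacement for a dedicated package: it assumes the DerSimonian-Laird (method-of-moments) estimator for τ², which is one common choice among several, and the function name `pool_random_effects` and the second data set are hypothetical. The first data set reuses the four studies from the worked example later in this guide.

```python
import math

def pool_random_effects(effects, std_errors):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Returns Cochran's Q, tau^2, I^2 (in percent), the pooled effect and its SE.
    """
    k = len(effects)

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance tau^2 (method of moments), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    return {"q": q, "df": df, "tau2": tau2, "i2": i2,
            "pooled": pooled, "se": se_pooled}

# The four consistent studies from the worked example in this guide.
consistent = pool_random_effects([0.42, 0.55, 0.30, 0.48],
                                 [0.20, 0.15, 0.25, 0.18])

# Hypothetical studies that disagree far more than sampling error allows.
mixed = pool_random_effects([0.10, 0.80, 0.30, 0.95],
                            [0.15, 0.15, 0.20, 0.20])

for name, res in [("consistent", consistent), ("mixed", mixed)]:
    print(f"{name}: Q={res['q']:.2f} (df={res['df']}), "
          f"tau2={res['tau2']:.3f}, I2={res['i2']:.0f}%, "
          f"pooled={res['pooled']:.2f}")
```

When Q is no larger than its degrees of freedom, τ² is estimated as zero and the random-effects result coincides with the fixed-effect one, which is what happens for the four consistent studies. For the hypothetical mixed set, τ² is positive, so the weights become more equal and the confidence interval widens.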

Several of these steps draw on wider quantitative reasoning. If you are shaky on the logic of significance and confidence intervals that underpins step 7, our guide to hypothesis testing is a useful companion, and the same care over measurement quality covered in reliability and validity applies to every study you pool.

[Figure: forest plot of the effect of the intervention (Cohen’s d). Horizontal axis from -0.5 to +1.0 with a dashed line at 0 (no effect). Rows: Hollis (2019), Bryant (2020), Cho (2021), Driscoll (2022), Pooled (d = 0.47).]
Figure 1. A forest plot. Each square is a study’s effect size (square size ≈ its weight) with a confidence-interval whisker; the orange diamond is the pooled estimate. Because the diamond sits clear of the dashed “no effect” line, the overall effect is positive and significant.
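For readers who want to see how such a plot is assembled, here is a rough text-only rendering in Python. It is a sketch under stated assumptions, not the code behind Figure 1: the function name `forest_plot`, the axis limits and the plot width are hypothetical choices, every row uses the same marker (so study weight is not shown), and the effect sizes and standard errors are the ones used in the worked example in this guide.

```python
def forest_plot(studies, lo=-0.5, hi=1.0, width=46):
    """Return a text forest plot; `studies` holds (label, effect, std_error)."""
    def col(x):
        # Map a value on the effect-size axis to a character column.
        x = min(max(x, lo), hi)
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, effect, se in studies:
        row = [" "] * width
        row[col(0.0)] = "|"  # the "no effect" reference line
        left, right = col(effect - 1.96 * se), col(effect + 1.96 * se)
        for i in range(left, right + 1):
            row[i] = "-"  # 95% confidence-interval whisker
        row[col(effect)] = "*"  # point estimate
        lines.append(f"{label:<16}{''.join(row)}  {effect:+.2f}")
    return "\n".join(lines)

studies = [
    ("Hollis (2019)", 0.42, 0.20),
    ("Bryant (2020)", 0.55, 0.15),
    ("Cho (2021)", 0.30, 0.25),
    ("Driscoll (2022)", 0.48, 0.18),
    ("Pooled", 0.47, 0.093),
]
print(forest_plot(studies))
```

A whisker that overwrites the `|` column marks a study whose 95% confidence interval includes zero. In this data that is the case for Cho (2021), while the pooled row stays clear of the reference line.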

Effect-size measures

The choice of effect size depends on the type of outcome. Getting this right is the single most important decision in a meta-analysis, because all studies must speak the same statistical language before they can be combined.

  • Standardised mean difference — Cohen’s d. For continuous outcomes compared between two groups. It expresses the difference between two means in standard-deviation units: d = (mean₁ − mean₂) / pooled SD. Conventionally d ≈ 0.2 is small, 0.5 medium and 0.8 large. Hedges’ g applies a small-sample correction to d.
  • Odds ratio (OR) and risk ratio. For binary outcomes (recovered/not recovered, passed/failed). An OR of 1 means no difference; above 1 means the event is more likely in the treatment group. Odds ratios are usually pooled on the log scale because their sampling distribution is skewed.
  • Correlation coefficient — Pearson’s r. For studies reporting the association between two continuous variables. Because r is bounded at ±1, it is transformed to Fisher’s z before pooling and back-transformed afterwards.

Whichever metric you choose, two practical points matter. First, the direction of the effect must be coded consistently across studies — if one trial codes “improvement” as positive and another codes it as negative, you must flip the sign before pooling, or the average will be nonsense. Second, every effect size needs an accompanying measure of precision (its standard error or confidence interval), because that precision is exactly what determines the study’s weight in the pooled analysis. Studies that report only a p-value or a “significant/not significant” verdict, with no means, standard deviations or sample sizes, usually cannot be included — which is one more reason to favour primary studies that report full descriptive statistics.
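The three metrics above, each paired with the standard error it needs for weighting, can be sketched as follows. This is an illustrative sketch that assumes the usual large-sample variance approximations. The function names and all input numbers are hypothetical and do not come from any real study.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference and its approximate standard error."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, math.sqrt(var)

def hedges_g(d, se_d, n1, n2):
    """Apply the small-sample correction factor J to d and its SE."""
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d, j * se_d

def log_odds_ratio(events_t, non_events_t, events_c, non_events_c):
    """Log odds ratio (treatment vs control) and its standard error."""
    log_or = math.log((events_t * non_events_c) / (non_events_t * events_c))
    se = math.sqrt(1 / events_t + 1 / non_events_t
                   + 1 / events_c + 1 / non_events_c)
    return log_or, se

def fisher_z(r, n):
    """Fisher's z transform of a correlation and its standard error."""
    return math.atanh(r), 1 / math.sqrt(n - 3)

# Hypothetical inputs, for illustration only.
d, se_d = cohens_d(75, 10, 30, 70, 10, 30)
g, se_g = hedges_g(d, se_d, 30, 30)
log_or, se_or = log_odds_ratio(20, 10, 10, 20)
z, se_z = fisher_z(0.5, 103)

print(f"d = {d:.2f} (SE {se_d:.3f}), g = {g:.3f} (SE {se_g:.3f})")
print(f"OR = {math.exp(log_or):.2f}, log OR = {log_or:.3f} (SE {se_or:.3f})")
print(f"z = {z:.3f} (SE {se_z:.3f}), back-transformed r = {math.tanh(z):.2f}")
```

If a study codes improvement in the opposite direction, multiplying its effect size by -1 before pooling keeps the signs consistent; the standard error is unchanged.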

These effect sizes connect directly to the inferential machinery used in primary research; readers who want the underlying theory of estimation and confidence intervals can review our overview of inferential statistics.

Worked example: pooling four studies by hand

The clearest way to understand pooling is to do it. The example below combines four small studies of a study-skills intervention on exam performance using a fixed-effect, inverse-variance model. The principle is simple: weight = 1 / variance, so a study with a small standard error (high precision) gets a big weight.

Example: Four studies each report a Cohen’s d and its standard error (SE). We compute each variance (SE²), each inverse-variance weight (w = 1/SE²), then the weighted mean.

The data and weights:

  • Hollis (2019): d = 0.42, SE = 0.20 → variance = 0.040 → w ≈ 25
  • Bryant (2020): d = 0.55, SE = 0.15 → variance = 0.0225 → w ≈ 44
  • Cho (2021): d = 0.30, SE = 0.25 → variance = 0.0625 → w ≈ 16
  • Driscoll (2022): d = 0.48, SE = 0.18 → variance = 0.0324 → w ≈ 31

Step 1 — multiply each effect by its weight (w × d):

  • 25 × 0.42 = 10.50
  • 44 × 0.55 = 24.20
  • 16 × 0.30 = 4.80
  • 31 × 0.48 = 14.88

Step 2 — sum the weighted effects: 10.50 + 24.20 + 4.80 + 14.88 = 54.38
Step 3 — sum the weights: 25 + 44 + 16 + 31 = 116
Step 4 — divide to get the pooled effect:
Pooled d = Σ(w×d) / Σw = 54.38 / 116 = 0.47

Step 5 — precision of the pooled estimate: the pooled variance = 1 / Σw = 1 / 116 = 0.0086, so the pooled SE = √0.0086 ≈ 0.093. The 95% confidence interval is 0.47 ± (1.96 × 0.093) = 0.29 to 0.65.

Interpretation: the pooled effect of d = 0.47 is a moderate, positive effect, and because the entire 95% CI (0.29–0.65) lies above zero, the effect is statistically significant at the 5% level. Notice that Bryant (2020), the most precise study, contributed the largest weight (44) and, because its effect (0.55) is the highest of the four, pulled the average upward, which is exactly what inverse-variance weighting is designed to do.
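The hand calculation can be checked with a short Python script. This is a sketch that mirrors the steps above, including rounding each weight to a whole number as the worked example does. The variable names are illustrative, and unrounded weights would give very slightly different figures.

```python
import math

# (study, Cohen's d, standard error) as listed in the worked example.
studies = [
    ("Hollis (2019)", 0.42, 0.20),
    ("Bryant (2020)", 0.55, 0.15),
    ("Cho (2021)", 0.30, 0.25),
    ("Driscoll (2022)", 0.48, 0.18),
]

# Inverse-variance weights, rounded to whole numbers as in the hand calculation.
weights = [round(1 / se ** 2) for _, _, se in studies]

# Steps 1 to 4: weighted effects, their sum, the sum of weights, pooled d.
weighted_sum = sum(w * d for w, (_, d, _) in zip(weights, studies))
weight_total = sum(weights)
pooled = weighted_sum / weight_total

# Step 5: standard error and 95% confidence interval of the pooled estimate.
pooled_se = math.sqrt(1 / weight_total)
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

print(weights)                                 # [25, 44, 16, 31]
print(f"pooled d = {pooled:.2f}")              # pooled d = 0.47
print(f"SE = {pooled_se:.3f}")                 # SE = 0.093
print(f"95% CI = {ci_low:.2f} to {ci_high:.2f}")  # 95% CI = 0.29 to 0.65
```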

Strengths and limitations

Meta-analysis is the most powerful tool for synthesising quantitative evidence, but it is not a cure-all. Its credibility depends entirely on the quality and comparability of the studies fed into it.

Strengths:

  • Increases statistical power and precision by pooling sample sizes across studies.
  • Produces a single, objective, reproducible estimate rather than a subjective reading of the literature.
  • Can quantify and explore heterogeneity, and test moderators (e.g. does effect size depend on age or dose?).
  • Sits at the top of the evidence hierarchy and directly informs policy and clinical guidelines.

Limitations:

  • “Garbage in, garbage out” — pooling biased primary studies yields a precise but biased answer.
  • The apples-and-oranges problem — combining studies that are too dissimilar can produce a meaningless average.
  • Publication bias — if significant results are more likely to be published, the pooled effect is inflated.
  • Often relies on aggregate (published) data rather than individual participant data, limiting some analyses.

Common mistakes to avoid

  • Combining studies that measure different constructs simply because they share a keyword.
  • Using a fixed-effect model when heterogeneity is clearly present — default to random effects unless studies are near-identical.
  • Ignoring a high I² instead of investigating its source through subgroup or meta-regression analysis.
  • Failing to assess publication bias, then over-claiming a strong effect.
  • Double-counting by including multiple effect sizes from the same sample as if they were independent.
  • Skipping protocol registration, which invites accusations of selective reporting.
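One of the checks listed above, publication bias, is often formalised with Egger’s regression: each study’s standardised effect (effect divided by its SE) is regressed on its precision (1 / SE), and an intercept far from zero suggests funnel-plot asymmetry. The sketch below assumes that classic formulation and ordinary least squares. The function name `egger_regression` and the data are hypothetical, chosen so that smaller studies report larger effects.

```python
import math

def egger_regression(effects, std_errors):
    """Regress effect/SE on 1/SE; return (intercept, slope, SE of intercept)."""
    k = len(effects)
    x = [1.0 / se for se in std_errors]                     # precision
    y = [eff / se for eff, se in zip(effects, std_errors)]  # standardised effect

    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with k - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical data: the least precise studies report the largest effects.
effects = [0.20, 0.25, 0.35, 0.50, 0.70, 0.90]
std_errors = [0.05, 0.08, 0.12, 0.18, 0.25, 0.32]

intercept, slope, se_intercept = egger_regression(effects, std_errors)
t_stat = intercept / se_intercept  # compare with a t distribution, k - 2 df
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}, df = {len(effects) - 2}")
```

With only a handful of studies the test has little power, so a non-significant intercept should not be read as proof that publication bias is absent.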

How to do a meta-analysis well

A defensible meta-analysis is disciplined from the first decision to the last. Pre-register a clear protocol; search exhaustively and document it with a PRISMA flow diagram; have two reviewers screen and extract independently; choose the effect-size metric that matches your outcome; default to a random-effects model and report I² honestly; and always probe for publication bias with a funnel plot. When in doubt about whether studies are similar enough to pool, it is better to present a structured narrative synthesis than to force an average that misleads. The dataset that feeds a meta-analysis is itself a form of secondary research, and the same rigour over sourcing and appraisal discussed in our guide to the advantages of secondary research applies here.


Frequently Asked Questions

What is a meta-analysis in simple terms?

A meta-analysis is a statistical method that combines the results of several separate studies on the same question into one overall result. Each study’s finding is converted to a common ‘effect size’, weighted by how precise it is, and averaged to give a single, more reliable estimate of the effect than any individual study could provide.

What is the difference between a systematic review and a meta-analysis?

A systematic review uses a pre-registered, reproducible protocol to find and appraise all relevant studies but may simply describe them. A meta-analysis is the statistical step that pools those studies into one numerical estimate. A meta-analysis is usually carried out within a systematic review, but not every systematic review contains a meta-analysis, because sometimes the studies are too different to combine.

What is the difference between a fixed-effect and a random-effects model?

A fixed-effect model assumes every study is estimating the same single true effect, with only sampling error causing differences. A random-effects model assumes the true effect itself varies from study to study and adds a between-study variance component (τ²). Random effects is the safer default whenever studies differ in population, setting or method.

What does I² mean in a meta-analysis?

I² measures heterogeneity: the percentage of the total variation across studies that is due to genuine differences between them rather than chance. As a rough guide, around 25% is low, 50% moderate and 75% high. A high I² signals that you should investigate why studies disagree, often through subgroup analysis or meta-regression.

How do you check for publication bias?

The standard tool is a funnel plot, which graphs each study’s effect size against its precision. If smaller, non-significant studies are missing, the plot looks asymmetric, suggesting they were never published. Egger’s regression test provides a formal statistical check of that asymmetry.

Which effect size should I use?

It depends on your outcome. Use Cohen’s d (or Hedges’ g) for continuous outcomes compared between two groups, an odds ratio or risk ratio for binary outcomes, and Pearson’s r (transformed to Fisher’s z for pooling) for associations between two continuous variables. All included studies must be converted to the same metric before they can be combined.

About Owen Ingram

Owen Ingram is a dissertation specialist. He has a master's degree in data sciences. His research work aims to compare the various types of research methods used among academicians and researchers.
