Statistical Power: A Complete Guide (1 − β)

Published on September 20th, 2021, revised on June 16, 2026

What Is Statistical Power?

Statistical power is the probability that a statistical test will correctly reject a false null hypothesis — in plain terms, the chance that your study will detect a real effect when one genuinely exists. It is written as 1 − β (one minus beta), where β is the probability of a Type II error (failing to detect a true effect). A study with 80% power, for example, has an 80% chance of finding a real effect and a 20% chance of missing it.

Power only has a meaningful value when a true effect actually exists in the population. If there is genuinely no effect to find, there is nothing for the test to detect, so power is defined relative to a specific, assumed effect size. The statistical power of a test is sometimes called its sensitivity.

“Power is the probability of detecting an effect, given that the effect is really there.” — Statistics How To, “Statistical Power”

Explaining Statistical Power (1 − β)

To understand power, it helps to set it against the two errors that any hypothesis test can make:

  • Type I error (α): rejecting a null hypothesis that is actually true — a “false positive”. This is controlled by the significance level, usually set at 0.05.
  • Type II error (β): failing to reject a null hypothesis that is actually false — a “false negative”, i.e. missing a real effect.

Power is the complement of the Type II error: power = 1 − β. When power is high, the probability of missing a real effect (β) is low, so the test is very likely to detect an effect that truly exists. When power is low, β is high, and the study is likely to overlook a genuine effect and wrongly conclude that there is none. Read our guide to Type I and Type II errors for a fuller treatment of how these probabilities interact.
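
The complement relationship can be checked numerically. The sketch below is a minimal illustration that assumes a one-sided z-test with a known standard deviation. Under that assumption the test statistic is standard normal when the null is true and is shifted by a hypothetical standardised effect `delta` when the null is false. A single critical value yields α, β and power, and making the test stricter (smaller α) visibly raises β.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def error_rates(alpha: float, delta: float):
    """Return (type_i, beta, power) for a one-sided z-test.

    Assumes the test statistic is N(0, 1) under H0 and N(delta, 1) under H1,
    where delta (the standardised true effect) is a hypothetical input.
    """
    z_crit = Z.inv_cdf(1 - alpha)   # reject H0 when the statistic exceeds this
    type_i = 1 - Z.cdf(z_crit)      # P(reject | H0 true), equal to alpha
    beta = Z.cdf(z_crit - delta)    # P(fail to reject | H1 true): Type II error
    power = 1 - beta                # P(reject | H1 true)
    return type_i, beta, power


for a in (0.10, 0.05, 0.01):
    t1, b, p = error_rates(a, delta=2.5)
    print(f"alpha={t1:.2f}  beta={b:.3f}  power={p:.3f}")
```

With `delta=2.5`, tightening α from 0.05 to 0.01 lowers power from roughly 0.80 to roughly 0.57. This illustrates the trade-off between the two error types.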

Key point: High power means you are likely to catch a real effect. Low power means you are likely to miss one — your test is under-powered and any non-significant result is hard to interpret.

Why Statistical Power Matters

Power is central to designing a credible study for two main reasons:

  • It protects you from false negatives. An under-powered study can fail to detect a real, important effect, wasting time and resources and potentially burying a genuine finding.
  • It drives sample-size planning. A power analysis conducted before data collection (an a priori power analysis) estimates the minimum sample size you need to detect an effect of a given size with your chosen power and significance level. This is now expected in most research proposals, grant applications and ethics submissions.

Jacob Cohen's 1962 review of published psychology research, together with later replications, found that much published research has historically been under-powered. As a result, many true effects were likely missed, which is a key reason a priori power analysis is now standard practice.

What Is a “Good” Level of Statistical Power?

By widely accepted convention, statistical power should be at least 0.80 (80%). In other words, a well-designed study should have an 80% or greater chance of detecting a statistically significant effect when one truly exists. This 80% benchmark was popularised by the statistician Jacob Cohen.

Power is closely tied to the significance level. A common choice is a 95% confidence level, which corresponds to a significance level (alpha) of 0.05 — the “5% rule” in statistics. This means you accept a 5% risk of a Type I error (a false positive). Power, sample size, effect size and alpha are all interlocked: fix any three and the fourth is determined.
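
Because the four quantities are interlocked, any one of them can be solved from the other three. Here is a minimal sketch for two independent groups of equal size. It assumes the normal (z) approximation that underlies the sample-size formula in the worked example later in this guide. The function names are illustrative and do not come from any library. Exact t-test calculations, such as those in G*Power, give slightly larger sample sizes.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power(d: float, n: float, alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample z-test with n per group (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n / 2)  # expected test statistic under H1
    # Probability of landing in either rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_per_group(d: float, target_power: float, alpha: float = 0.05) -> int:
    """Smallest n per group from n = 2(z_{1-a/2} + z_{1-b})^2 / d^2, rounded up.

    Like the textbook formula, this ignores the negligible opposite tail.
    """
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(target_power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d**2)


def min_detectable_d(n: float, target_power: float, alpha: float = 0.05) -> float:
    """Smallest standardised effect detectable with the given n, power and alpha."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(target_power)
    return (z_a + z_b) / math.sqrt(n / 2)


print(n_per_group(0.5, 0.80), n_per_group(0.5, 0.90))
print(round(power(0.5, 63), 3), round(min_detectable_d(63, 0.80), 3))
```

Consider a medium effect (d = 0.5) at α = 0.05. Here `n_per_group(0.5, 0.80)` returns 63 and `n_per_group(0.5, 0.90)` returns 85. Working the other way, `min_detectable_d(63, 0.80)` comes out just under 0.5.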

Factors Influencing Statistical Power

Four main factors determine the power of a test. Understanding how each one pushes power up or down is the foundation of any power analysis.

  • Sample size (n): directly related to power. The larger the sample, the higher the power (all else being equal), because larger samples give a more precise estimate of the population and reduce the standard error.
  • Effect size: directly related to power. The larger the true effect — for example, the bigger the difference between two group means — the easier it is to detect, so the higher the power. See our guide to effect size in statistics for how this is quantified.
  • Significance level (alpha, α): directly related to power. A larger alpha (e.g. 0.10 instead of 0.05) makes it easier to reject the null and therefore raises power — but at the cost of a higher Type I error rate, so it is rarely a good trade-off.
  • Variability (standard deviation / error variance): inversely related to power. The greater the variability among subjects, the larger the standard error and the lower the power. Reducing measurement error and using precise designs (such as stratified sampling) increases power.

Factor (as it increases) | Effect on power
--- | ---
Sample size (n) | Power increases
Effect size | Power increases
Significance level (α) | Power increases (but more Type I errors)
Variability / standard deviation | Power decreases
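
Each row of the table can be checked by changing one input at a time. This sketch uses the same normal-approximation power formula for two equal groups. The effect is expressed as a raw mean difference `diff` and a standard deviation `sigma`, so that variability appears explicitly (Cohen's d = diff / sigma). The baseline values are made up for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power(diff: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided, two-sample z-test power; diff is the raw mean difference."""
    d = diff / sigma  # standardised effect size (Cohen's d)
    shift = d * math.sqrt(n / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Hypothetical baseline design
base = dict(diff=5.0, sigma=10.0, n=50, alpha=0.05)
print("baseline", round(power(**base), 3))

# Increase one factor at a time and recompute power
for factor, new_value in [("n", 100), ("diff", 8.0), ("alpha", 0.10), ("sigma", 15.0)]:
    changed = {**base, factor: new_value}
    print(f"increase {factor:5s} -> power {power(**changed):.3f}")
```

Raising n, the mean difference or α pushes power above the baseline of about 0.71. Raising `sigma` from 10 to 15 pulls it down to about 0.38.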

[Figure: Statistical power rises as the sample size grows (horizontal axis: sample size, n).]

The relationship between sample size and power is non-linear: power rises steeply as the sample grows from small to moderate, then flattens as it approaches 1.0. This is why doubling an already-large sample yields diminishing returns.
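
The shape of the curve is easy to reproduce. This minimal sketch again assumes a two-sided, two-sample z-approximation and a medium effect (d = 0.5). It computes power as the sample per group doubles, and then the gain from each doubling.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power(d: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided, two-sample z-test power with n per group."""
    shift = d * math.sqrt(n / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


# Power for a medium effect (d = 0.5) as n per group doubles
sizes = [10, 20, 40, 80, 160, 320]
curve = [power(0.5, n) for n in sizes]
for n, p in zip(sizes, curve):
    print(f"n = {n:3d} per group  power = {p:.3f}")

# Gain in power from each doubling of the sample
gains = [b - a for a, b in zip(curve, curve[1:])]
print([round(g, 3) for g in gains])
```

The gains first grow and then shrink. Going from 40 to 80 per group adds roughly 0.28 to power. Going from 160 to 320 adds less than 0.01, because power is already above 0.99.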

How to Increase Statistical Power

If a planned study is under-powered, you can raise its power by adjusting one or more of the following:

  • Increase the sample size. This is the most common and reliable lever. A larger sample reduces the standard error and increases the chance of detecting a true effect.
  • Reduce variability (error variance). Use precise measurement instruments, standardised procedures and control variables to convert unexplained “random” variation into explained variation. Repeated-measures (within-subject) designs are especially effective because they separate variation between subjects from variation within subjects.
  • Raise the significance level (alpha) — with caution. Moving from α = 0.05 to a higher value increases power but also increases the Type I error rate, so this is rarely advisable.
  • Use a directional (one-tailed) test where justified. If theory clearly predicts the direction of the effect, a one-tailed test concentrates the rejection region on one side and gives more power — but only if the direction is genuinely justified in advance.
  • Target a larger detectable effect. Designing the study around a meaningful minimum effect of interest, rather than the smallest conceivable effect, keeps the required sample realistic.
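
Two of these levers can be compared directly. This sketch uses a z-approximation with a hypothetical medium effect (d = 0.5) and 30 participants per condition. The within-subject case assumes that each person's two scores correlate at a made-up ρ = 0.6. That correlation shrinks the standard deviation of the differences to σ·√(2(1 − ρ)). Note that the paired design here needs 30 people in total, compared with 60 for two independent groups.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_shift(shift: float, alpha: float, tails: int) -> float:
    """Power of a z-test whose statistic has mean `shift` under H1.

    A two-tailed test splits alpha across both tails; a one-tailed test puts
    all of alpha in the predicted direction.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z_crit = Z.inv_cdf(1 - alpha / tails)
    p = Z.cdf(shift - z_crit)
    if tails == 2:
        p += Z.cdf(-shift - z_crit)  # rejection in the unpredicted direction
    return p


d, n, alpha = 0.5, 30, 0.05

# Independent groups: n people per group
independent_shift = d * math.sqrt(n / 2)

# Within-subject design: the same n people measured in both conditions.
# rho is a hypothetical correlation between a person's two scores; the SD of
# the differences shrinks to sigma * sqrt(2 * (1 - rho)).
rho = 0.6
d_z = d / math.sqrt(2 * (1 - rho))
paired_shift = d_z * math.sqrt(n)

print(f"independent, two-tailed: {power_shift(independent_shift, alpha, 2):.3f}")
print(f"independent, one-tailed: {power_shift(independent_shift, alpha, 1):.3f}")
print(f"paired,      two-tailed: {power_shift(paired_shift, alpha, 2):.3f}")
```

Under these assumptions, the two-tailed independent design has power of about 0.49, the one-tailed version about 0.61 and the paired design about 0.86. The paired advantage depends entirely on ρ. With uncorrelated scores (ρ = 0), the paired design gives the same power as the independent one.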

Statistical Power vs the P-Value

Power and the p-value are related but distinct concepts, and they are easy to confuse:

  • The p-value is calculated after you collect data. It is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A result is usually called statistically significant when p ≤ 0.05.
  • Power is calculated before you collect data. It is the probability of rejecting the null hypothesis when it is actually false — a property of the study design, not of any single dataset.

Put simply: alpha controls the false-positive rate, while power (1 − β) controls the false-negative rate. A result with p > 0.05 is described as not statistically significant. This means that, if the null hypothesis were true, a difference at least as large as the one observed would be expected more than 1 time in 20. Crucially, a non-significant result from an under-powered study does not prove there is no effect; it may simply mean the study lacked the power to detect one.
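
The difference in timing can be seen in a short sketch, which assumes a two-sided, two-sample z-test. Power is computed from the design alone, before any data exist. The p-value is computed from a test statistic that only exists afterwards. The observed value z = 1.5 below is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def design_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Before data collection: power of a two-sided, two-sample z-test."""
    shift = d * math.sqrt(n / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def two_sided_p(z_observed: float) -> float:
    """After data collection: two-sided p-value for an observed z statistic."""
    return 2 * (1 - Z.cdf(abs(z_observed)))


# A small study (20 per group) planned around a real medium effect, d = 0.5
print(f"power before data collection: {design_power(0.5, 20):.2f}")

# Suppose the study then observes z = 1.5 (hypothetical result)
print(f"p-value after data collection: {two_sided_p(1.5):.3f}")
```

This design has only about 35% power for a medium effect. A non-significant result such as p ≈ 0.13 is therefore the more likely outcome, even though the effect is real. On its own, such a result is not evidence that the effect is absent.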


Worked Example: A Priori Power Analysis

Example: A psychologist plans a two-group experiment comparing a new study technique against a control. From previous research she expects a medium effect (Cohen’s d = 0.5). She wants 80% power (1 − β = 0.80) at a two-tailed significance level of α = 0.05.

Using the standard normal-approximation formula for comparing two independent means with equal group sizes, the required sample size per group is approximately:

$$n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$$

Substituting the critical z-values ($z_{1-\alpha/2} = 1.96$ for α = 0.05 two-tailed, and $z_{1-\beta} = 0.84$ for 80% power):

n ≈ 2 × (1.96 + 0.84)² / 0.5² = 2 × (2.80)² / 0.25 = 2 × 7.84 / 0.25 = 62.7

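Rounding up gives 63 participants per group. Because this formula uses the normal approximation, an exact calculation based on the t-distribution (such as G*Power performs) gives a slightly larger 64 per group, or 128 in total, to achieve 80% power. If she could only recruit 30 per group, the same formula shows her power would fall to roughly 50%, leaving her likely to miss a real medium-sized effect.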

This is exactly what an a priori power analysis delivers: given a target power, an alpha level and an expected effect size, it returns the minimum sample size you need.
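
The arithmetic of the worked example can be reproduced in a few lines. This sketch uses unrounded z-values from Python's `statistics.NormalDist`. As a result it gives about 62.8 rather than 62.7 before rounding up to 63 per group. An exact t-distribution calculation, such as G*Power's, gives a slightly larger 64 per group.

```python
import math
from statistics import NormalDist

Z = NormalDist()

d, alpha, target_power = 0.5, 0.05, 0.80

z_alpha = Z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05, two-tailed
z_beta = Z.inv_cdf(target_power)    # about 0.84 for 80% power

n_exact = 2 * (z_alpha + z_beta) ** 2 / d**2
n_per_group = math.ceil(n_exact)    # always round sample sizes up
print(f"n = {n_exact:.1f} -> {n_per_group} per group")

# Power if only 30 per group can be recruited (normal approximation,
# ignoring the negligible chance of rejecting in the wrong direction)
n_small = 30
achieved = Z.cdf(d * math.sqrt(n_small / 2) - z_alpha)
print(f"power with {n_small} per group: {achieved:.2f}")
```

The second figure, about 0.49, puts a number on the warning in the example. With 30 per group she would have roughly a coin-flip chance of detecting a real medium-sized effect.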

Calculating Statistical Power in Practice

In real research, power is almost always calculated with software rather than by hand, because the exact formula depends on the test (t-test, ANOVA, correlation, regression, chi-square, and so on), the design and the data type. Widely used tools include:

  • G*Power — a free, widely cited program for power and sample-size calculations across many test types.
  • R — packages such as pwr for analytic power and simulation-based approaches for complex designs.
  • SAS (PROC POWER) and PASS — commercial packages used in clinical and applied research.
  • Online calculators such as PowerAndSampleSize.com and StatPages.

Whichever tool you use, the inputs are the same: the test type, your chosen alpha, the expected effect size and either the sample size (to solve for power) or the target power (to solve for sample size). Choosing a sensible effect size is the hardest part — it should come from prior literature, a pilot study or a smallest effect of practical interest, not from guesswork.
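
Simulation-based power analysis, mentioned above for R, estimates power by generating many artificial datasets under an assumed effect and counting how often the test rejects the null. Below is a minimal Python sketch of the idea under deliberately simple assumptions: normal data with a known standard deviation of 1, analysed with a two-sided z-test. For a real complex design, you would replace both the data generator and the test with your own.

```python
import math
import random
from statistics import NormalDist, fmean

Z = NormalDist()


def simulated_power(d: float, n: int, alpha: float = 0.05,
                    reps: int = 4000, seed: int = 1) -> float:
    """Estimate power as the share of simulated studies that reject H0.

    Each study draws n observations per group from normal populations with
    SD 1 whose means differ by d, then runs a two-sided z-test (SD assumed known).
    """
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n)  # SE of the difference in means when SD = 1
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(d, 1.0) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


print(f"simulated power: {simulated_power(0.5, 64):.3f}")
```

For d = 0.5 and 64 per group, the estimate should land close to the analytic value of about 0.81. With `d = 0.0`, the same function instead estimates the Type I error rate, which should sit near α = 0.05.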

For related concepts, see our guides to statistical significance, inferential statistics and choosing the right statistical test.

Frequently Asked Questions

What does low statistical power mean?

Low statistical power means a study has a small chance of detecting a real effect even when one exists. Because power = 1 − β, low power implies a high Type II (false-negative) error rate, so an under-powered study is likely to miss genuine effects and produce non-significant results that are hard to interpret.

What is another name for statistical power?

Statistical power is sometimes called the sensitivity of a test. It is also expressed mathematically as 1 − β (one minus the Type II error rate), and the term “true-positive rate” captures the same idea: the probability of correctly detecting a real effect.

What is an a priori power analysis?

An a priori power analysis is a calculation carried out before data collection to determine the minimum sample size needed. You specify the target power (often 0.80), the significance level (often 0.05) and an expected effect size, and the analysis returns the required sample size. It is the recommended way to plan a study and avoid an under-powered design.

What does 90% power mean?

90% power (1 − β = 0.90) means the study has a 90% chance of detecting a true effect of the assumed size, and only a 10% chance of missing it (a Type II error). It is a stricter standard than the conventional 80% and usually requires a larger sample, but it is often chosen for high-stakes studies such as clinical trials.

What is the 5% rule in statistics?

The “5% rule” refers to the conventional significance level of α = 0.05. A result is declared statistically significant when its p-value is 0.05 or lower. This means that, if the null hypothesis were true, there would be a 5% or smaller probability of seeing a result at least this extreme. The 5% level corresponds to a 95% confidence level and sets the accepted Type I error rate.

Is statistical power the same as the p-value?

No. The p-value is computed after data collection and measures the evidence against the null hypothesis for that dataset, controlling the false-positive (Type I) rate. Power is computed before data collection and measures the probability of detecting a true effect, controlling the false-negative (Type II) rate. They are complementary but not interchangeable.

About Carmen Troy

Troy has been the leading content creator for ResearchProspect since 2017. He loves to write about the different types of data collection and data analysis methods used in research.
