Correlational Research: Types, Examples & Methods - ResearchProspect

Published on August 14th, 2021. Revised on June 17, 2026.

Correlational research is a non-experimental design that measures the relationship between two or more variables as they naturally occur, without the researcher manipulating or controlling any of them. It tells you whether variables move together, in which direction, and how strongly, but it cannot, on its own, prove that one variable causes the other. Use it when manipulation is impossible, unethical, or impractical, and when your aim is to describe associations or generate predictions rather than to establish cause.

In a correlational study you simply observe and quantify what is already there, then summarise the association with a statistic called the correlation coefficient. This guide explains the types of correlation, how to read the coefficient r, which data-collection methods suit a correlational design, and the strengths and limitations every student should understand before choosing this approach for a dissertation.

What is correlational research?

Correlational research is a quantitative, non-experimental method used to determine whether, and to what degree, a relationship exists between two or more variables. The defining feature is that the researcher does not intervene: no variable is manipulated, no treatment is administered, and participants are not randomly assigned to conditions. Instead, the variables are measured as they naturally vary across a sample, and the pattern of co-variation is described statistically.

Because nothing is manipulated, correlational designs sit at a different point on the methodological spectrum from experiments. An experiment deliberately changes an independent variable to observe its effect on a dependent variable under controlled conditions. A correlational study takes the world as it finds it and asks a more modest question: do these things tend to occur together? This makes correlation invaluable in psychology, business, education, health and sociology, where many variables of interest, such as personality, income, anxiety or class size, cannot ethically or practically be assigned at random.

A correlational study typically produces three pieces of information: the direction of the relationship (do the variables move the same way or in opposite directions?), the strength of the relationship (how closely do they track each other?), and, with inferential testing, the statistical significance of the association (how likely is a relationship this size to be a fluke of sampling?).

The cardinal rule: correlation does not imply causation

The single most important principle in this entire topic is that correlation does not imply causation. Finding that two variables are strongly related tells you they vary together; it does not tell you that one produces the other. This is not a pedantic footnote; it is the boundary line that separates correlational evidence from causal claims, and misunderstanding it is the most common error in undergraduate methodology.

Consider a frequently cited real-world pattern: across the months of the year, ice-cream sales correlate positively with drowning incidents. Months with high ice-cream sales also see more drownings. It would be absurd to conclude that ice cream causes drowning, or that drowning drives people to buy ice cream. The relationship is produced by a confounding (third) variable, hot summer weather, which independently increases both ice-cream consumption and the number of people swimming. Once you account for temperature, the apparent link between ice cream and drowning largely disappears.

Example: A health researcher notices that, across a sample of 300 adults, daily coffee consumption correlates with higher reported stress (a positive correlation). It is tempting to conclude that coffee causes stress. But a plausible confounder is workload: people with demanding jobs both drink more coffee (to stay alert) and report more stress (because of the job itself). Workload could be driving both variables, leaving the coffee–stress link spurious. Without manipulating coffee intake while holding workload constant, the researcher cannot isolate cause.
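
The coffee and stress example can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the researcher's data: the variable names, the sample size of 2,000 and the way workload feeds into both variables are all assumptions made for illustration. It shows a raw correlation appearing between two variables that never influence each other, and then shrinking towards zero once the confounder is removed from both (a partial correlation).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def residuals(ys, xs):
    """What is left of ys after a least-squares line on xs is subtracted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000                 # hypothetical sample size (synthetic data)

# Assumption: workload raises both coffee intake and stress;
# coffee and stress have no direct effect on each other.
workload = [rng.gauss(0, 1) for _ in range(n)]
coffee = [w + rng.gauss(0, 1) for w in workload]
stress = [w + rng.gauss(0, 1) for w in workload]

raw_r = pearson_r(coffee, stress)
partial_r = pearson_r(residuals(coffee, workload), residuals(stress, workload))

print(f"raw r (coffee, stress)       = {raw_r:.2f}")
print(f"partial r (workload removed) = {partial_r:.2f}")
```

In real research the confounder is often unmeasured, so this kind of statistical control is only possible for the third variables you thought to record.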

“Correlation is not causation” is perhaps the most widely repeated cautionary phrase in statistics precisely because the temptation to read cause into co-occurrence is so strong. (Source: Field, 2018)

To move from correlation to a causal claim you generally need an experimental design with manipulation and random assignment, or a carefully reasoned quasi-experimental or longitudinal approach that controls for confounders and establishes temporal order. Correlational findings are best treated as evidence of association and as a springboard for the causal hypotheses you test later.

Types of correlation

Correlations are classified first by their direction. There are three basic types.

[Figure: three scatterplot panels labelled Positive (r ≈ +0.9), Negative (r ≈ −0.9) and No correlation (r ≈ 0)]
The three patterns of correlation: as one variable rises the other rises (positive), falls (negative), or shows no consistent link (zero).
  • Positive correlation: as one variable increases, the other tends to increase too (and as one decreases, so does the other). The variables move in the same direction. Mini-example: hours spent revising and exam marks tend to rise together.
  • Negative (inverse) correlation: as one variable increases, the other tends to decrease. The variables move in opposite directions. Mini-example: the number of hours spent on social media per day and average sleep duration: more scrolling, less sleep.
  • Zero / no correlation: there is no systematic relationship; knowing one variable tells you nothing useful about the other. Mini-example: a person’s shoe size and their intelligence test score, two variables with no meaningful connection.

It is worth stressing that “positive” and “negative” describe direction, not desirability. A strong negative correlation (for example, between regular exercise and resting heart rate) can be an extremely useful and welcome finding.

The correlation coefficient (r)

Direction alone is not enough; you also need to quantify strength. This is the job of the correlation coefficient, usually denoted r. The coefficient is a single number that summarises both the direction and the strength of a linear relationship between two variables.

The coefficient always ranges from −1 to +1:

  • r = +1 indicates a perfect positive linear relationship (every increase in one variable is matched by a proportional increase in the other).
  • r = −1 indicates a perfect negative linear relationship.
  • r = 0 indicates no linear relationship at all.

Values between these extremes indicate progressively weaker relationships as they approach zero. The sign tells you the direction; the absolute value tells you the strength. So r = −0.72 is a stronger relationship than r = +0.45, even though the first is negative. The conventional bands for interpreting strength are shown below; note these are a common rule-of-thumb scheme. Cohen’s (1988) own benchmarks are more lenient — he treats r ≈ 0.10 as small, 0.30 as medium, and 0.50 as large.

| Value of r (absolute) | Strength of relationship | Interpretation |
|---|---|---|
| 0.00 – 0.09 | None / negligible | Effectively no linear association |
| 0.10 – 0.29 | Weak | A small, often trivial association |
| 0.30 – 0.49 | Moderate | A meaningful but partial association |
| 0.50 – 0.69 | Strong | A substantial association |
| 0.70 – 1.00 | Very strong | Variables track each other closely |

Treat these bands as guidelines, not laws. What counts as a “strong” correlation varies by discipline: an r of 0.30 may be impressive in psychology, where human behaviour is noisy, yet disappointing in a tightly controlled physics measurement.
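
The rule-of-thumb bands can be written as a small helper. This is only a sketch: the function name describe_r is hypothetical, and the cut-offs are the guideline values from the table above (treated as lower bounds of each band), not fixed statistical law.

```python
def describe_r(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)  # strength depends on the absolute value only
    if size < 0.10:
        strength = "negligible"
    elif size < 0.30:
        strength = "weak"
    elif size < 0.50:
        strength = "moderate"
    elif size < 0.70:
        strength = "strong"
    else:
        strength = "very strong"

    # direction depends on the sign only
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength


print(describe_r(-0.72))  # the stronger of the two relationships in the text
print(describe_r(0.45))
```

Keeping the sign and the absolute value in separate branches mirrors the point made earlier: r = −0.72 is negative in direction yet stronger than r = +0.45.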

Scatterplots: always look before you calculate

Before trusting a single coefficient, plot your data on a scatterplot, with one variable on each axis and one point per case. The scatter reveals things a number hides: whether the relationship is linear (the coefficient r only captures linear association), whether there are outliers distorting the value, and whether the cloud of points tightens into a line (high |r|) or spreads diffusely (low |r|). A famous demonstration, Anscombe’s quartet, shows four datasets with nearly identical correlation coefficients (r ≈ 0.82) but utterly different shapes: one roughly linear, one curved, and two distorted by a single outlying point. The lesson is simple: the number summarises, the plot diagnoses.
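
A constructed dataset makes the same point numerically. The sketch below is not Anscombe’s data; it uses a made-up, perfectly curved relationship (y = x²) to show that r can be zero even when y is completely determined by x, alongside a straight line for comparison.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


x = list(range(-5, 6))            # -5, -4, ..., 5
y_curved = [v * v for v in x]     # perfect U-shaped relationship
y_linear = [2 * v + 1 for v in x]  # perfect straight line

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)

print(f"curved (y = x^2):    r = {r_curved:.2f}")
print(f"linear (y = 2x + 1): r = {r_linear:.2f}")
```

A scatterplot of the curved data would show the U shape immediately, which is exactly the information the coefficient discards.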

Pearson vs Spearman

Which coefficient you calculate depends on your data. Pearson’s r is used when both variables are continuous, measured on an interval or ratio scale, and approximately normally distributed with a roughly linear relationship. Spearman’s rho (a rank-based coefficient) is used when data are ordinal, when the relationship is monotonic but not linear, or when outliers and non-normality make Pearson unsafe. In short: reach for Pearson with well-behaved continuous data, and Spearman when your data are ranked or assumptions are violated. Choosing correctly depends on understanding your variables and their levels of measurement.
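
The difference between the two coefficients can be seen on a small made-up dataset. The sketch below assumes the common definition of Spearman’s rho as Pearson’s r computed on ranks (tied values share their average rank); the helper names are hypothetical. The data are monotonic but strongly curved, so the rank-based coefficient is perfect while Pearson’s r is not.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2 ** v for v in x]  # always increasing, but far from a straight line

r = pearson_r(x, y)
rho = spearman_rho(x, y)

print(f"Pearson r    = {r:.2f}")
print(f"Spearman rho = {rho:.2f}")
```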

Worked example: calculating Pearson’s r by hand

To see exactly what the coefficient measures, let us compute Pearson’s r from scratch on a small dataset of six students, relating hours studied (x) to exam score (y).

Step 1 — The data and the deviation table

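| Hours, x | Score, y | dx = x − x̄ | dy = y − ȳ | dx · dy | dx² | dy² |
|---|---|---|---|---|---|---|
| 2 | 54 | −2 | −11 | 22 | 4 | 121 |
| 3 | 60 | −1 | −5 | 5 | 1 | 25 |
| 5 | 66 | 1 | 1 | 1 | 1 | 1 |
| 6 | 72 | 2 | 7 | 14 | 4 | 49 |
| 7 | 82 | 3 | 17 | 51 | 9 | 289 |
| 1 | 56 | −3 | −9 | 27 | 9 | 81 |
| Σx = 24 | Σy = 390 | 0 | 0 | Σ = 120 | Σ = 28 | Σ = 566 |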

Step 2 — The means

x̄ = Σx / n = 24 / 6 = 4

ȳ = Σy / n = 390 / 6 = 65

Step 3 — The three sums of products

Σ(dx · dy) = 22 + 5 + 1 + 14 + 51 + 27 = 120

Σdx² = 4 + 1 + 1 + 4 + 9 + 9 = 28

Σdy² = 121 + 25 + 1 + 49 + 289 + 81 = 566

Step 4 — Apply the formula

r = Σ(dx · dy) ⁄ √(Σdx² × Σdy²)

r = 120 ⁄ √(28 × 566)

r = 120 ⁄ √15848

r = 120 ⁄ 125.89

r ≈ 0.95

Step 5 — Interpret the result

The coefficient is positive, so hours studied and exam score move in the same direction: students who studied more tended to score higher. Its absolute value, 0.95, falls in the very strong band (0.70–1.00), so the two variables track each other closely in this sample.

Caveat — correlation ≠ causation. Even an r this high does not prove that studying causes higher scores. A confounder such as prior ability or motivation could raise both study time and exam performance, and with only six cases the estimate is fragile. Report the relationship as a strong positive association, not as proof of cause.
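
The hand calculation can be checked in a few lines of code. This sketch simply repeats Steps 2 to 4 on the same six students; the variable names are chosen for illustration.

```python
import math

hours = [2, 3, 5, 6, 7, 1]         # x: hours studied
scores = [54, 60, 66, 72, 82, 56]  # y: exam score

n = len(hours)
mean_x = sum(hours) / n   # Step 2: x-bar
mean_y = sum(scores) / n  # Step 2: y-bar

# Deviation scores, as in the table in Step 1
dx = [x - mean_x for x in hours]
dy = [y - mean_y for y in scores]

# Step 3: the three sums
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

# Step 4: apply the formula
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(mean_x, mean_y)               # 4.0 65.0
print(sum_dxdy, sum_dx2, sum_dy2)   # 120.0 28.0 566.0
print(round(r, 2))                  # 0.95
```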

Data-collection methods for correlational designs

Correlational research is a design, not a single data-gathering technique, so it can be served by several methods. The three most common are surveys, naturalistic observation, and archival data.

  1. Surveys and questionnaires. The most popular route: you ask the same battery of standardised questions to a sample and then correlate the responses (for example, a job-satisfaction scale against an intention-to-leave scale). Surveys are efficient and allow many variables to be measured at once, but they depend on honest self-report and good instrument design.
  2. Naturalistic observation. You record behaviour as it occurs in its natural setting without intervening, then quantify and correlate what you observe (for example, observing playground interactions and correlating group size with frequency of conflict). This preserves ecological validity but can be time-consuming and vulnerable to observer bias.
  3. Archival / secondary data. You analyse data that already exist (government statistics, company records, published datasets, historical archives) and compute correlations between variables within them (for example, correlating regional unemployment rates with crime figures). This is cost-effective and enables large samples, though you are limited to variables someone else chose to measure.

Whichever method you use, the analytical step is the same: measure the variables, then quantify their association. For a fuller treatment of gathering data, see our guide to methods of data collection.

A worked example: sleep and exam performance

Example: An education researcher wants to know whether sleep is associated with academic performance. She recruits a sample of 120 first-year undergraduates and, over an exam period, records two variables for each student: average nightly sleep (in hours, from a sleep diary) and end-of-module exam score (as a percentage). She manipulates nothing: students sleep and sit exams as they normally would.

Plotting the data on a scatterplot, she sees the cloud of points sloping upward: students who slept more tended to score higher. Both variables are continuous and roughly normally distributed, so she computes Pearson’s r = +0.46, a moderate positive correlation, and tests it for significance (p < .01), meaning a correlation this strong would be unlikely to arise by chance alone in a sample of this size if the two variables were unrelated in the population.

The crucial interpretation: she can report that sleep and exam performance are moderately, positively associated. She cannot conclude that more sleep causes better marks. A confounder, conscientiousness, could drive both: conscientious students may both keep regular sleep schedules and study more effectively. To test causation she would need an experiment (for example, an intervention that increases sleep) and a formal hypothesis test of the effect.
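
The significance claim in this example can be sanity-checked from the two reported numbers alone. The sketch below is an assumption-laden illustration, not the researcher’s actual analysis: it computes the usual t statistic for a correlation, and then uses the Fisher z approximation (with the standard normal distribution from Python’s standard library) for an approximate two-sided p-value and 95% confidence interval. Statistical packages normally report the p-value from the t distribution instead.

```python
import math
from statistics import NormalDist

r = 0.46  # reported Pearson's r
n = 120   # reported sample size

# t statistic for testing r against zero, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Fisher z approximation: atanh(r) is roughly normal with SE 1 / sqrt(n - 3)
z_r = math.atanh(r)
se = 1 / math.sqrt(n - 3)
z_stat = z_r / se

normal = NormalDist()
p_value = 2 * (1 - normal.cdf(abs(z_stat)))  # approximate two-sided p

# Approximate 95% confidence interval, transformed back to the r scale
z_crit = normal.inv_cdf(0.975)
ci_low = math.tanh(z_r - z_crit * se)
ci_high = math.tanh(z_r + z_crit * se)

print(f"t({n - 2}) = {t_stat:.2f}")
print(f"approximate p = {p_value:.1e}")
print(f"approximate 95% CI for r: {ci_low:.2f} to {ci_high:.2f}")
```

The interval is a useful companion to the p-value: it shows the range of population correlations that are reasonably compatible with the sample, and it says nothing about cause.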

Strengths and limitations

Correlational research earns its place in the methodological toolkit for several reasons, but its weaknesses are exactly the mirror image of its strengths.

Strengths

  • Real-world (high external validity). Because variables are measured as they naturally occur, findings often generalise well to everyday settings, unlike the sometimes artificial conditions of a lab experiment.
  • Ethical and practical where experiments are not. You cannot ethically assign people to smoke, to be sleep-deprived, or to experience trauma. Correlational designs let you study such variables responsibly by observing existing differences.
  • Predictive power. A reliable correlation lets you predict one variable from another even without understanding the cause, which underpins everything from admissions screening to risk scoring.
  • Efficient and broad. Many variables can be measured at once, and large archival or survey samples are often feasible at modest cost.

Limitations

  • No causation. The headline limitation: a correlation cannot establish that one variable causes another.
  • The directionality problem. Even if A and B are causally linked, correlation alone cannot tell you whether A causes B or B causes A. Does anxiety reduce sleep, or does poor sleep raise anxiety? The coefficient is silent on direction of cause.
  • The third-variable problem. An unmeasured confounder may drive both variables, producing a spurious correlation (as with ice cream, drowning and temperature).
  • Only captures linear (or monotonic) relationships. A coefficient near zero does not rule out a strong but curved relationship, which is why scatterplots matter.

Correlational vs experimental research

Students often weigh a correlational design against an experimental one. The table below contrasts the two on the dimensions that matter most for a dissertation.

| Dimension | Correlational research | Experimental research |
|---|---|---|
| Manipulation of variables | None – variables measured as they are | Researcher manipulates the independent variable |
| Random assignment | No | Yes (in true experiments) |
| What it establishes | Association (direction & strength) | Cause and effect |
| Control of confounders | Limited; confounders hard to rule out | High; controlled conditions isolate the cause |
| Setting | Often natural / real-world | Often controlled / lab-based |
| Typical question | “Are X and Y related?” | “Does X cause Y?” |
| Ethical reach | Can study variables you cannot manipulate | Constrained by what can be ethically manipulated |

Neither design is inherently superior; they answer different questions. Strong dissertations often use correlational evidence to identify promising relationships and then experimental or longitudinal work to probe causation.

Common mistakes to avoid

  • Claiming causation from correlation. Never write that X “causes,” “leads to” or “results in” Y on the basis of a correlation alone; use “is associated with” or “predicts.”
  • Reporting r without a scatterplot. You may be summarising a curved relationship or an outlier-driven one as if it were linear.
  • Confusing the sign with strength. A negative correlation is not a weak one; strength is the absolute value of r.
  • Using Pearson on ordinal or non-normal data. Switch to Spearman when assumptions are violated.
  • Ignoring statistical significance and sample size. A correlation from a tiny sample can be large by chance; always test it.
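
The last point, that a tiny sample can produce a large correlation by chance, can be illustrated by simulation. This is a sketch on synthetic data under stated assumptions: both variables are drawn independently from a standard normal distribution (so the true correlation is zero), the sample sizes of 6 and 100 and the 0.70 threshold are chosen for illustration, and the function name is hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def share_of_large_r(n, trials, threshold, rng):
    """Fraction of samples of size n, drawn from two unrelated
    variables, whose |r| reaches the threshold purely by chance."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


rng = random.Random(7)  # fixed seed so the run is repeatable
small = share_of_large_r(n=6, trials=2000, threshold=0.70, rng=rng)
large = share_of_large_r(n=100, trials=2000, threshold=0.70, rng=rng)

print(f"n = 6:   share of samples with |r| >= 0.70: {small:.3f}")
print(f"n = 100: share of samples with |r| >= 0.70: {large:.3f}")
```

With six cases, a "very strong" coefficient turns up in a noticeable share of samples even though the variables are unrelated; with a hundred cases it essentially never does. This is why the sample size should always be reported alongside r.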

How to conduct a correlational study well: step by step

  1. Define your variables and hypothesis. State precisely which two (or more) variables you will measure and the relationship you predict, then frame it for hypothesis testing.
  2. Choose a data-collection method. Survey, naturalistic observation or archival data, matched to your variables and resources.
  3. Operationalise and measure. Use valid, reliable instruments and check the level of measurement of each variable.
  4. Sample appropriately. Recruit a sample large enough to detect the expected effect and representative enough to generalise.
  5. Visualise first. Build a scatterplot and inspect for linearity, outliers and shape before computing anything.
  6. Select the right coefficient. Pearson for continuous, normal, linear data; Spearman for ordinal or non-normal data.
  7. Compute, test and report. Report r, its sign, the significance level and the sample size, and interpret strength using conventional bands.
  8. Interpret cautiously. Discuss possible confounders and the directionality problem, and avoid causal language.

The statistics that turn raw data into a defensible correlational finding (choosing the coefficient, checking assumptions, testing significance and reporting effect sizes) are where many students lose marks.

Conclusion

Correlational research is a powerful, ethical and efficient way to discover how variables relate in the real world. Master three things and you will use it well: read the coefficient r correctly (sign for direction, absolute value for strength), always look at the scatterplot before trusting the number, and never let a correlation tempt you into a causal claim. Used with that discipline, correlational findings are a credible contribution in their own right and a sound foundation for the experimental work that follows.

Frequently Asked Questions

What is correlational research in simple terms?

Correlational research is a non-experimental method that measures whether two or more variables are related, in what direction (positive, negative or none) and how strongly, without the researcher manipulating any of them. It describes associations as they naturally occur, but it cannot prove that one variable causes another.

Why does correlation not imply causation?

Because two variables can move together for reasons other than one causing the other. A third (confounding) variable may drive both, or the causal direction may run the opposite way. For example, ice-cream sales and drownings correlate, but hot weather, not ice cream, drives both. Establishing causation requires a controlled experiment or a design that rules out confounders.

How do you interpret the correlation coefficient r?

The coefficient r summarises a linear relationship in one number ranging from −1 to +1. The sign shows direction (positive means the variables rise together; negative means one rises as the other falls), and the absolute value shows strength: values near 0 mean little or no relationship, while values near +1 or −1 mean a very strong one.

What is the difference between Pearson’s r and Spearman’s rho?

Pearson’s r is used for two continuous variables that are roughly normally distributed and linearly related. Spearman’s rho is a rank-based alternative used for ordinal data, monotonic but non-linear relationships, or when outliers and non-normality make Pearson unreliable. Use Pearson for well-behaved continuous data and Spearman when those assumptions are violated.

When should you use a correlational design?

Choose a correlational design when you cannot manipulate the variables for ethical or practical reasons (such as trauma, income or personality), when you want findings that generalise to real-world settings, or when your goal is to describe associations or make predictions rather than establish cause. If you specifically need to prove cause and effect, an experiment is more appropriate.

What are the limitations of correlational research?

Its chief limitation is that it cannot establish causation. It also suffers from the directionality problem (it cannot say whether A causes B or B causes A) and the third-variable problem (an unmeasured confounder may drive both variables). In addition, the coefficient only captures linear or monotonic relationships, so a value near zero does not rule out a strong curved relationship.

About Carmen Troy

Troy has been the leading content creator for ResearchProspect since 2017. He loves to write about the different types of data collection and data analysis methods used in research.
