Correlational research is a non-experimental design that measures the relationship between two or more variables as they naturally occur, without the researcher manipulating or controlling any of them. It tells you whether variables move together, in which direction, and how strongly, but it cannot, on its own, prove that one variable causes the other. Use it when manipulation is impossible, unethical, or impractical, and when your aim is to describe associations or generate predictions rather than to establish cause.
In a correlational study you simply observe and quantify what is already there, then summarise the association with a statistic called the correlation coefficient. This guide explains the types of correlation, how to read the coefficient r, which data-collection methods suit a correlational design, and the strengths and limitations every student should understand before choosing this approach for a dissertation.
What is correlational research?
Correlational research is a quantitative, non-experimental method used to determine whether, and to what degree, a relationship exists between two or more variables. The defining feature is that the researcher does not intervene: no variable is manipulated, no treatment is administered, and participants are not randomly assigned to conditions. Instead, the variables are measured as they naturally vary across a sample, and the pattern of co-variation is described statistically.
Because nothing is manipulated, correlational designs sit at a different point on the methodological spectrum from experiments. An experiment deliberately changes an independent variable to observe its effect on a dependent variable under controlled conditions. A correlational study takes the world as it finds it and asks a more modest question: do these things tend to occur together? This makes correlation invaluable in psychology, business, education, health and sociology, where many variables of interest, such as personality, income, anxiety or class size, cannot ethically or practically be assigned at random.
A correlational study typically produces three pieces of information: the direction of the relationship (do the variables move the same way or in opposite directions?), the strength of the relationship (how closely do they track each other?), and, with inferential testing, the statistical significance of the association (how likely is a relationship this size to be a fluke of sampling?).
The cardinal rule: correlation does not imply causation
The single most important principle in this entire topic is that correlation does not imply causation. Finding that two variables are strongly related tells you they vary together; it does not tell you that one produces the other. This is not a pedantic footnote; it is the boundary line that separates correlational evidence from causal claims, and misunderstanding it is one of the most common errors in undergraduate methodology.
Consider a frequently cited real-world pattern: over the course of a year, ice-cream sales correlate positively with drowning incidents. Months with high ice-cream sales also see more drownings. It would be absurd to conclude that ice cream causes drowning, or that drowning drives people to buy ice cream. The relationship is produced by a confounding (third) variable, hot summer weather, which independently increases both ice-cream consumption and the number of people swimming. Once you account for temperature, the apparent link between ice cream and drowning largely disappears.
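To see what "accounting for temperature" can look like in practice, here is a minimal simulation sketch. The data are invented and standardised, the variable names are hypothetical, and the first-order partial correlation is just one simple way of controlling for a third variable; it is an illustration, not an analysis of real sales or drowning figures.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Assumption: temperature drives both outcomes, and the two outcomes
# have no direct effect on each other (all values are made up).
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
drownings = [t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)
r_partial = partial_r(
    r_raw,
    pearson_r(ice_cream, temperature),
    pearson_r(drownings, temperature),
)
print(f"raw r = {r_raw:.2f}")
print(f"partial r, controlling for temperature = {r_partial:.2f}")
```

In this simulated setup the raw correlation is clearly positive while the partial correlation sits close to zero, which is the pattern the ice-cream example describes.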
“Correlation is not causation” is perhaps the most widely repeated cautionary phrase in statistics precisely because the temptation to read cause into co-occurrence is so strong. (Source: Field, 2018)
To move from correlation to a causal claim you generally need an experimental design with manipulation and random assignment, or a carefully reasoned quasi-experimental or longitudinal approach that controls for confounders and establishes temporal order. Correlational findings are best treated as evidence of association and as a springboard for the causal hypotheses you test later.
Types of correlation
Correlations are classified first by their direction. There are three basic types.
- Positive correlation: as one variable increases, the other tends to increase too (and as one decreases, so does the other). The variables move in the same direction. Mini-example: hours spent revising and exam marks tend to rise together.
- Negative (inverse) correlation: as one variable increases, the other tends to decrease. The variables move in opposite directions. Mini-example: daily hours spent on social media and average sleep duration (more scrolling, less sleep).
- Zero / no correlation: there is no systematic relationship; knowing one variable tells you nothing useful about the other. Mini-example: a person’s shoe size and their intelligence test score, two variables with no meaningful connection.
It is worth stressing that “positive” and “negative” describe direction, not desirability. A strong negative correlation (for example, between regular exercise and resting heart rate) can be an extremely useful and welcome finding.
The correlation coefficient (r)
Direction alone is not enough; you also need to quantify strength. This is the job of the correlation coefficient, usually denoted r. The coefficient is a single number that summarises both the direction and the strength of a linear relationship between two variables.
The coefficient always ranges from −1 to +1:
- r = +1 indicates a perfect positive linear relationship (every increase in one variable is matched by a proportional increase in the other).
- r = −1 indicates a perfect negative linear relationship.
- r = 0 indicates no linear relationship at all.
Values between these extremes indicate progressively weaker relationships as they approach zero. The sign tells you the direction; the absolute value tells you the strength. So r = −0.72 is a stronger relationship than r = +0.45, even though the first is negative. The conventional bands for interpreting strength are shown below; note these are a common rule-of-thumb scheme. Cohen’s (1988) own benchmarks are more lenient — he treats r ≈ 0.10 as small, 0.30 as medium, and 0.50 as large.
| Value of r (absolute) | Strength of relationship | Interpretation |
|---|---|---|
| 0.00 – 0.09 | None / negligible | Effectively no linear association |
| 0.10 – 0.29 | Weak | A small, often trivial association |
| 0.30 – 0.49 | Moderate | A meaningful but partial association |
| 0.50 – 0.69 | Strong | A substantial association |
| 0.70 – 1.00 | Very strong | Variables track each other closely |
Treat these bands as guidelines, not laws. What counts as a “strong” correlation varies by discipline: an r of 0.30 may be impressive in psychology, where human behaviour is noisy, yet disappointing in a tightly controlled physics measurement.
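The banding scheme can be written down as a small lookup, which makes the "sign for direction, absolute value for strength" rule explicit. This is a sketch of the rule-of-thumb table only; the function names are hypothetical, and the cut-offs should be adjusted to whatever convention your discipline uses.

```python
def strength_band(r):
    """Label |r| using the rule-of-thumb bands from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "weak"
    if size < 0.50:
        return "moderate"
    if size < 0.70:
        return "strong"
    return "very strong"


def describe(r):
    """Combine direction (from the sign) and strength (from |r|)."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"
    return f"r = {r:+.2f}: {strength_band(r)}, {direction}"


for value in (-0.72, 0.45, 0.05):
    print(describe(value))
```

Note that −0.72 lands in a higher band than +0.45, because only the absolute value is compared.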
Scatterplots: always look before you calculate
Before trusting a single coefficient, plot your data on a scatterplot, with one variable on each axis and one point per case. The scatter reveals things a number hides: whether the relationship is linear (the coefficient r only captures linear association), whether there are outliers distorting the value, and whether the cloud of points tightens into a line (high |r|) or spreads diffusely (low |r|). A famous demonstration, Anscombe’s quartet, shows four datasets with nearly identical correlation coefficients (r ≈ 0.82) but utterly different shapes: one roughly linear, one curved, and two dominated by a single outlying point. The lesson is simple: the number summarises, the plot diagnoses.
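The point can be checked numerically. The sketch below uses the first two datasets of Anscombe’s quartet, with values as they are commonly reproduced (verify them against the original publication before reusing them). It computes r for each and prints a crude text plot of the curved one; a real analysis would use a proper plotting library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Datasets I and II of Anscombe's quartet; both share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(f"dataset I  (roughly linear): r = {r_linear:.3f}")
print(f"dataset II (curved):         r = {r_curved:.3f}")

# Crude text "plot" of dataset II: one bar per point, ordered by x,
# so the rise-then-fall shape is visible without a plotting library.
for xi, yi in sorted(zip(x, y_curved)):
    print(f"x={xi:2d}  " + "*" * round(yi * 3))
```

The two coefficients agree to about three decimal places, yet the bars for dataset II rise and then fall, a shape a straight line describes poorly.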
Pearson vs Spearman
Which coefficient you calculate depends on your data. Pearson’s r is used when both variables are continuous, measured on an interval or ratio scale, and approximately normally distributed with a roughly linear relationship. Spearman’s rho (a rank-based coefficient) is used when data are ordinal, when the relationship is monotonic but not linear, or when outliers and non-normality make Pearson unsafe. In short: reach for Pearson with well-behaved continuous data, and Spearman when your data are ranked or assumptions are violated. Choosing correctly depends on understanding your variables and their levels of measurement.
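One way to see the difference is to run both coefficients on data that are perfectly monotonic but not linear. The sketch below treats Spearman’s rho as Pearson’s r applied to ranks, with tied values given their average rank; the function names and the small cubic dataset are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always rises with x, but along a curve.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because y never decreases as x increases, the ranks match exactly and rho reaches 1, while Pearson’s r falls short of 1 because the points do not lie on a straight line.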
Worked example: calculating Pearson’s r by hand
To see exactly what the coefficient measures, let us compute Pearson’s r from scratch on a small dataset of six students, relating hours studied (x) to exam score (y).
Step 1 — The data and the deviation table
| Hours, x | Score, y | dx = x − x̄ | dy = y − ȳ | dx · dy | dx² | dy² |
|---|---|---|---|---|---|---|
| 2 | 54 | −2 | −11 | 22 | 4 | 121 |
| 3 | 60 | −1 | −5 | 5 | 1 | 25 |
| 5 | 66 | 1 | 1 | 1 | 1 | 1 |
| 6 | 72 | 2 | 7 | 14 | 4 | 49 |
| 7 | 82 | 3 | 17 | 51 | 9 | 289 |
| 1 | 56 | −3 | −9 | 27 | 9 | 81 |
| Σx = 24 | Σy = 390 | 0 | 0 | Σ = 120 | Σ = 28 | Σ = 566 |
Step 2 — The means
x̄ = Σx / n = 24 / 6 = 4
ȳ = Σy / n = 390 / 6 = 65
Step 3 — The three sums of products
Σ(dx · dy) = 22 + 5 + 1 + 14 + 51 + 27 = 120
Σdx² = 4 + 1 + 1 + 4 + 9 + 9 = 28
Σdy² = 121 + 25 + 1 + 49 + 289 + 81 = 566
Step 4 — Apply the formula
r = Σ(dx · dy) / √(Σdx² × Σdy²)
r = 120 / √(28 × 566) = 120 / √15848 ≈ 120 / 125.89 ≈ 0.95
Step 5 — Interpret the result
The coefficient is positive, so hours studied and exam score move in the same direction: students who studied more tended to score higher. Its absolute value, 0.95, falls in the very strong band (0.70–1.00), so the two variables track each other closely in this sample.
Caveat — correlation ≠ causation. Even an r this high does not prove that studying causes higher scores. A confounder such as prior ability or motivation could raise both study time and exam performance, and with only six cases the estimate is fragile. Report the relationship as a strong positive association, not as proof of cause.
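The hand calculation can be cross-checked in a few lines of code. This sketch follows the same steps (means, deviations, three sums, then the ratio) on the six-student data from the table; the variable names are my own.

```python
import math

hours = [2, 3, 5, 6, 7, 1]          # x, from the deviation table
scores = [54, 60, 66, 72, 82, 56]   # y, from the deviation table

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Deviations of each case from its variable's mean.
dx = [x - mean_x for x in hours]
dy = [y - mean_y for y in scores]

# The three sums used by the formula.
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(f"means: x = {mean_x}, y = {mean_y}")
print(f"sums: dx*dy = {sum_dxdy}, dx^2 = {sum_dx2}, dy^2 = {sum_dy2}")
print(f"r = {r:.4f}")
```

The three sums should match the table totals (120, 28 and 566), and r should round to 0.95.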
Data-collection methods for correlational designs
Correlational research is a design, not a single data-gathering technique, so it can be served by several methods. The three most common are surveys, naturalistic observation, and archival data.
- Surveys and questionnaires. The most popular route: you ask the same battery of standardised questions to a sample and then correlate the responses (for example, a job-satisfaction scale against an intention-to-leave scale). Surveys are efficient and allow many variables to be measured at once, but they depend on honest self-report and good instrument design.
- Naturalistic observation. You record behaviour as it occurs in its natural setting without intervening, then quantify and correlate what you observe (for example, observing playground interactions and correlating group size with frequency of conflict). This preserves ecological validity but can be time-consuming and vulnerable to observer bias.
- Archival / secondary data. You analyse data that already exist (government statistics, company records, published datasets, historical archives) and compute correlations between variables within them (for example, correlating regional unemployment rates with crime figures). This is cost-effective and enables large samples, though you are limited to variables someone else chose to measure.
Whichever method you use, the analytical step is the same: measure the variables, then quantify their association. For a fuller treatment of gathering data, see our guide to methods of data collection.
A worked example: sleep and exam performance
Suppose a researcher records nightly sleep duration and exam marks for a sample of students. Plotting the data on a scatterplot, she sees the cloud of points sloping upward: students who slept more tended to score higher. Both variables are continuous and roughly normally distributed, so she computes Pearson’s r = +0.46, a moderate positive correlation, and tests it for significance (p < .01), meaning a correlation this large would be unlikely to arise from sampling chance alone if the two variables were unrelated.
The crucial interpretation: she can report that sleep and exam performance are moderately, positively associated. She cannot conclude that more sleep causes better marks. A confounder, conscientiousness, could drive both: conscientious students may both keep regular sleep schedules and study more effectively. To test causation she would need an experiment (for example, an intervention that increases sleep) and a formal hypothesis test of the effect.
Strengths and limitations
Correlational research earns its place in the methodological toolkit for several reasons, but its weaknesses are exactly the mirror image of its strengths.
Strengths
- Real-world (high external validity). Because variables are measured as they naturally occur, findings often generalise well to everyday settings, unlike the sometimes artificial conditions of a lab experiment.
- Ethical and practical where experiments are not. You cannot ethically assign people to smoke, to be sleep-deprived, or to experience trauma. Correlational designs let you study such variables responsibly by observing existing differences.
- Predictive power. A reliable correlation lets you predict one variable from another even without understanding the cause, which underpins everything from admissions screening to risk scoring.
- Efficient and broad. Many variables can be measured at once, and large archival or survey samples are often feasible at modest cost.
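The predictive-power point can be made concrete with a least-squares line, which is how a correlation is usually turned into a prediction. The sketch below reuses the six-student hours and scores data from the worked example; the function name is hypothetical. With only six cases the line is fragile, and it says nothing about cause.

```python
hours = [2, 3, 5, 6, 7, 1]
scores = [54, 60, 66, 72, 82, 56]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope: sum of cross-products over sum of squared x deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict_score(study_hours):
    """Predicted exam score for a given number of study hours."""
    return intercept + slope * study_hours


print(f"score = {intercept:.2f} + {slope:.2f} * hours")
for h in (2, 4, 6):
    print(f"{h} hours -> predicted score {predict_score(h):.1f}")
```

Predictions are only trustworthy inside the range of hours actually observed (1 to 7 here); extrapolating beyond it assumes the same straight-line pattern continues.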
Limitations
- No causation. The headline limitation: a correlation cannot establish that one variable causes another.
- The directionality problem. Even if A and B are causally linked, correlation alone cannot tell you whether A causes B or B causes A. Does anxiety reduce sleep, or does poor sleep raise anxiety? The coefficient is silent on direction of cause.
- The third-variable problem. An unmeasured confounder may drive both variables, producing a spurious correlation (as with ice cream, drowning and temperature).
- Only captures linear (or monotonic) relationships. A coefficient near zero does not rule out a strong but curved relationship, which is why scatterplots matter.
Correlational vs experimental research
Students often weigh a correlational design against an experimental one. The table below contrasts the two on the dimensions that matter most for a dissertation.
| Dimension | Correlational research | Experimental research |
|---|---|---|
| Manipulation of variables | None – variables measured as they are | Researcher manipulates the independent variable |
| Random assignment | No | Yes (in true experiments) |
| What it establishes | Association (direction & strength) | Cause and effect |
| Control of confounders | Limited; confounders hard to rule out | High; controlled conditions isolate the cause |
| Setting | Often natural / real-world | Often controlled / lab-based |
| Typical question | “Are X and Y related?” | “Does X cause Y?” |
| Ethical reach | Can study variables you cannot manipulate | Constrained by what can be ethically manipulated |
Neither design is inherently superior; they answer different questions. Strong dissertations often use correlational evidence to identify promising relationships and then experimental or longitudinal work to probe causation.
Common mistakes to avoid
- Claiming causation from correlation. Never write that X “causes,” “leads to” or “results in” Y on the basis of a correlation alone; use “is associated with” or “predicts.”
- Reporting r without a scatterplot. You may be summarising a curved relationship or an outlier-driven one as if it were linear.
- Confusing the sign with strength. A negative correlation is not a weak one; strength is the absolute value of r.
- Using Pearson on ordinal or non-normal data. Switch to Spearman when assumptions are violated.
- Ignoring statistical significance and sample size. A correlation from a tiny sample can be large by chance; always test it.
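The last point is easy to demonstrate: the same r can be unconvincing in a tiny sample and compelling in a larger one. The sketch below uses the Fisher z-transformation with a normal approximation to get an approximate two-sided p-value and confidence interval. That is a swapped-in large-sample approximation, not the t-test on n − 2 degrees of freedom that statistics packages usually report, and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_summary(r, n, confidence=0.95):
    """Approximate p-value and confidence interval for r via Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r)             # Fisher transformation of r
    se = 1 / math.sqrt(n - 3)     # approximate standard error of z
    normal = NormalDist()
    p_value = 2 * (1 - normal.cdf(abs(z) / se))
    crit = normal.inv_cdf(0.5 + confidence / 2)
    low = math.tanh(z - crit * se)    # back-transform to the r scale
    high = math.tanh(z + crit * se)
    return p_value, low, high


for n in (10, 30, 100):
    p, low, high = fisher_summary(0.46, n)
    print(f"n = {n:3d}: p ~ {p:.4f}, 95% CI ~ [{low:.2f}, {high:.2f}]")
```

With ten cases the interval for r = 0.46 still includes zero; with a hundred it is comfortably above it. Reporting the sample size alongside r lets readers judge this for themselves.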
How to conduct a correlational study well: step by step
- Define your variables and hypothesis. State precisely which two (or more) variables you will measure and the relationship you predict, then frame it for hypothesis testing.
- Choose a data-collection method. Survey, naturalistic observation or archival data, matched to your variables and resources.
- Operationalise and measure. Use valid, reliable instruments and check the level of measurement of each variable.
- Sample appropriately. Recruit a sample large enough to detect the expected effect and representative enough to generalise.
- Visualise first. Build a scatterplot and inspect for linearity, outliers and shape before computing anything.
- Select the right coefficient. Pearson for continuous, normal, linear data; Spearman for ordinal or non-normal data.
- Compute, test and report. Report r, its sign, the significance level and the sample size, and interpret strength using conventional bands.
- Interpret cautiously. Discuss possible confounders and the directionality problem, and avoid causal language.
The statistics that turn raw data into a defensible correlational finding (choosing the coefficient, checking assumptions, testing significance and reporting effect sizes) are where many students lose marks. If you want expert support with that analysis, our specialists can help.
Conclusion
Correlational research is a powerful, ethical and efficient way to discover how variables relate in the real world. Master three things and you will use it well: read the coefficient r correctly (sign for direction, absolute value for strength), always look at the scatterplot before trusting the number, and never let a correlation tempt you into a causal claim. Used with that discipline, correlational findings are a credible contribution in their own right and a sound foundation for the experimental work that follows.