Every quantitative research project starts with a description. Before you test hypotheses, run regressions or interpret p-values, you need to understand your data: what it looks like, where it is centred, how spread out it is and whether it has any unusual features. That is what descriptive statistics are for.
This guide walks through the four families of descriptive statistics (measures of central tendency, measures of variability, frequency distributions and distribution shape), with five worked examples, formulas, a summary table and answers to the questions students ask most.
What Are Descriptive Statistics
Descriptive statistics summarise and describe the features of a dataset (its centre, spread, frequency and shape) without drawing conclusions about a wider population. Drawing population conclusions is the job of inferential statistics. In short, descriptive statistics give you and your reader a clear picture of exactly what the data in front of you contains.
The five most common examples you will calculate are: (1) a measure of central tendency such as the mean; (2) a measure of variability such as the standard deviation; (3) a frequency distribution; (4) a measure of distribution shape such as skewness; and (5) a percentage or proportion. The four families they belong to are summarised below.
| Type | What It Describes | Examples |
|---|---|---|
| Central tendency | Where the data is centred (the typical value) | Mean, median, mode |
| Variability (spread) | How spread out the data is around the centre | Range, IQR, standard deviation, variance |
| Distribution shape | The pattern and symmetry of the distribution | Skewness, kurtosis, normality tests |
| Frequency | How many observations fall in each category | Frequency counts, percentages, cumulative % |
Measures Of Central Tendency
The three measures of central tendency are the mean, the median and the mode.
Mean
The arithmetic average: sum all values and divide by the number of observations. It uses every data point, which makes it precise for normally distributed data. Its weakness is that extreme values (outliers) pull it away from where most of the data sits.
Median
The middle value when all observations are sorted from lowest to highest. It is resistant to outliers, which makes it the preferred measure for skewed distributions. The Office for National Statistics (ONS) reports median earnings rather than mean earnings in its official UK labour-market statistics for precisely this reason.
Mode
The most frequently occurring value or category. It is the only appropriate measure of central tendency for nominal data and is useful whenever you want the most common response. A dataset can have two modes (bimodal) or more (multimodal), which is itself informative: it often signals two or more distinct subgroups in the data.
| Data Type | Best Central Tendency Measure | Why? |
|---|---|---|
| Nominal | Mode | Only measure that does not require ordering or equal intervals |
| Ordinal | Median | Resistant to unequal gaps between ranks |
| Interval (symmetric) | Mean | Uses all information; appropriate when intervals are equal |
| Interval / Ratio (skewed) | Median (report both) | Outliers distort the mean; median is more representative |
| Ratio (symmetric) | Mean | Uses all information; ratio statements add interpretive value |
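
The three measures can be compared on a small dataset. The following is a minimal sketch using Python's standard `statistics` module; the salary and travel figures are invented for illustration and are not taken from any real survey.

```python
import statistics

# Hypothetical annual salaries in GBP thousands; the last value is an outlier.
salaries = [22, 24, 25, 25, 27, 30, 31, 250]

mean_salary = statistics.mean(salaries)      # uses every value, so the outlier pulls it upwards
median_salary = statistics.median(salaries)  # middle of the sorted values, resistant to the outlier
modes = statistics.multimode(salaries)       # list of all most-frequent values

print(f"Mean:   {mean_salary}")    # 54.25
print(f"Median: {median_salary}")  # 26.0
print(f"Mode:   {modes}")          # [25]

# Nominal data (hypothetical travel-to-campus responses): only the mode applies.
transport = ["bus", "car", "car", "walk", "bus"]
transport_modes = statistics.multimode(transport)
print(f"Most common transport: {transport_modes}")  # ['bus', 'car'], a bimodal result
```

With a single extreme salary in the list, the mean is more than double the median, which is why the median is the safer summary for skewed data.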
Measures Of Variability
The centre alone tells you very little. A mean of 50 on a test where everyone scores between 48 and 52 tells a completely different story from a mean of 50 where scores range from 10 to 90. Measures of variability capture that difference.
Range
Maximum minus minimum. Simple, but easily distorted by a single extreme value, so it is rarely the only measure of spread you should report.
Interquartile Range (IQR)
The range of the middle 50% of the data, calculated as Q3 minus Q1. It is resistant to outliers, so report it alongside the median when data is skewed or non-normal.
Standard Deviation (SD)
The most widely used measure of spread for interval and ratio data. It can be read as the typical distance of data points from the mean; strictly, it is the square root of the average squared deviation from the mean. A small SD means tight clustering; a large SD means wide spread. Report it alongside the mean.
Variance
The square of the standard deviation. It is used in many statistical calculations (notably ANOVA) but is less intuitive to report on its own, because it is expressed in squared units.
| Measure | Appropriate For | Resistant to Outliers | Common Pairing |
|---|---|---|---|
| Range | Quick summary only | No | Often supplemented with IQR |
| IQR | Ordinal or skewed ratio data | Yes | Reported with median |
| Standard deviation | Interval / ratio data, normal distribution | No | Reported with mean |
| Variance | Statistical computations | No | Rarely reported standalone |
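
To see how the four measures behave, the sketch below computes them for two invented sets of test scores that share a mean of 50 but differ in spread. It assumes the sample (n − 1) formulas for SD and variance and the default quartile method of Python's `statistics.quantiles`; other packages use slightly different quartile definitions, so IQR values may not match exactly. The helper name `describe_spread` is hypothetical.

```python
import statistics

# Two hypothetical sets of test scores, both with a mean of 50.
tight = [48, 49, 50, 50, 50, 51, 52]
wide = [10, 30, 45, 50, 55, 70, 90]


def describe_spread(values):
    """Return range, IQR, sample SD and sample variance for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.stdev(values),           # sample SD (divides by n - 1)
        "variance": statistics.variance(values),  # square of the sample SD
    }


for name, scores in [("tight", tight), ("wide", wide)]:
    spread = describe_spread(scores)
    print(name, statistics.mean(scores), {k: round(v, 2) for k, v in spread.items()})
```

Both sets report the same mean, but the range, IQR, SD and variance of the second set are all far larger, which is the difference the centre alone cannot show.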
Distribution Shape: Skewness & Kurtosis
Beyond centre and spread, the shape of your distribution matters, particularly when deciding whether parametric tests are appropriate.
- Positive skew (right skew): a long tail to the right, so the mean is usually greater than the median. Common in income, response times and NHS waiting times.
- Negative skew (left skew): a long tail to the left, so the median is usually greater than the mean. Less common; can appear with test scores on an easy exam.
- Kurtosis: describes the ‘peakedness’ of a distribution and the weight in its tails. High kurtosis means heavier tails (more extreme values).
In SPSS, skewness and kurtosis are produced via Analyze > Descriptive Statistics > Explore. As a rough rule, skewness between −1 and +1 is generally considered acceptably normal for parametric tests, while values beyond ±2 are more concerning.
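
Outside SPSS, skewness and kurtosis can be computed directly from the data. The sketch below uses the simple moment-based definitions (skewness as the third standardised moment, excess kurtosis as the fourth standardised moment minus 3). This is an assumption made for illustration: statistical packages typically report bias-adjusted sample versions, so their output will differ somewhat, especially for small samples. The function name and the waiting-time data are hypothetical.

```python
import statistics


def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moment formulas."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: average of the 2nd, 3rd and 4th powers of the deviations.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical waiting times in weeks: most are short, a few are very long.
waiting_times = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
symmetric = [1, 2, 3, 4, 5]

skew_wait, kurt_wait = moment_skew_kurtosis(waiting_times)
skew_sym, kurt_sym = moment_skew_kurtosis(symmetric)

print(f"Waiting times: skewness={skew_wait:.2f}, excess kurtosis={kurt_wait:.2f}")
print(f"Symmetric:     skewness={skew_sym:.2f}, excess kurtosis={kurt_sym:.2f}")
print("Mean > median:", statistics.mean(waiting_times) > statistics.median(waiting_times))
```

The waiting-time data produce a positive skewness and a mean above the median, matching the right-skew pattern described above, while the symmetric data give a skewness of zero.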
Descriptive vs Inferential Statistics
Descriptive and inferential statistics are the two branches of statistical analysis, and they answer different questions. Descriptive statistics summarise the data you actually collected; inferential statistics use that sample to draw probabilistic conclusions about a wider population. You almost always run descriptive statistics first, because they give the context that makes inferential results interpretable.
- Descriptive statistics: summarise the data you have (mean, SD, charts)
- Inferential statistics: generalise to a population (estimation & hypothesis tests)
| | Descriptive Statistics | Inferential Statistics |
|---|---|---|
| Goal | Summarise and describe the dataset | Generalise from a sample to a population |
| Question answered | “What does this data look like?” | “What is likely true beyond this data?” |
| Typical outputs | Mean, SD, frequency tables, charts | p-values, confidence intervals, regression |
| Uncertainty | None – describes data exactly | Built in – estimates carry error |
| Examples | Mean exam score of your class | t-test on whether two classes differ |
Reporting Descriptive Statistics In APA Format
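APA 7th edition (widely used in psychology and the social sciences, including at many UK universities) has specific conventions, with statistical symbols such as M, SD and Mdn set in italics: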
- Mean and standard deviation: M = 42.3, SD = 8.1
- Median: Mdn = 38.0
- Range: Range = 12–68
- IQR: IQR = 22.5
- For frequency data: ‘Of the 120 participants, 48 (40%) were male, 66 (55%) were female, and 6 (5%) identified as non-binary.’
Always include a descriptive statistics table in your results section when you have multiple variables. It gives context for all the inferential tests that follow.
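
A frequency table with counts and percentages, like the one behind the frequency sentence above, can be built from raw responses. This is a minimal sketch using `collections.Counter`; the response list is constructed to match the counts in that example, and the helper name `frequency_table` is hypothetical. Cumulative percentages are included for completeness, although they are mainly meaningful for ordered categories.

```python
from collections import Counter

# Hypothetical raw responses matching the counts in the example sentence.
responses = ["male"] * 48 + ["female"] * 66 + ["non-binary"] * 6


def frequency_table(values):
    """Return rows of (category, count, percent, cumulative percent), largest first."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, count in counts.most_common():
        percent = 100 * count / total
        cumulative += percent
        rows.append((category, count, percent, cumulative))
    return rows


rows = frequency_table(responses)
print(f"{'Category':<12}{'n':>5}{'%':>8}{'Cum. %':>9}")
for category, count, percent, cumulative in rows:
    print(f"{category:<12}{count:>5}{percent:>8.1f}{cumulative:>9.1f}")
```

Each row supplies the count and percentage needed for a sentence in the "48 (40%)" style, and the final cumulative percentage reaching 100 is a quick check that no responses were dropped.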
Frequently Asked Questions
The main purpose of descriptive statistics is to summarise and present data in a meaningful way. They help you understand the central tendency, dispersion and shape of a distribution, making complex datasets more interpretable and providing insights for decision-making and analysis.
Descriptive statistics should always be reported alongside inferential tests. Means, standard deviations and sample sizes per group in particular are essential context for interpreting inferential results. Many journals require descriptive statistics to be reported alongside every inferential test.
Outliers primarily affect the mean and standard deviation, pulling them toward extreme values. The median and IQR are resistant. When you identify outliers, report both sets of statistics, note the discrepancy, and explain how you handled the outliers in your analysis.
Skewness and non-normality affect your choice of test. Substantially skewed or non-normal data typically calls for non-parametric tests (Mann-Whitney U instead of a t-test; Spearman’s instead of Pearson’s correlation) or a data transformation (such as a log transform) before parametric tests. It does not mean your data is flawed: many real-world variables are naturally non-normal.