Content Analysis: Methods, Types & Examples

Published on August 16th, 2021. Revised on June 17, 2026.

Content analysis is a research technique for systematically coding and categorising the content of texts, images, audio or other communication in order to draw replicable, valid inferences about their meaning, source or effect. First codified by Bernard Berelson and later refined by Klaus Krippendorff, it turns unstructured material such as interview transcripts, newspaper articles, policy documents, social-media posts or advertisements into structured data that can be counted, compared and interpreted. You use it whenever you need to analyse existing communication objectively and systematically rather than rely on impressionistic reading.

This guide explains what content analysis is, the difference between quantitative and qualitative content analysis, conceptual versus relational and manifest versus latent content, the full step-by-step procedure, how to check inter-coder reliability, and how it differs from thematic analysis. A fully worked example with a frequency table and an agreement calculation is included.

What is content analysis?

Content analysis is a method for the objective, systematic and quantitative (or qualitative) description of the manifest and latent content of communication. The classic definition comes from Bernard Berelson, who described it in 1952 as a technique for the “objective, systematic and quantitative description of the manifest content of communication.” Klaus Krippendorff later broadened this, defining content analysis as “a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use.” The two definitions capture the method’s evolution: from a counting exercise focused on surface features to an inferential method concerned with meaning in context.

The unit of analysis can be almost any recorded communication: open-ended survey responses, semi-structured interview transcripts, focus-group recordings, newspaper coverage, television advertisements, company annual reports, parliamentary debates, tweets, online reviews, curriculum documents or historical letters. What unites these applications is a disciplined coding procedure that any other researcher, given the same texts and the same coding rules, could reproduce. That reproducibility is what separates content analysis from a casual literature read or an opinion piece.

“Content analysis is a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use.” (Source: Krippendorff, 2018)

When should you use content analysis?

Content analysis is appropriate when your data already exist as communication and your research question concerns what is said, how often, by whom, and with what emphasis. It is especially well suited to:

  • Analysing large volumes of textual or media material that would be impractical to read interpretively in full.
  • Tracking how the framing of a topic changes over time (for example, how press coverage of climate change has shifted across decades).
  • Comparing communication across sources, outlets, organisations or countries on a common set of categories.
  • Quantifying qualitative material so it can be summarised statistically or correlated with other variables.
  • Studying messages unobtrusively, without the reactivity that comes from interviewing or observing people directly.

Because it usually works with material that already exists, content analysis is a form of secondary, non-reactive research and is often more affordable and replicable than primary fieldwork. It is less suited to questions about why people behave as they do or to phenomena that leave no documentary trace.

Quantitative vs qualitative content analysis

Content analysis is not a single procedure but a family of approaches sitting on a spectrum from highly quantitative to interpretive. The quantitative tradition counts the frequency of pre-defined categories and reports numbers, proportions and statistical relationships. The qualitative tradition, sometimes called qualitative content analysis or, in Hsieh and Shannon’s typology, conventional, directed or summative content analysis, focuses on meaning, context and the development of categories from the data themselves. Most real dissertations blend the two: categories may be counted, but their interpretation is informed by close reading.

| Dimension | Quantitative content analysis | Qualitative content analysis |
| --- | --- | --- |
| Goal | Measure frequency and pattern; test hypotheses | Interpret meaning; build understanding in context |
| Coding scheme | Pre-defined, fixed categories (deductive) | Often emergent, refined during analysis (inductive) |
| Output | Counts, percentages, statistical tests | Themes, categories, illustrative quotations |
| Sample size | Large; statistical generalisation possible | Smaller; analytical depth prioritised |
| Reliability focus | Inter-coder reliability is central | Trustworthiness, audit trail, credibility |
| Typical question | How often does X appear, and does it differ by group? | How is X represented, and what does it mean? |

For a fuller treatment of this divide, see our guide on quantitative vs qualitative research.

Conceptual vs relational content analysis

Within content analysis, a second distinction concerns what you actually examine. Conceptual content analysis (sometimes called thematic content analysis) establishes the existence and frequency of concepts in a text. You decide which concepts to count, define the rules for recognising them, and tally how often each appears. The question is simply: how present is this idea?
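As a rough illustration of conceptual analysis, the sketch below tallies how often each concept is mentioned in a text. The concept names, their indicator terms and the sample sentence are all hypothetical; in a real study they would come from the codebook, and simple word matching would only capture manifest mentions.

```python
import re

# Hypothetical concept dictionary: each concept maps to the indicator terms
# that count as a mention. A real study would take these from its codebook.
CONCEPTS = {
    "economy": {"economy", "jobs", "wages"},
    "security": {"security", "border", "crime"},
}

def count_concepts(text, concepts=CONCEPTS):
    """Tally how many word tokens in `text` match each concept's terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {name: 0 for name in concepts}
    for token in tokens:
        for name, terms in concepts.items():
            if token in terms:
                counts[name] += 1
    return counts

sample = "Jobs and wages dominate the debate, while border security is mentioned once."
print(count_concepts(sample))  # {'economy': 2, 'security': 2}
```

Keeping the indicator terms in one dictionary means the counting rule can be published alongside the results, which is what makes the tally reproducible.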

Relational content analysis goes a step further. Rather than counting concepts in isolation, it examines the relationships among them, for example whether “immigration” tends to co-occur with “economy” or with “security,” and with what valence. Techniques here include proximity analysis, semantic network analysis and sentiment-laden co-occurrence mapping. Relational analysis is more demanding but reveals how meanings are connected, not merely how often a word appears.
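One simple form of proximity analysis is to count how often two concepts share a sentence. The sketch below assumes the sentence is the co-occurrence window and uses a hypothetical list of tracked words and a made-up three-sentence text; it ignores valence, which would need a separate coding rule.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical set of concepts to track; a real codebook would define these.
TRACKED = {"immigration", "economy", "security"}

def cooccurrences(text, tracked=TRACKED):
    """Count how often each pair of tracked concepts appears in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Sort so that each pair is always stored in the same order.
        present = sorted(tracked & set(re.findall(r"[a-z]+", sentence)))
        pairs.update(combinations(present, 2))
    return pairs

text = (
    "Immigration helps the economy. "
    "Critics link immigration to security. "
    "The economy grew while immigration rose."
)
pairs = cooccurrences(text)
print(pairs[("economy", "immigration")])   # 2
print(pairs[("immigration", "security")])  # 1
```

The resulting pair counts can serve as edge weights if the analysis is later extended to a semantic network.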

Manifest vs latent content

A third foundational distinction is between manifest and latent content, and it shapes how much inference you allow your coders to make.

  • Manifest content is the visible, surface meaning — the words actually present, the number of times a brand is named, whether a photograph shows a man or a woman. It is easy to code reliably because there is little room for interpretation.
  • Latent content is the underlying meaning — the tone, the implied attitude, the ideology beneath the words. Coding latent content (for instance, judging whether an article is “sympathetic” or “hostile” toward a policy) yields richer findings but is harder to do reliably, because two coders may read the same passage differently.

Strong content-analysis designs make a deliberate choice here and write coding rules detailed enough that even latent judgements can be applied consistently. The more interpretive the content, the more your reliability and validity safeguards matter.

[Figure 1: The content analysis process, with a pilot-and-reliability feedback loop. Stages: Define question → Select & sample texts → Units & coding scheme → Code the material → Analyse results → Report & interpret. The coding scheme is refined after the pilot and reliability check, before full coding begins.]

The steps of content analysis

Whether your design leans quantitative or qualitative, content analysis follows a recognisable sequence. The numbered procedure below is the spine of any defensible study.

  1. Define the research question and concepts. Decide exactly what you want to infer and which concepts you must measure. A vague question (“how is mental health portrayed?”) must be sharpened into measurable constructs (stigmatising language, help-seeking cues, named conditions).
  2. Select and sample the texts. Define the population of communication (e.g., all UK broadsheet articles mentioning “apprenticeships” in 2024) and draw a sample using a transparent sampling strategy — random, stratified or a defined census — so the corpus is justifiable.
  3. Define the unit of analysis and the coding scheme. Choose your recording unit (word, sentence, paragraph, whole article, image) and build a codebook: a list of categories, mutually exclusive and exhaustive where possible, each with a clear definition, decision rules and examples.
  4. Pilot and code the material. Test the codebook on a small subset, refine ambiguous rules, then code the full corpus. Latent-content coding in particular benefits from a pilot round.
  5. Check inter-coder reliability. Have two or more coders independently code an overlapping subset and compute an agreement statistic before you trust the full dataset.
  6. Analyse the coded data. Produce frequency tables, cross-tabulations, proportions and, where appropriate, statistical tests or relational/network analysis.
  7. Interpret and report. Translate the numbers back into answers to your research question, acknowledge limitations, and present the codebook and reliability figures so the study is transparent and replicable.
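Step 2 asks for a transparent sampling strategy. A minimal sketch of a stratified random draw is shown below, assuming a hypothetical sampling frame of (article_id, outlet) pairs and an arbitrary seed; the function name and parameters are illustrative rather than part of any standard tool.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, per_stratum, seed=2024):
    """Draw up to `per_stratum` items at random from each stratum.

    A fixed seed makes the draw reproducible, so the sample can be
    reported and re-created by another researcher.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for name in sorted(strata):  # fixed order keeps the draw deterministic
        group = strata[name]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical sampling frame: (article_id, outlet) pairs.
frame = [(i, "Outlet A") for i in range(50)] + [(i, "Outlet B") for i in range(50, 80)]
sample = stratified_sample(frame, stratum_of=lambda article: article[1], per_stratum=10)
print(len(sample))  # 20
```

Reporting the seed, the strata and the per-stratum quota in the methodology chapter lets a reader re-create exactly the same corpus.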

Inter-coder reliability

Because content analysis claims to be replicable, you must demonstrate that the coding is not idiosyncratic to one person. Inter-coder (or intercoder) reliability measures the extent to which independent coders, applying the same scheme to the same material, reach the same decisions. The simplest index is percentage agreement — the proportion of coding decisions on which coders agree. It is intuitive but flawed, because some agreement will occur by chance, especially when one category dominates.
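The weakness of percentage agreement is easiest to see with a small, deliberately lopsided dataset. In the hypothetical sketch below, two coders assign "Information" to almost every unit; they agree on 80% of units, yet the agreement expected by chance alone, given how often each coder uses each category, is slightly higher.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders chose the same category."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def chance_agreement(coder_a, coder_b):
    """Agreement expected if each coder assigned codes at random,
    keeping only their own overall category proportions."""
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    return sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

# Hypothetical data: 20 units; each coder uses "Other" only twice,
# and never on the same unit as the other coder.
coder_a = ["Other"] * 2 + ["Information"] * 18
coder_b = ["Information"] * 2 + ["Other"] * 2 + ["Information"] * 16

print(f"{percent_agreement(coder_a, coder_b):.2f}")  # 0.80
print(f"{chance_agreement(coder_a, coder_b):.2f}")   # 0.82
```

Here the coders never agree on the rare category at all, which the 80% figure hides completely.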

For that reason, methodologists prefer chance-corrected statistics: Cohen’s kappa for two coders on nominal categories, Scott’s pi, and Krippendorff’s alpha, which generalises across any number of coders, any level of measurement and missing data. As a rough convention, kappa or alpha values above 0.80 are considered good and above 0.67 acceptable for tentative conclusions, though thresholds depend on the difficulty of the content. Always report which statistic you used and on how much of the data.
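For readers who want to see what Krippendorff's alpha involves, the sketch below implements the nominal-data version from its coincidence-matrix definition. It is a simplified illustration under stated assumptions (nominal categories only, `None` marking a missing code, at least two categories in use), not a replacement for an established statistics package, and the ten-unit dataset is purely illustrative.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` is a list of lists: the codes one unit received, one entry per
    coder, with None for a missing code. Assumes at least two different
    categories occur overall (otherwise expected disagreement is zero).
    """
    coincidences = Counter()
    for codes in units:
        values = [c for c in codes if c is not None]
        m = len(values)
        if m < 2:
            continue  # a unit coded by fewer than two coders cannot be paired
        # Every ordered pair of codes from different coders, weighted 1/(m - 1).
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1 / (m - 1)
    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k) / (n - 1)
    return 1 - observed / expected

# Illustrative data: two coders, ten units, binary codes.
coder_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal([list(pair) for pair in zip(coder_1, coder_2)])
print(round(alpha, 3))  # 0.095
```

The two coders agree on 6 of 10 units, but because the code 0 dominates, alpha is close to zero, well below the 0.67 threshold mentioned above.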

Example: A health-communication student analyses 6 NHS social-media posts about vaccination, coding each sentence into one of three categories — Information, Reassurance or Call-to-action. After coding 40 sentences across the posts, she builds a frequency table:

| Category | Count | Percentage |
| --- | --- | --- |
| Information | 22 | 22 ÷ 40 × 100 = 55.0% |
| Reassurance | 12 | 12 ÷ 40 × 100 = 30.0% |
| Call-to-action | 6 | 6 ÷ 40 × 100 = 15.0% |
| Total | 40 | 100.0% |

Interpretation: NHS posts are dominated by information (55%), with reassurance secondary and explicit calls-to-action least frequent — suggesting the campaign prioritises informing over mobilising.
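The frequency table in this example can be reproduced in a few lines. The sketch below rebuilds the list of 40 coded sentences from the counts given above (the individual sentences are not part of the example, so only the category labels are used) and computes each category's share.

```python
from collections import Counter

# Category labels rebuilt from the counts in the worked example (22, 12, 6).
codes = ["Information"] * 22 + ["Reassurance"] * 12 + ["Call-to-action"] * 6

def frequency_table(codes):
    """Return (category, count, percentage) rows, most frequent first."""
    total = len(codes)
    return [
        (category, count, round(100 * count / total, 1))
        for category, count in Counter(codes).most_common()
    ]

for category, count, pct in frequency_table(codes):
    print(f"{category:<15}{count:>4}{pct:>7.1f}%")
# Information      22   55.0%
# Reassurance      12   30.0%
# Call-to-action    6   15.0%
```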

Inter-coder agreement (% agreement): A second coder independently codes the same 40 sentences. The two coders assign the same category on 34 sentences and disagree on 6. Percentage agreement is:
Po = (number of agreements ÷ total units) × 100 = (34 ÷ 40) × 100 = 85.0%.

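Because some agreement arises by chance, she also estimates expected agreement from the category proportions, on the simplifying assumption that both coders use the three categories in the proportions shown in the table (strictly, Cohen’s kappa multiplies each coder’s own proportions category by category): Pe = 0.55² + 0.30² + 0.15² = 0.3025 + 0.09 + 0.0225 = 0.415. With Po expressed as a proportion (0.85), Cohen’s kappa is then κ = (Po − Pe) ÷ (1 − Pe) = (0.85 − 0.415) ÷ (1 − 0.415) = 0.435 ÷ 0.585 = 0.74. A kappa of 0.74 indicates substantial agreement: it is above the 0.67 threshold for tentative conclusions, though below the 0.80 level conventionally considered good, so the coding scheme is reliable enough to proceed.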
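The kappa calculation can be checked in code. Cohen's kappa takes expected agreement from the product of the two coders' own category proportions, which equals the sum of squared proportions used above only when both coders use the categories equally often. The example does not say how the 6 disagreements were distributed, so the breakdown below is hypothetical, chosen so that both coders end up with the same 22/12/6 split.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' nominal codes on the same units."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical (coder A, coder B) pairs: 34 agreements and 6 disagreements,
# arranged so that each coder assigns 22 / 12 / 6 sentences per category.
pairs = (
    [("Information", "Information")] * 19
    + [("Information", "Reassurance")] * 2
    + [("Information", "Call-to-action")] * 1
    + [("Reassurance", "Information")] * 2
    + [("Reassurance", "Reassurance")] * 10
    + [("Call-to-action", "Information")] * 1
    + [("Call-to-action", "Call-to-action")] * 5
)
coder_a, coder_b = zip(*pairs)
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.74
```

If the second coder's category proportions differed from the first coder's, the expected agreement and therefore kappa would shift slightly, which is why the statistic should be computed from both coders' actual codes whenever they are available.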

Content analysis vs thematic analysis

Students frequently confuse content analysis with thematic analysis, and the two do overlap, especially in their qualitative forms. The key differences are summarised below.

| Feature | Content analysis | Thematic analysis |
| --- | --- | --- |
| Primary aim | Systematic, often countable description of content | Identifying, interpreting patterns of meaning (themes) |
| Quantification | Frequencies and percentages are common and central | Usually non-numeric; frequency is not the point |
| Coding | Codebook with mutually exclusive categories | Flexible codes that cluster into themes |
| Reliability check | Inter-coder reliability (kappa, alpha) expected | Reflexive credibility; coding reliability optional |
| Key authority | Berelson; Krippendorff | Braun & Clarke (2006) |
| Epistemology | Often (post)positivist; can be qualitative | Flexible, frequently constructionist |

In short, content analysis asks “what is there and how much,” while thematic analysis asks “what patterns of meaning run through the data.” Related interpretive approaches you may consider include textual analysis and discourse analysis, which place even greater emphasis on language, structure and power.

Strengths and limitations

Content analysis is popular because it is transparent and flexible, but it is not a universal tool.

Strengths:

  • Systematic and replicable: clear coding rules let others reproduce the study.
  • Unobtrusive: it studies communication without disturbing the people who produced it.
  • Scalable: with sampling and software, very large corpora can be handled.
  • Flexible across media: text, images, audio and video can all be coded.
  • Combines well with other methods, both quantitative and qualitative.

Limitations:

  • Coding latent meaning is difficult and threatens reliability if rules are loose.
  • Stripping content from its production context can reduce validity.
  • It describes what is communicated, not why, or how audiences actually receive it.
  • Findings depend heavily on the quality of the codebook and the sampling frame.
  • It can become mechanically descriptive if interpretation is neglected.

Common mistakes to avoid

  • Building categories that overlap or are not exhaustive, so units cannot be coded cleanly.
  • Skipping the pilot and reliability check, then discovering coders interpreted categories differently.
  • Choosing a convenience corpus and over-claiming generalisability beyond it.
  • Reporting raw counts without percentages or context, making patterns hard to read.
  • Confusing frequency with importance — the most-mentioned theme is not always the most significant.
  • Failing to publish the codebook, which makes the study impossible to replicate.

How to do content analysis well

The best content-analysis studies share a few habits. They start from a precise question and well-defined constructs rather than coding first and theorising later. They sample transparently, so the corpus is defensible. They invest in the codebook, writing definitions and decision rules detailed enough that a second coder reaches the same judgements. They pilot, refine, and report inter-coder reliability with a chance-corrected statistic. Finally, they move beyond description: the numbers are a means to answer a question, not the answer itself. Pair systematic coding with thoughtful interpretation and an honest account of limitations, and content analysis becomes one of the most rigorous and versatile tools in the qualitative-quantitative researcher’s kit.

Frequently Asked Questions

What is content analysis in simple terms?

Content analysis is a research method for systematically coding the content of texts, images or other communication into categories so you can count, compare and interpret what is being communicated. It turns unstructured material such as interview transcripts, news articles or social-media posts into structured data, allowing replicable and valid inferences about meaning, emphasis or source.

What is the difference between quantitative and qualitative content analysis?

Quantitative content analysis counts the frequency of pre-defined categories and reports numbers, percentages and statistical relationships, with inter-coder reliability at its core. Qualitative content analysis focuses on meaning and context, often develops categories inductively from the data, and prioritises depth and trustworthiness over counts. Many dissertations combine both, counting categories while interpreting them through close reading.

What is the difference between manifest and latent content?

Manifest content is the visible, surface meaning of communication, such as the actual words used or the number of times a brand is named; it is easy to code reliably. Latent content is the underlying meaning, tone or implied attitude beneath the words; it yields richer findings but is harder to code consistently, so it requires detailed coding rules and careful reliability checks.

How do you check inter-coder reliability?

Have two or more coders independently code an overlapping subset of the material, then compute an agreement statistic. Percentage agreement is the simplest (agreements divided by total units), but chance-corrected measures such as Cohen’s kappa, Scott’s pi or Krippendorff’s alpha are preferred. Values above 0.80 are good and above 0.67 acceptable for tentative conclusions, though the threshold depends on how interpretive the content is.

How does content analysis differ from thematic analysis?

Content analysis aims at systematic, often countable description, asking what is present in the data and how much, using a codebook of mutually exclusive categories and reporting inter-coder reliability. Thematic analysis, associated with Braun and Clarke (2006), aims at identifying and interpreting patterns of meaning (themes), is usually non-numeric, and relies on reflexive credibility rather than coding-reliability statistics.

What are the steps of content analysis?

First define your research question and key concepts, then select and sample the texts using a transparent strategy. Next define the unit of analysis and build a codebook of clear categories. Pilot the codebook, code the full corpus, and check inter-coder reliability. Finally analyse the coded data into frequencies and cross-tabulations, then interpret and report the findings alongside the codebook and reliability figures.

About Aadam Mae

Aadam Mae is an academic researcher and author at ResearchProspect with a PhD in NLP (Natural Language Processing). Mae's work delves into the intricacies of language and technology, delivering profound insights in concise prose and pioneering the future of communication through scholarship.
