"> How to Transcribe an Interview: Tips & Methods - ResearchProspect

Published on August 16th, 2021; revised on June 17, 2026

Interview transcription is the process of converting a recorded research interview into an accurate written text that you can read, code and analyse. In qualitative research it is the bridge between fieldwork and findings: a clean, faithful transcript is what you actually analyse, so the quality of your themes can never exceed the quality of your transcription. Learning how to transcribe an interview well means choosing the right level of detail (verbatim, intelligent verbatim or edited), applying consistent notation, and handling consent and anonymisation properly before anyone else sees the data.

This guide walks you through what transcription is and why it matters, the three main transcription styles and when to use each, a step-by-step process, the notation conventions researchers use, manual versus automated (ASR) tools, realistic time estimates, the ethics and GDPR considerations, and how a finished transcript feeds straight into thematic analysis.

What interview transcription is and why it matters

Transcription is the systematic conversion of spoken audio (or video) into written text. In a dissertation that uses interviews or focus groups, transcription is not clerical busywork — it is an analytical act. As Braun and Clarke (2006) note in their account of thematic analysis, the researcher who transcribes their own data begins to know it intimately, and that early familiarisation is the first phase of analysis proper. Every decision you make — whether to capture a stammer, a three-second pause, or a nervous laugh — shapes what the text can later tell you.

Why does it matter so much? Because qualitative analysis is grounded in the participant’s own words. A misheard phrase, a dropped negative (“I don’t agree” becoming “I do agree”), or a smoothed-over hesitation can quietly distort meaning. A transcript is also a key part of your audit trail: it supports the dependability and confirmability of your study, lets a supervisor or examiner check your interpretations against the source, and underpins the wider concern with reliability and validity in qualitative work. Transcription belongs to the broader family of methods of data collection and analysis that turn raw fieldwork into defensible findings.

“Transcription, while it may seem time-consuming, frustrating, and at times boring, can be an excellent way to start familiarising yourself with the data.” (Source: Braun & Clarke, 2006)

The three main types of transcription

There is no single “correct” transcript. The right level of detail depends on your research question and your method of analysis. Conversation analysis needs every micro-pause and intake of breath; a study of policy attitudes usually does not. The three styles below sit on a spectrum from most faithful to most readable.

1. Verbatim (full / true verbatim)

Captures everything: every “um”, “er”, false start, repetition, filler word, stutter, and often non-verbal sounds (laughter, sighs) and timed pauses. It is the most faithful record and the most time-consuming to produce.

2. Intelligent verbatim (clean verbatim)

Keeps the speaker’s words and meaning intact but removes non-essential fillers and false starts, lightly tidying grammar without changing sense. This is the default for most dissertations using thematic or content analysis, because it is readable yet faithful to what was said.

3. Edited (clean) transcript

Goes further: it removes irrelevant tangents, corrects grammar fully, and may reorganise for clarity. Useful for member-checking, quotation in a published article, or when only the substantive content matters. Use it carefully — over-editing can strip away the texture analysts rely on.

Transcription type | What it captures | When to use it | Trade-off
Verbatim | Every word, filler, false start, pause, laughter and non-verbal sound | Conversation analysis, discourse analysis, sociolinguistics, where how something is said matters | Most accurate; slowest and hardest to read
Intelligent verbatim | All meaningful speech; fillers and false starts removed; light tidying | Most dissertations using thematic or content analysis | Best balance of fidelity and readability
Edited / clean | Substantive content only; grammar corrected; tangents removed | Quotations for publication, member-checking, executive summaries | Most readable; risks losing analytic nuance

The interview transcription process, step by step

Treat transcription as a defined workflow rather than something you improvise at midnight before a deadline. The figure below shows the end-to-end pipeline, from recording through to analysis.

[Figure 1 diagram: Record (clear audio) → Prepare (tools & template) → Transcribe (verbatim, intelligent verbatim, or edited / clean) → Anonymise (pseudonyms) → Check (proof vs audio) → Analyse (code & theme)]
Figure 1: The interview transcription workflow — record, prepare, transcribe (choosing a style), anonymise, check, then analyse.
  1. Record well. Good transcription starts with good audio. Use an external microphone where possible, sit in a quiet room, test levels, and record to two devices if the interview is important. Poor audio multiplies transcription time.
  2. Prepare your tools and template. Choose your software and a foot pedal or keyboard shortcuts for play/pause/rewind. Set up a header (date, participant pseudonym, interview number, duration) and decide your style and conventions before you start.
  3. Transcribe in passes. Do a first pass to get the words down, then a second pass to insert speaker labels, pauses and non-verbal notes, slowing the audio to around 50–75% speed.
  4. Anonymise as you go. Replace names of people, employers, schools and places with pseudonyms or bracketed labels (e.g. [participant’s manager]) the moment you type them.
  5. Check against the audio. Listen through once more while reading the transcript to catch mishearings, especially negatives, numbers and technical terms.
  6. Format for analysis. Add line numbers, consistent speaker labels (e.g. I: for interviewer, P3: for participant 3), and save in a format your coding software accepts.
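
Step 6 can be partly automated. The sketch below is a minimal illustration in Python, assuming a plain-text transcript in which every turn starts with a label such as I: or P3:. The function names number_lines and unlabelled_lines are hypothetical helpers written for this example, not part of any coding package.

```python
import re

# Assumed speaker-label pattern: "I:" for the interviewer, "P3:" for a participant.
SPEAKER = re.compile(r"^(I|P\d*):\s*")

def number_lines(raw_text, start=1):
    """Return the non-empty transcript lines, each prefixed with a zero-padded line number."""
    lines = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    width = len(str(start + len(lines) - 1))
    return [f"{n:0{width}d}  {ln}" for n, ln in enumerate(lines, start)]

def unlabelled_lines(raw_text):
    """List non-empty lines that do not begin with a recognised speaker label."""
    stripped = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    return [ln for ln in stripped if not SPEAKER.match(ln)]

sample = """I: So how did the move to working from home affect you?
P3: Honestly, it was a lot.

the lines just blurred."""

numbered = number_lines(sample)
for line in numbered:
    print(line)

# Lines without a label are reported so you can fix them by hand.
print(unlabelled_lines(sample))
```

Flagging unlabelled lines rather than guessing the speaker keeps the decision with you, which matters because a misattributed turn changes the meaning of the data.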

Transcription conventions and notation

Consistency is everything. Agree a notation key at the start and apply it to every transcript so your data set is comparable. A common, lightweight convention set looks like this:

  • Speaker labels: I: for interviewer, P: or P3: for participants.
  • Short pause: (.) for a micro-pause; timed pause: (2s) or (pause) for longer silences.
  • Inaudible speech: [inaudible] or [inaudible 00:14:32] with a timestamp.
  • Uncertain hearing: [unclear: word?] when you have a best guess.
  • Overlapping speech: square brackets aligned across lines, or // to mark where the next speaker breaks in.
  • Non-verbal cues: (laughs), (sighs), (long exhale), [phone rings] in round or square brackets.
  • Emphasis: italics or underlining for a stressed word; CAPITALS for markedly raised volume.
  • Redactions: [name], [employer], [city] for anonymised identifiers.
Example: A short verbatim excerpt with conventions applied, from a study on remote-working wellbeing (business/psychology). The same exchange in intelligent verbatim is shown beneath so you can see the difference.

Verbatim:
I: So how did the move to working from home affect you?
P3: Um, honestly it was… (.) it was a lot. Like at first I thought, oh great, no commute, but then… (2s) yeah, the lines just sort of blurred? (laughs) I’d be answering emails at, at like ten at night for [employer].
I: Ten at night [right]
P3:           [yeah] every night, basically. It was [inaudible 00:11:04] for months.

Intelligent verbatim (same passage):
I: So how did the move to working from home affect you?
P3: Honestly, it was a lot. At first I thought, great, no commute, but then the lines just blurred. I’d be answering emails at ten at night for [employer]. Every night, basically. It was [inaudible 00:11:04] for months.
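
Because consistency matters so much, it can help to check each transcript against your notation key automatically. The following is a rough sketch, assuming the key matches the convention list above. The patterns in ALLOWED and the function name unknown_tokens are illustrative assumptions, so adapt them to the key you actually agreed.

```python
import re

# Assumed notation key, based on the conventions listed above; extend it to match your own.
ALLOWED = [
    r"\(\.\)",                               # micro-pause
    r"\(\d+s\)",                             # timed pause, e.g. (2s)
    r"\(pause\)",                            # untimed longer pause
    r"\((laughs|sighs|long exhale)\)",       # non-verbal cues
    r"\[inaudible( \d{2}:\d{2}:\d{2})?\]",   # inaudible, optional timestamp
    r"\[unclear: [^\]]+\?\]",                # best guess at an unclear word
    r"\[(name|employer|city)\]",             # redactions
]
ALLOWED_RE = re.compile("|".join(f"(?:{p})" for p in ALLOWED))
# Anything wrapped in round or square brackets is treated as a notation token.
TOKEN_RE = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")

def unknown_tokens(transcript):
    """Return (line number, token) pairs for bracketed tokens that are not in the key."""
    found = []
    for lineno, line in enumerate(transcript.splitlines(), 1):
        for tok in TOKEN_RE.findall(line):
            if not ALLOWED_RE.fullmatch(tok):
                found.append((lineno, tok))
    return found

sample = """P3: Um, it was... (.) a lot, but then (2s) yeah. [laughs]
P3: It was [inaudible 00:11:04] for [employer]. (3 sec)"""

for lineno, tok in unknown_tokens(sample):
    print(f"line {lineno}: check {tok}")
```

The checker only flags tokens for review. Overlap brackets and ordinary parenthetical remarks will also be reported, so treat the output as a prompt to look again, not as a list of errors.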

Tools: manual transcription versus ASR

You can transcribe by hand or lean on automatic speech recognition (ASR). Most researchers now use a hybrid approach: let software produce a rough draft, then correct it against the audio.

  • Manual tools. Software such as oTranscribe, Express Scribe or your word processor with hotkeys lets you slow playback and type. Slow, but you control every word and you familiarise yourself deeply with the data.
  • ASR / AI tools. Otter.ai, Microsoft Word’s transcribe feature, Descript, Trint, and open-source Whisper generate a draft transcript in minutes and often add speaker diarisation (who spoke when).

ASR accuracy has improved dramatically, but it is not analysis-ready out of the box. Accuracy drops with strong accents, overlapping speech, technical jargon, background noise and poor recordings — and the errors it makes (confident, plausible mishearings) are exactly the kind that distort meaning. Treat any machine draft as a first draft only:

  • Always proofread against the original audio — never analyse a raw ASR output.
  • Check numbers, names, negatives and discipline-specific terms with special care.
  • Add the human layer ASR misses: pauses, emphasis, non-verbal cues and accurate speaker turns.
  • Mind data protection: uploading identifiable audio to a cloud transcription service can breach your ethics approval and GDPR if participants did not consent to third-party processing. Whisper can run locally (offline), which is often the safer choice for sensitive data.
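
As an illustration of the local route, here is a minimal sketch that assumes the open-source openai-whisper Python package and ffmpeg are installed. The helper names format_timestamp, segments_to_draft and transcribe_locally are hypothetical, and the "base" model size is only an example. The first call to load_model downloads the model weights, after which transcription runs on your own machine.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS for transcript timestamps."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def segments_to_draft(segments):
    """Turn Whisper-style segments into timestamped draft lines for proofreading."""
    return [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]

def transcribe_locally(audio_path, model_name="base"):
    """Run Whisper on this machine and return a timestamped first draft."""
    import whisper  # imported here so the helpers above work without the package installed
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_draft(result["segments"])

# Example with a hand-written segment in the shape Whisper returns:
demo = segments_to_draft([{"start": 664.2, "end": 667.0, "text": " every night, basically."}])
print(demo[0])
```

The timestamps make the proofreading pass faster because you can jump straight to a doubtful passage in the audio. The draft carries no speaker labels, pauses or non-verbal cues, so those still have to be added by hand.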

How long does transcription take?

Plan your timetable realistically. As a rule of thumb, manual transcription takes around four to six hours per hour of clear audio for an intelligent-verbatim transcript; full verbatim with detailed notation can run to eight to ten times the audio length. ASR-assisted workflows cut this, but you should still budget meaningful correction time.

Example: Worked time estimate for a small qualitative study (education discipline).

You have 8 interviews, each averaging 45 minutes of audio, and you want intelligent-verbatim transcripts.

Step 1 — total audio. 8 × 45 min = 360 min = 6 hours of audio.
Step 2 — apply the ratio (manual). At 4×: 6 × 4 = 24 hours. At 6×: 6 × 6 = 36 hours. So plan for roughly 24–36 hours of manual transcription work.
Step 3 — the ASR-assisted route. Suppose ASR produces a draft in near-real-time and you spend about 2× the audio length correcting it: 6 × 2 = 12 hours of correction. Add ~1 hour for uploading, exporting and formatting → about 13 hours total.
Step 4 — convert to working days. At 5 productive transcription hours per day, the manual route is 24–36 ÷ 5 ≈ 5–7 days; the ASR-assisted route is 13 ÷ 5 ≈ 2.6 days.

Takeaway: even the “fast” hybrid route costs you over two full working days for just six hours of audio — so build transcription explicitly into your dissertation Gantt chart rather than treating it as an afterthought.
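
The same arithmetic can be wrapped in a small function so you can rerun it with your own numbers. This is a sketch only: the function name transcription_estimate is hypothetical, and the ratios, the one-hour overhead and the five productive hours per day are the assumptions from the worked example, not fixed rules.

```python
def transcription_estimate(n_interviews, minutes_each, ratio,
                           overhead_hours=0.0, hours_per_day=5.0):
    """Return (audio hours, working hours, working days) for a transcription job."""
    audio_hours = n_interviews * minutes_each / 60
    work_hours = audio_hours * ratio + overhead_hours
    return audio_hours, work_hours, work_hours / hours_per_day

# Manual route at the low and high ends of the 4x to 6x rule of thumb.
manual_low = transcription_estimate(8, 45, ratio=4)
manual_high = transcription_estimate(8, 45, ratio=6)
# ASR-assisted route: 2x correction time plus about 1 hour of uploading and formatting.
asr_assisted = transcription_estimate(8, 45, ratio=2, overhead_hours=1)

for label, (audio, hours, days) in [("manual, 4x", manual_low),
                                    ("manual, 6x", manual_high),
                                    ("ASR-assisted", asr_assisted)]:
    print(f"{label}: {audio:.1f} h audio, {hours:.1f} h work, {days:.1f} days")
```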

Anonymisation, ethics and GDPR

Interview data is personal data, and audio of someone’s voice is itself identifiable. In the UK that brings transcription squarely under the UK GDPR and the Data Protection Act 2018, so handle it accordingly:

  • Consent. Your participant information sheet and consent form should state that interviews will be recorded and transcribed, how the data will be stored, who will see it, and when recordings will be destroyed.
  • Anonymise / pseudonymise. Remove or replace direct identifiers (names, employers, places, unusual roles) in the transcript. Keep the key linking pseudonyms to identities in a separate, secure file.
  • Secure storage. Store recordings and transcripts on encrypted, access-controlled university systems — not on a personal laptop desktop or an unapproved cloud drive.
  • Third-party processing. If you use a commercial transcription service or cloud ASR, you are sharing personal data with a processor; this usually needs explicit consent and a data-processing agreement, and may breach ethics approval if not declared.
  • Data minimisation and retention. Delete audio once the verified transcript exists if your ethics protocol allows, and keep data only for the period your ethics approval specifies.
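
Replacing direct identifiers can be partly scripted once you have a pseudonym key. The sketch below assumes a simple dictionary that maps each identifier to its bracketed label. The function name pseudonymise and the names in the example key (Priya, Acme Ltd, Leeds) are invented for illustration. In practice the key would be loaded from the separate, secure file and never stored alongside the transcript.

```python
import re

def pseudonymise(text, key):
    """Replace every identifier in `key` with its label, matching whole words only."""
    if not key:
        return text
    # Longest identifiers first, so "Acme Ltd" is matched before a shorter "Acme".
    ordered = sorted(key, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): label for k, label in key.items()}
    # A single pass over the text means an inserted label is never re-scanned.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

# Hypothetical key for illustration only.
example_key = {
    "Priya": "[participant's manager]",
    "Acme Ltd": "[employer]",
    "Leeds": "[city]",
}
line = "P3: Priya moved me to the Acme Ltd office in Leeds."
print(pseudonymise(line, example_key))
```

A script like this only catches identifiers you have already listed. Misspellings, nicknames and indirect identifiers such as an unusual job title still need a careful read-through.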

From transcript to analysis

A transcript is a means, not an end. Once you have accurate, anonymised, line-numbered transcripts, you move into analysis. For most interview-based dissertations that means thematic analysis, following Braun and Clarke’s six phases: (1) familiarising yourself with the data — which your transcribing has already begun; (2) generating initial codes; (3) searching for themes; (4) reviewing themes; (5) defining and naming themes; and (6) producing the report. Your line numbers and consistent speaker labels make it easy to tag extracts and trace each theme back to its source.
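
To show how line numbers support that traceability, here is a minimal sketch of a code index kept outside any dedicated coding software. The function names build_index and fetch_extract, the record layout, and the code labels are all hypothetical, and the transcript lines are adapted from the example excerpt earlier in this guide.

```python
from collections import defaultdict

def build_index(coded_extracts):
    """Group (code, transcript_id, first_line, last_line) records by code."""
    index = defaultdict(list)
    for code, transcript_id, first, last in coded_extracts:
        index[code].append((transcript_id, first, last))
    return dict(index)

def fetch_extract(transcripts, transcript_id, first, last):
    """Return lines first..last (1-based, inclusive) from one transcript."""
    return transcripts[transcript_id][first - 1:last]

# One transcript, stored as a list of lines so list position equals line number minus 1.
transcripts = {
    "P3": [
        "I: So how did the move to working from home affect you?",
        "P3: Honestly, it was a lot. At first I thought, great, no commute.",
        "P3: I'd be answering emails at ten at night for [employer].",
    ]
}

# Hypothetical initial codes attached to line ranges.
coded = [
    ("blurred boundaries", "P3", 3, 3),
    ("initial optimism", "P3", 2, 2),
    ("blurred boundaries", "P3", 2, 3),
]

index = build_index(coded)
for transcript_id, first, last in index["blurred boundaries"]:
    print(transcript_id, first, last, fetch_extract(transcripts, transcript_id, first, last))
```

Keeping the transcript ID and line range with every coded extract means each theme in your write-up can be checked against its source.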

The level of transcription detail you chose now pays off or constrains you. Conversation- or discourse-oriented projects need the fine-grained verbatim notation; theme-focused projects work happily from intelligent verbatim. If your interviews were paired with structured instruments, see how transcripts complement a qualitative research questionnaire and how the same corpus can support a more frequency-oriented content analysis alongside thematic interpretation.

Common mistakes to avoid

  • Analysing a raw ASR transcript without proofreading it against the audio.
  • Changing transcription style or notation between interviews, making the data set inconsistent.
  • Smoothing speech so heavily that meaning, emphasis or hesitation is lost.
  • Leaving real names or employers in the transcript and only anonymising later (or never).
  • Uploading sensitive audio to a cloud service participants never consented to.
  • Underestimating the time, then rushing — which is where mishearings creep in.
  • Forgetting line numbers and speaker labels, making later coding and quoting painful.

Doing it well: a quick checklist

  • Record clean audio with a backup device.
  • Decide style and notation before you start, and document the key.
  • Transcribe in passes; slow the playback.
  • Use ASR for a draft, then correct every line against the audio.
  • Anonymise as you type and store data securely.
  • Add line numbers and consistent speaker labels.
  • Build realistic transcription time into your project plan.



Frequently Asked Questions

What is the best transcription style for a dissertation?

For most interview-based dissertations, intelligent verbatim is the best default. It keeps the participant’s words and meaning intact while removing distracting fillers and false starts, giving you a transcript that is both faithful and readable for thematic or content analysis. Choose full verbatim only if your method (e.g. conversation or discourse analysis) depends on capturing every pause, stammer and overlap.

How long does it take to transcribe an interview?

Plan for roughly four to six hours of work per hour of clear audio for an intelligent-verbatim transcript done manually. Full verbatim with detailed notation can take eight to ten times the audio length. An ASR-assisted workflow is faster, often around two to three times the audio length once you include correcting the draft against the recording, but you should still budget proper proofreading time.

Can I use AI or ASR software to transcribe my interviews?

Yes, ASR tools such as Otter.ai, Word’s transcribe feature and open-source Whisper are widely used to generate a first draft quickly. However, you must always proofread the output against the original audio, because ASR mishears accents, overlapping speech, numbers and technical terms. For sensitive data, prefer a tool that runs locally (Whisper can) so you are not uploading identifiable recordings to a cloud service without consent.

How do I anonymise an interview transcript?

Replace direct identifiers (names of people, employers, schools, towns and unusual roles) with pseudonyms or bracketed labels such as [participant’s manager] or [city] as you transcribe. Keep the key that links pseudonyms to real identities in a separate, secure, access-controlled file, and store recordings and transcripts on encrypted university systems. This protects participants and helps you comply with UK GDPR.

What do symbols such as (.) and [inaudible] mean in a transcript?

These are notation conventions. (.) marks a very short pause and (2s) or (pause) a longer timed silence; [inaudible] (often with a timestamp) marks speech you cannot make out; [unclear: word?] flags a best guess; round brackets like (laughs) or (sighs) capture non-verbal cues; and italics or capitals mark emphasis or raised volume. Agree your key before you start and apply it consistently across every transcript.

How does a transcript feed into thematic analysis?

An accurate, anonymised, line-numbered transcript is the raw material for analysis. Transcribing it yourself begins phase one of Braun and Clarke’s thematic analysis: familiarising yourself with the data. You then generate initial codes, search for and review themes, define and name them, and write up, using your line numbers and speaker labels to tag extracts and trace each theme back to its source in the data.

About Aadam Mae

Aadam Mae is an academic researcher and author with a PhD in NLP (Natural Language Processing) at ResearchProspect. Mae's work delves into the intricacies of language and technology, delivering profound insights in concise prose. Pioneering the future of communication through scholarship.
