Interview transcription is the process of converting a recorded research interview into an accurate written text that you can read, code and analyse. In qualitative research it is the bridge between fieldwork and findings: a clean, faithful transcript is what you actually analyse, so the quality of your themes can never exceed the quality of your transcription. Learning how to transcribe an interview well means choosing the right level of detail (verbatim, intelligent verbatim or edited), applying consistent notation, and handling consent and anonymisation properly before anyone else sees the data.
This guide walks you through what transcription is and why it matters, the three main transcription styles and when to use each, a step-by-step process, the notation conventions researchers use, manual versus automated (ASR) tools, realistic time estimates, the ethics and GDPR considerations, and how a finished transcript feeds straight into thematic analysis.
What interview transcription is and why it matters
Transcription is the systematic conversion of spoken audio (or video) into written text. In a dissertation that uses interviews or focus groups, transcription is not clerical busywork — it is an analytical act. As Braun and Clarke (2006) note in their account of thematic analysis, the researcher who transcribes their own data begins to know it intimately, and that early familiarisation is the first phase of analysis proper. Every decision you make — whether to capture a stammer, a three-second pause, or a nervous laugh — shapes what the text can later tell you.
Why does it matter so much? Because qualitative analysis is grounded in the participant’s own words. A misheard phrase, a dropped negative (“I don’t agree” becoming “I do agree”), or a smoothed-over hesitation can quietly distort meaning. A transcript is also a key part of your audit trail: it supports the dependability and confirmability of your study, lets a supervisor or examiner check your interpretations against the source, and underpins the wider concern with reliability and validity in qualitative work. Transcription belongs to the broader family of methods of data collection and analysis that turn raw fieldwork into defensible findings.
“Transcription, while it may seem time-consuming, frustrating, and at times boring, can be an excellent way to start familiarising yourself with the data.” (Source: Braun & Clarke, 2006)
The three main types of transcription
There is no single “correct” transcript. The right level of detail depends on your research question and your method of analysis. Conversation analysis needs every micro-pause and intake of breath; a study of policy attitudes usually does not. The three styles below sit on a spectrum from most faithful to most readable.
1. Verbatim (full / true verbatim)
Captures everything: every “um”, “er”, false start, repetition, filler word, stutter, and often non-verbal sounds (laughter, sighs) and timed pauses. It is the most faithful record and the most time-consuming to produce.
2. Intelligent verbatim (clean verbatim)
Keeps the speaker’s words and meaning intact but removes non-essential fillers and false starts, lightly tidying grammar without changing sense. This is the default for most dissertations using thematic or content analysis, because it is readable yet faithful to what was said.
3. Edited (clean / edited) transcript
Goes further: it removes irrelevant tangents, corrects grammar fully, and may reorganise for clarity. Useful for member-checking, quotation in a published article, or when only the substantive content matters. Use it carefully — over-editing can strip away the texture analysts rely on.
| Transcription type | What it captures | When to use it | Trade-off |
|---|---|---|---|
| Verbatim | Every word, filler, false start, pause, laughter and non-verbal sound | Conversation analysis, discourse analysis, sociolinguistics, where how something is said matters | Most accurate; slowest and hardest to read |
| Intelligent verbatim | All meaningful speech; fillers and false starts removed; light tidying | Most dissertations using thematic or content analysis | Best balance of fidelity and readability |
| Edited / clean | Substantive content only; grammar corrected; tangents removed | Quotations for publication, member-checking, executive summaries | Most readable; risks losing analytic nuance |
The interview transcription process, step by step
Treat transcription as a defined workflow rather than something you improvise at midnight before a deadline. The figure below shows the end-to-end pipeline, from recording through to analysis.
- Record well. Good transcription starts with good audio. Use an external microphone where possible, sit in a quiet room, test levels, and record to two devices if the interview is important. Poor audio multiplies transcription time.
- Prepare your tools and template. Choose your software and a foot pedal or keyboard shortcuts for play/pause/rewind. Set up a header (date, participant pseudonym, interview number, duration) and decide your style and conventions before you start.
- Transcribe in passes. Do a first pass to get the words down, then a second pass to insert speaker labels, pauses and non-verbal notes, slowing the audio to around 50–75% speed.
- Anonymise as you go. Replace names of people, employers, schools and places with pseudonyms or bracketed labels (e.g. [participant’s manager]) the moment you type them.
- Check against the audio. Listen through once more while reading the transcript to catch mishearings, especially negatives, numbers and technical terms.
- Format for analysis. Add line numbers, consistent speaker labels (e.g. I: for interviewer, P3: for participant 3), and save in a format your coding software accepts.
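The final formatting step can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the function name `format_transcript`, the header fields and the four-digit line-number width are all assumptions chosen for the example, and your coding software may expect a different layout.

```python
def format_transcript(turns, header):
    """Return a transcript string with a header block and numbered lines.

    `turns` is a list of (speaker_label, text) pairs; `header` is a dict of
    metadata such as pseudonym and interview number (hypothetical fields).
    """
    lines = [f"{key}: {value}" for key, value in header.items()]
    lines.append("")  # blank line between header and body
    for number, (speaker, text) in enumerate(turns, start=1):
        # Zero-padded line numbers keep the columns aligned in long transcripts.
        lines.append(f"{number:04d}  {speaker}: {text}")
    return "\n".join(lines)


# Illustrative values only.
header = {"Participant": "P3", "Interview": "3", "Duration": "00:45:10"}
turns = [
    ("I", "So how did the move to working from home affect you?"),
    ("P3", "Honestly, it was a lot."),
]
formatted = format_transcript(turns, header)
print(formatted)
```

Numbering each speaker turn, rather than each wrapped line on the page, keeps references stable if the file is later reflowed in a different font or page width.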
Transcription conventions and notation
Consistency is everything. Agree a notation key at the start and apply it to every transcript so your data set is comparable. A common, lightweight convention set looks like this:
- Speaker labels: I: for interviewer, P: or P3: for participants.
- Short pause: (.) for a micro-pause; timed pause: (2s) or (pause) for longer silences.
- Inaudible speech: [inaudible] or [inaudible 00:14:32] with a timestamp.
- Uncertain hearing: [unclear: word?] when you have a best guess.
- Overlapping speech: square brackets aligned across lines, or // to mark where the next speaker breaks in.
- Non-verbal cues: (laughs), (sighs), (long exhale), [phone rings] in round or square brackets.
- Emphasis: italics or underlining for a stressed word; CAPITALS for markedly raised volume.
- Redactions: [name], [employer], [city] for anonymised identifiers.
Verbatim:
I: So how did the move to working from home affect you?
P3: Um, honestly it was… (.) it was a lot. Like at first I thought, oh great, no commute, but then… (2s) yeah, the lines just sort of blurred? (laughs) I’d be answering emails at, at like ten at night for [employer].
I: Ten at night [right]
P3: [yeah] every night, basically. It was [inaudible 00:11:04] for months.
Intelligent verbatim (same passage):
I: So how did the move to working from home affect you?
P3: Honestly, it was a lot. At first I thought, great, no commute — but then the lines just blurred. I’d be answering emails at ten at night for [employer]. Every night, basically. It was like that for months.
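Because consistency matters so much, it can help to check each transcript against your notation key automatically. The sketch below is one possible approach, assuming the lightweight conventions described above (labels `I:` and `P` followed by an optional number, and `[inaudible hh:mm:ss]` markers). The function name and the rule that every inaudible marker should carry a timestamp are assumptions for illustration; adapt the patterns to your own key.

```python
import re

# Speaker labels allowed by the (assumed) notation key: "I: ", "P: ", "P3: " ...
SPEAKER_LABEL = re.compile(r"^(I|P\d*): ")
# "[inaudible]" optionally followed by an hh:mm:ss timestamp.
INAUDIBLE = re.compile(r"\[inaudible(?: (\d{2}:\d{2}:\d{2}))?\]")


def check_transcript(lines):
    """Return (problems, inaudible_timestamps) for a list of transcript lines."""
    problems = []
    timestamps = []
    for number, line in enumerate(lines, start=1):
        if not SPEAKER_LABEL.match(line):
            problems.append((number, "missing or unknown speaker label"))
        for match in INAUDIBLE.finditer(line):
            if match.group(1) is None:
                problems.append((number, "[inaudible] without timestamp"))
            else:
                timestamps.append(match.group(1))
    return problems, timestamps


sample = [
    "I: Ten at night [right]",
    "P3: [yeah] every night, basically. It was [inaudible 00:11:04] for months.",
    "Participant 3: It was [inaudible] really.",
]
problems, timestamps = check_transcript(sample)
print(problems)
print(timestamps)
```

The collected timestamps double as a list of places to revisit in the audio during the final check.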
Tools: manual transcription versus ASR
You can transcribe by hand or lean on automatic speech recognition (ASR). Most researchers now use a hybrid approach: let software produce a rough draft, then correct it against the audio.
- Manual tools. Software such as oTranscribe, Express Scribe or your word processor with hotkeys lets you slow playback and type. Slow, but you control every word and you familiarise yourself deeply with the data.
- ASR / AI tools. Otter.ai, Microsoft Word’s transcribe feature, Descript, Trint, and open-source Whisper generate a draft transcript in minutes and often add speaker diarisation (who spoke when).
ASR accuracy has improved dramatically, but it is not analysis-ready out of the box. Accuracy drops with strong accents, overlapping speech, technical jargon, background noise and poor recordings — and the errors it makes (confident, plausible mishearings) are exactly the kind that distort meaning. Treat any machine draft as a first draft only:
- Always proofread against the original audio — never analyse a raw ASR output.
- Check numbers, names, negatives and discipline-specific terms with special care.
- Add the human layer ASR misses: pauses, emphasis, non-verbal cues and accurate speaker turns.
- Mind data protection: uploading identifiable audio to a cloud transcription service can breach your ethics approval and GDPR if participants did not consent to third-party processing. Whisper can run locally (offline), which is often the safer choice for sensitive data.
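One way to make the proofreading pass more systematic is to flag the lines most likely to hide a meaning-changing error. The following sketch marks lines in a draft that contain a negation or a number so you can listen to them with extra care. It is a rough heuristic under stated assumptions: the word list is deliberately small, the function name is hypothetical, and it does not replace checking every line against the audio.

```python
import re

# A small, illustrative set of negation patterns (not exhaustive).
NEGATION = re.compile(r"\b(?:not|no|never|none|nothing|\w+n[’']t)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d")


def flag_for_review(lines):
    """Return (line_number, reasons) for lines containing negations or numbers."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        # Drop the speaker label so the digit in "P3:" is not counted as a number.
        text = line.split(": ", 1)[-1]
        reasons = []
        if NEGATION.search(text):
            reasons.append("negation")
        if NUMBER.search(text):
            reasons.append("number")
        if reasons:
            flagged.append((number, reasons))
    return flagged


draft = [
    "I: How many people were on the team?",
    "P3: About 15, and I don’t think that was enough.",
    "P3: It was fine, really.",
]
flagged = flag_for_review(draft)
print(flagged)
```

Names and discipline-specific terms are harder to detect generically; a project-specific word list could be added in the same way.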
How long does transcription take?
Plan your timetable realistically. As a rule of thumb, manual transcription takes around four to six hours per hour of clear audio for an intelligent-verbatim transcript; full verbatim with detailed notation can run to eight to ten times the audio length. ASR-assisted workflows cut this, but you should still budget meaningful correction time.
Worked example: you have 8 interviews, each averaging 45 minutes of audio, and you want intelligent-verbatim transcripts.
Step 1 — total audio. 8 × 45 min = 360 min = 6 hours of audio.
Step 2 — apply the ratio (manual). At 4×: 6 × 4 = 24 hours. At 6×: 6 × 6 = 36 hours. So plan for roughly 24–36 hours of manual transcription work.
Step 3 — the ASR-assisted route. Suppose ASR produces a draft in near-real-time and you spend about 2× the audio length correcting it: 6 × 2 = 12 hours of correction. Add ~1 hour for uploading, exporting and formatting → about 13 hours total.
Step 4 — convert to working days. At 5 productive transcription hours per day, the manual route is 24–36 ÷ 5 ≈ 5–7 days; the ASR-assisted route is 13 ÷ 5 ≈ 2.6 days.
Takeaway: even the “fast” hybrid route costs you over two full working days for just six hours of audio — so build transcription explicitly into your dissertation Gantt chart rather than treating it as an afterthought.
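The same arithmetic is easy to reuse for your own project. The sketch below reproduces the worked example; the function names are made up for illustration, and the ratios (4×, 6×, 2×), the one hour of overhead and the five productive hours per day are the rule-of-thumb figures from the example, not fixed constants.

```python
def estimate_hours(interviews, minutes_each, ratio, overhead_hours=0.0):
    """Estimate transcription hours as audio length times a work ratio."""
    audio_hours = interviews * minutes_each / 60
    return audio_hours * ratio + overhead_hours


def working_days(hours, hours_per_day=5):
    """Convert hours of work into days at a given productive rate."""
    return hours / hours_per_day


manual_low = estimate_hours(8, 45, ratio=4)              # 24.0 hours
manual_high = estimate_hours(8, 45, ratio=6)             # 36.0 hours
asr_assisted = estimate_hours(8, 45, ratio=2, overhead_hours=1)  # 13.0 hours

print(manual_low, manual_high, asr_assisted)
print(round(working_days(manual_low), 1),
      round(working_days(manual_high), 1),
      round(working_days(asr_assisted), 1))
```

Changing the inputs to your own interview count, average length and style gives a first figure to put in your project plan.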
Anonymisation, ethics and GDPR
Interview data is personal data, and audio of someone’s voice is itself identifiable. In the UK that brings transcription squarely under the UK GDPR and the Data Protection Act 2018, so handle it accordingly:
- Consent. Your participant information sheet and consent form should state that interviews will be recorded and transcribed, how the data will be stored, who will see it, and when recordings will be destroyed.
- Anonymise / pseudonymise. Remove or replace direct identifiers (names, employers, places, unusual roles) in the transcript. Keep the key linking pseudonyms to identities in a separate, secure file.
- Secure storage. Store recordings and transcripts on encrypted, access-controlled university systems — not on a personal laptop desktop or an unapproved cloud drive.
- Third-party processing. If you use a commercial transcription service or cloud ASR, you are sharing personal data with a processor; this usually needs explicit consent and a data-processing agreement, and may breach ethics approval if not declared.
- Data minimisation and retention. Delete audio once the verified transcript exists if your ethics protocol allows, and keep data only for the period your ethics approval specifies.
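Replacing identifiers by hand is error-prone across many transcripts, so a scripted second pass can act as a safety net. The sketch below applies a replacement key to a piece of text. Everything in it is hypothetical: the names, the employer and the city are invented for the example, and in practice the key itself is identifying data that belongs in the separate, secure file described above. A script like this only catches identifiers you have listed, so a manual read-through is still needed.

```python
import re


def pseudonymise(text, replacements):
    """Replace each listed identifier with its bracketed label.

    Longer identifiers are handled first so that a full name is replaced
    before any shorter name it contains.
    """
    for identifier in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(identifier)}\b", re.IGNORECASE)
        label = replacements[identifier]
        # A function replacement avoids any backslash handling in the label.
        text = pattern.sub(lambda _match, label=label: label, text)
    return text


# Invented identifiers for illustration only.
key = {
    "Anna Smith": "[manager]",
    "Anna": "[manager]",
    "Northfield Council": "[employer]",
    "Leeds": "[city]",
}
raw = "Anna Smith at Northfield Council said the Leeds office would close."
clean = pseudonymise(raw, key)
print(clean)
```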
From transcript to analysis
A transcript is a means, not an end. Once you have accurate, anonymised, line-numbered transcripts, you move into analysis. For most interview-based dissertations that means thematic analysis, following Braun and Clarke’s six phases: (1) familiarising yourself with the data — which your transcribing has already begun; (2) generating initial codes; (3) searching for themes; (4) reviewing themes; (5) defining and naming themes; and (6) producing the report. Your line numbers and consistent speaker labels make it easy to tag extracts and trace each theme back to its source.
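To show how line numbers and speaker labels support traceability, here is a minimal sketch of one way to record coded extracts and list the sources for each code. The `Extract` structure, the code names and the line references are all invented for illustration; dedicated qualitative analysis software handles this for you, and this is not a substitute for it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    transcript: str   # participant label, e.g. "P3"
    first_line: int   # first transcript line of the extract
    last_line: int    # last transcript line of the extract
    code: str         # initial code assigned in phase 2
    text: str         # the quoted words


def group_by_code(extracts):
    """Map each code to the list of source references that support it."""
    grouped = defaultdict(list)
    for extract in extracts:
        reference = f"{extract.transcript}, lines {extract.first_line}-{extract.last_line}"
        grouped[extract.code].append(reference)
    return dict(grouped)


# Hypothetical extracts and codes.
extracts = [
    Extract("P3", 12, 14, "blurred boundaries", "the lines just blurred"),
    Extract("P5", 40, 41, "blurred boundaries", "I never really switched off"),
    Extract("P3", 30, 30, "isolation", "I missed the office chat"),
]
index = group_by_code(extracts)
print(index)
```

An index like this makes it straightforward to cite each quotation with its participant label and line range in the findings chapter.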
The level of transcription detail you chose now pays off or constrains you. Conversation- or discourse-oriented projects need the fine-grained verbatim notation; theme-focused projects work happily from intelligent verbatim. If your interviews were paired with structured instruments, the transcripts can complement a qualitative research questionnaire, and the same corpus can support a more frequency-oriented content analysis alongside thematic interpretation.
Common mistakes to avoid
- Analysing a raw ASR transcript without proofreading it against the audio.
- Changing transcription style or notation between interviews, making the data set inconsistent.
- Smoothing speech so heavily that meaning, emphasis or hesitation is lost.
- Leaving real names or employers in the transcript and only anonymising later (or never).
- Uploading sensitive audio to a cloud service participants never consented to.
- Underestimating the time, then rushing — which is where mishearings creep in.
- Forgetting line numbers and speaker labels, making later coding and quoting painful.
Doing it well: a quick checklist
- Record clean audio with a backup device.
- Decide style and notation before you start, and document the key.
- Transcribe in passes; slow the playback.
- Use ASR for a draft, then correct every line against the audio.
- Anonymise as you type and store data securely.
- Add line numbers and consistent speaker labels.
- Build realistic transcription time into your project plan.