What Does Turnitin Check For? Sources & Similarity

Published on June 18, 2026. Revised on June 18, 2026.

Turnitin checks your work for text similarity: it compares every sentence you submit against three vast collections of writing — billions of current and archived web pages, a licensed database of academic journals and books, and its own repository of previously submitted student papers — then reports the overlap as a percentage called the similarity index. It does not, on its own, decide whether you have plagiarised; it only shows where your wording matches text it has seen before. This guide explains exactly what Turnitin compares your writing against, how to read the similarity score, what the report flags versus what it cannot do, what you are allowed to exclude, how its AI-writing indicator works, and how to pre-check your draft with a free plagiarism checker before you submit.

What does Turnitin actually check?

At its core, Turnitin is a text-matching engine, not a plagiarism judge. When you (or your tutor) submit a document, Turnitin breaks the text into overlapping strings of words and fingerprints them. It then searches its databases for identical or near-identical strings and produces a Similarity Report that highlights every match and links it back to a source. The headline figure on that report is the Turnitin similarity index: the percentage of your submission that matches text already held in those databases.
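Turnitin does not publish its matching algorithm, so the following is only a minimal Python sketch of the general technique described above: splitting text into overlapping word strings ("shingles") and hashing each one into a fingerprint that can be looked up quickly. The five-word shingle length, the SHA-1 hash and every function name here are illustrative assumptions, not Turnitin's implementation.

```python
import hashlib
import re

# Illustrative sketch of word-shingle fingerprinting; NOT Turnitin's proprietary algorithm.


def words(text: str) -> list[str]:
    # Lower-case and keep only letters, digits and apostrophes, so that
    # capitalisation and punctuation do not stop two passages matching.
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text: str, n: int = 5) -> list[tuple[str, ...]]:
    # Overlapping word n-grams: six words with n=5 give two shingles.
    w = words(text)
    return [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]


def fingerprint(shingle: tuple[str, ...]) -> str:
    # A short, stable hash stands in for each shingle so lookups are cheap.
    return hashlib.sha1(" ".join(shingle).encode("utf-8")).hexdigest()[:12]


def build_index(sources: dict[str, str], n: int = 5) -> dict[str, set[str]]:
    # Map every fingerprint to the IDs of the sources that contain it.
    index: dict[str, set[str]] = {}
    for source_id, text in sources.items():
        for sh in shingles(text, n):
            index.setdefault(fingerprint(sh), set()).add(source_id)
    return index


def find_matches(submission: str, index: dict[str, set[str]], n: int = 5) -> list[tuple[int, set[str]]]:
    # Return (shingle position, matching source IDs) for each shingle already indexed.
    hits = []
    for pos, sh in enumerate(shingles(submission, n)):
        fp = fingerprint(sh)
        if fp in index:
            hits.append((pos, index[fp]))
    return hits


index = build_index({"web:blog-post": "The quick brown fox jumps over the lazy dog every morning."})
hits = find_matches("Every day the quick brown fox jumps over the lazy dog.", index)
print([pos for pos, _ in hits])  # [2, 3, 4, 5, 6]
```

The first two shingles of the submission ("every day the quick brown" and "day the quick brown fox") do not occur in the source, so only the five shingles from "the quick brown fox jumps" onward match. Production systems commonly keep only a selected subset of fingerprints (winnowing is one published approach) rather than every shingle; that refinement is left out of this sketch.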

This distinction matters enormously. Turnitin does not “detect plagiarism” in the way many students imagine. It detects overlap. A perfectly referenced direct quotation, a correctly cited statistic and a copied-and-pasted paragraph can all light up in the same colour. The academic judgement — was this fair use, sloppy paraphrasing or misconduct? — is always made by a human marker reading the report in context. Understanding that boundary is the single most useful thing you can learn about the tool, and it shapes everything else in this guide. If you want the broader picture of how text-matching fits into academic integrity, our overview of what plagiarism is sets the scene.

What does Turnitin check against? The three source categories

The question students ask most often is: what does Turnitin check against? The answer is three distinct repositories, each built or licensed separately and each contributing matches to your report. A summary figure and a detailed table follow.

[Figure: What Turnitin Compares Your Work Against. Open Web: billions of current and archived pages, crawled continuously. Academic Content: journals, books and publications via Crossref and partners. Student Papers: past submissions in the global student repository. All three feed one Similarity Report with one overall similarity index (%).]
The three source categories Turnitin compares your work against, combined into a single similarity report.

1. The open web

Turnitin runs its own web crawler that continuously indexes publicly accessible internet content — news sites, blogs, institutional pages, essay mills, Wikipedia, forums and more. Crucially, it also retains archived versions of pages, so deleting a source after you copied it does not remove the match. If your wording overlaps with anything Turnitin has crawled, current or historical, it will surface in the report.

2. Academic publications and journals

Turnitin licenses content from a large network of scholarly publishers and content partners. Through arrangements such as its membership of Crossref and direct agreements with publishers, it can match your text against peer-reviewed journal articles, conference proceedings, books and subscription databases that are not freely available on the open web. This is why copying or lightly paraphrasing a paywalled article you accessed through your library is no safer than doing the same with a free web page: Turnitin very likely holds a licensed copy.

3. The student paper repository

When a paper is submitted to an assignment that uses Turnitin’s standard paper repository, it can be stored and used for future comparison. The result is a colossal archive of previously submitted student work spanning many years and countries. This database is what catches one student reusing another’s essay, recycled coursework bought online, and a surprising amount of self-plagiarism, where a student resubmits their own earlier assignment. It is also the part of Turnitin you cannot see: for privacy reasons, matched student papers usually appear only as “Submitted to [institution]”, without the full text.

Source category | What it contains | How it is built | Typical matches it catches
Open web | Billions of current and archived public web pages | Turnitin’s own continuous web crawler, plus archived copies | Copied blog text, Wikipedia, essay-mill samples, online notes
Academic content | Journals, books, conference papers, subscription databases | Licensed via publisher partnerships and Crossref membership | Lifted or lightly paraphrased passages from scholarly sources
Student papers | Previously submitted assignments worldwide | Stored from past submissions to institutional repositories | Reused essays, recycled coursework, self-plagiarism

If you are weighing up the different ways overlap can occur — from verbatim copying to mosaic paraphrasing — our guide to the types of plagiarism maps each one to how a similarity report tends to display it.

The Turnitin similarity index explained

The similarity index is the single percentage Turnitin places at the top of the report. It represents how much of your submitted text matched sources in Turnitin’s databases. A 24% index means roughly a quarter of your words overlap with existing text somewhere in those three repositories.
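One simple way to model that headline figure is as the share of submitted words covered by at least one match, with overlapping matches counted once. Turnitin does not publish its exact formula, so this Python sketch is an assumption for illustration; the function name and the (start, end) span representation are hypothetical.

```python
def similarity_index(total_words: int, matched_spans: list[tuple[int, int]]) -> float:
    """Percentage of submitted words covered by at least one matched span.

    Each span is a half-open (start, end) range of word positions. Collecting
    positions in a set means a word that matches two sources counts only once.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    covered: set[int] = set()
    for start, end in matched_spans:
        # Clamp spans to the document so bad input cannot push the score past 100%.
        covered.update(range(max(start, 0), min(end, total_words)))
    return round(100 * len(covered) / total_words, 1)


# A hypothetical 1,000-word essay with three matched passages, two of which overlap.
print(similarity_index(1000, [(0, 150), (100, 200), (600, 640)]))  # 24.0
```

The first two spans overlap, so together they cover words 0 to 199 (200 words); the third adds 40, giving 240 of 1,000 words, or the 24% "roughly a quarter" reading described above.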

The most important fact about this number is that there is no universal “pass” or “fail” threshold. Turnitin deliberately does not publish a magic acceptable percentage, because context is everything. A literature review packed with correctly cited quotations might legitimately sit at 30% and be entirely sound, while a 12% report could conceal one wholesale copied paragraph that constitutes serious misconduct. Many UK departments offer informal guidance (often quoted in the 10–20% region), but the rules vary by institution, level and assignment type. Always check your own department’s policy rather than chasing a number you read online.

Worked example — reading a similarity breakdown: Imagine a 3,000-word essay returns an overall similarity index of 26%. Drilling into the match breakdown shows: 11% from a correctly quoted and cited journal article (an unavoidable, legitimate match), 8% from your reference list and bibliography (which you can exclude), 4% spread across many one- and two-word common phrases like “on the other hand” (trivial), and a single 3% block matching an un-cited blog post. After excluding quotes and the bibliography, the meaningful figure drops to about 7% — and the only real concern is that 3% un-cited block, which needs a citation or rewriting in your own words. The headline 26% was alarming; the contextual reading tells the true story. This is why you should always read the breakdown, never just the top-line number.
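The arithmetic in this worked example can be checked in a few lines of Python. The sketch assumes, as the example does, that each source's share is fixed and non-overlapping, so excluding a category simply subtracts its share; a real report may recalculate the figures differently. The category labels are hypothetical.

```python
from typing import Optional

# Shares from the worked example, each tagged with a hypothetical category label
# (None means no standard filter applies to it).
BREAKDOWN: dict[str, tuple[int, Optional[str]]] = {
    "correctly quoted and cited journal article": (11, "quote"),
    "reference list and bibliography": (8, "bibliography"),
    "one- and two-word common phrases": (4, "small_match"),
    "un-cited blog post": (3, None),
}


def remaining_similarity(breakdown: dict[str, tuple[int, Optional[str]]], excluded: set[str]) -> int:
    # Simplification used by the worked example: shares are fixed and
    # non-overlapping, so excluding a category just drops its share from the sum.
    return sum(pct for pct, category in breakdown.values() if category not in excluded)


print(remaining_similarity(BREAKDOWN, set()))                                     # 26
print(remaining_similarity(BREAKDOWN, {"quote", "bibliography"}))                 # 7
print(remaining_similarity(BREAKDOWN, {"quote", "bibliography", "small_match"}))  # 3
```

The last line shows what would remain if small matches were filtered as well: only the 3% un-cited block, which is exactly the passage that needs a citation or a rewrite.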

What Turnitin flags — and what it does not do

Turnitin is powerful at one specific task and genuinely limited at several others. Being honest about both keeps you from making poor decisions based on what you assume the tool can see.

What it flags

  • Matching strings of text — identical and near-identical word sequences against its three databases.
  • Direct quotes, whether or not you have referenced them (they still register as matches until excluded).
  • Reference lists and bibliographies, because citations naturally repeat across many documents.
  • Lightly paraphrased text where you have only swapped a few words and kept the original sentence structure.
  • Recycled or self-plagiarised work stored in the student repository.

What it does NOT do

  • It does not judge intent. Turnitin cannot tell accidental overlap from deliberate copying — only a human marker can.
  • It does not detect every paraphrase. Genuinely rewritten ideas in your own words and sentence structure typically will not match, even when the source is in its database.
  • It does not confirm plagiarism. A high score is a prompt to investigate, not a verdict.
  • It does not check ideas or facts for originality — only the wording. Two people can describe the same concept very differently and neither will match.
  • It cannot match against private, offline or un-indexed sources it has never seen.

“The Similarity Report does not check for plagiarism in a piece of work. Instead, it will check a student’s work against our database, and if there are instances where a student’s writing is similar to, or matches against, one of our sources, it will be flagged for review.” — Turnitin Guides, Similarity Report overview

Because Turnitin misses well-executed paraphrasing, some students wrongly conclude that superficial word-swapping is a safe shortcut. It is not. Markers read for sense, and patchwork paraphrasing that fools the software still reads as poor scholarship. The legitimate route is to genuinely understand a source and restate it — our guide on how to paraphrase properly shows the difference, and the roundup of the best paraphrasing tools compares assistants that help you rephrase responsibly rather than disguise copying.

What you can exclude from the similarity score

A large part of learning to read a report is knowing which matches are noise. Instructors can set exclusions when they configure an assignment, and filters can also be applied within the report itself, to remove categories that inflate the index without indicating any problem:

  • Quoted material — text wrapped in quotation marks (or otherwise marked as a quote) can be excluded so genuine, attributed quotations do not count against you.
  • Bibliography and reference list — the citations section can be filtered out, since reference formatting legitimately repeats across documents.
  • Small matches — you can set a threshold to ignore matches below a chosen number of words or a small percentage, removing trivial common-phrase overlaps.
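The three filters above can be modelled as a view over the full list of matches. In this minimal Python sketch the `Match` record, its fields, the filter function and all the numbers are hypothetical; it also assumes matches do not overlap, so their word counts can be summed. The point it illustrates is that filtering never deletes anything: switching a filter off brings the excluded matches straight back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    source: str
    words: int             # length of the matched string, in words
    in_quotes: bool        # the text sits inside quotation marks
    in_bibliography: bool  # the text sits in the reference list


def apply_filters(matches: list[Match], exclude_quotes: bool = True,
                  exclude_bibliography: bool = True, min_words: int = 0) -> list[Match]:
    # Return a new list of surviving matches; the input list is never modified.
    kept = []
    for m in matches:
        if exclude_quotes and m.in_quotes:
            continue
        if exclude_bibliography and m.in_bibliography:
            continue
        if m.words < min_words:  # the "small matches" threshold
            continue
        kept.append(m)
    return kept


def index_percent(matches: list[Match], total_words: int) -> float:
    # Assumes matches do not overlap, so their word counts can simply be summed.
    return round(100 * sum(m.words for m in matches) / total_words, 1)


matches = [
    Match("journal article", 110, in_quotes=True, in_bibliography=False),
    Match("reference list", 80, in_quotes=False, in_bibliography=True),
    Match("common phrase", 3, in_quotes=False, in_bibliography=False),
    Match("blog post", 30, in_quotes=False, in_bibliography=False),
]

no_filters = apply_filters(matches, exclude_quotes=False, exclude_bibliography=False)
print(index_percent(no_filters, 1000))                           # 22.3
print(index_percent(apply_filters(matches, min_words=8), 1000))  # 3.0
```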

These exclusions are settings, not loopholes: they reveal the meaningful similarity rather than hide misconduct. A marker can switch them back on at any time, and they will still see un-cited copying underneath. Excluding your bibliography is good practice; “excluding” a copied paragraph by burying it is not possible and would be misconduct in any case. Getting your in-text references right in the first place is the cleaner fix — our guide on how to cite sources correctly keeps legitimate matches attributed.

Worked example: legitimately reducing similarity. A student’s methodology paragraph matched a textbook at 100%: “A quantitative approach was adopted because it allows the researcher to test hypotheses using statistical analysis of numerical data collected from a large sample.” The fix is not to find a synonym for every word; it is to rewrite from understanding and cite the source for the idea: “This study uses a quantitative design so that the research questions can be tested statistically across a sizeable participant group, an approach Saunders et al. (2019) recommend when measurable, generalisable findings are the goal.” The rewrite changes the sentence structure, adds the student’s own framing, and credits the source, lowering similarity the honest way: by improving the scholarship.
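To see why swapping a few words fails while a genuine rewrite works, this rough Python sketch measures what share of a candidate sentence's three-word sequences also appear in the textbook sentence. It is a simple overlap measure chosen for illustration, not Turnitin's scoring; the three-word window and the "light swap" sentence are hypothetical.

```python
import re

# Simple n-gram overlap measure for illustration; not Turnitin's scoring method.


def ngrams(text: str, n: int = 3) -> list[tuple[str, ...]]:
    # Lower-cased word n-grams, ignoring punctuation.
    w = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]


def overlap_percent(source: str, candidate: str, n: int = 3) -> float:
    # Share of the candidate's n-grams that also occur somewhere in the source.
    source_grams = set(ngrams(source, n))
    candidate_grams = ngrams(candidate, n)
    if not candidate_grams:
        return 0.0
    shared = sum(1 for g in candidate_grams if g in source_grams)
    return round(100 * shared / len(candidate_grams), 1)


TEXTBOOK = ("A quantitative approach was adopted because it allows the researcher to test "
            "hypotheses using statistical analysis of numerical data collected from a large sample.")
# Hypothetical "light" paraphrase: three words swapped, sentence structure kept.
LIGHT_SWAP = ("A quantitative method was chosen because it allows the researcher to test "
              "hypotheses using statistical analysis of numerical data gathered from a large sample.")
# The student's rewrite from the worked example.
REWRITE = ("This study uses a quantitative design so that the research questions can be tested "
           "statistically across a sizeable participant group, an approach Saunders et al. (2019) "
           "recommend when measurable, generalisable findings are the goal.")

print(overlap_percent(TEXTBOOK, LIGHT_SWAP))  # 63.6
print(overlap_percent(TEXTBOOK, REWRITE))     # 0.0
```

Swapping three words still leaves 14 of the 22 three-word sequences intact, which is the lightly paraphrased pattern a similarity report picks up; the restructured rewrite shares none. The citation is still needed either way, because the idea comes from the source.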

Turnitin’s AI-writing indicator

Separate from the similarity score, Turnitin offers an AI writing indicator that estimates how much of a submission may have been generated by AI tools such as large language models. This is a different system entirely: it does not look for matching strings against a database but analyses statistical patterns in the writing to predict AI authorship, returning its own percentage.

Two honest caveats matter here. First, the AI indicator and the similarity index are independent — AI-generated text is often original in the matching sense (it may score 0% similarity) yet still be flagged by the AI indicator. Second, AI detection is probabilistic and imperfect; false positives and false negatives both occur, which is why responsible institutions treat the indicator as a prompt for a conversation, not as proof. If you want to understand how AI-text detection works and sanity-check your own writing, our standalone AI content detector explains the signals these systems rely on.

How to pre-check your work before you submit

You usually only see your own Turnitin report after submission, which is too late to fix anything. The sensible move is to check a draft yourself first. Our free plagiarism checker scans up to 3,000 words against billions of web pages and gives you a quick similarity read, so you can catch an accidental un-cited passage before it reaches your marker. For a deeper review — a Turnitin-level similarity check plus AI-writing detection and a full, downloadable report — the full plagiarism report goes further than the free web check.

Whatever the score comes back as, treat it the same way a marker would: read the breakdown, exclude quotes and bibliography, and look at what remains. If a real match appears, the fix is always legitimate — cite the source, quote it correctly, or rewrite it genuinely in your own words. Our walkthrough on how to remove plagiarism sets out that process step by step, and our broader resource on avoiding plagiarism covers good citation habits that keep similarity low from the first draft.


Key takeaways

  • Turnitin checks your work against three databases: the open web, licensed academic publications, and previously submitted student papers.
  • The similarity index is the percentage of your text that matches those sources — not a plagiarism verdict and not tied to a universal pass mark.
  • It flags matching strings of text; it does not judge intent, confirm plagiarism, or reliably catch genuine paraphrasing.
  • You can exclude quotes, your bibliography and small matches to read the meaningful figure — these are settings, not ways to hide copying.
  • A separate AI-writing indicator estimates AI-generated content and is probabilistic, so treat it as a prompt, not proof.
  • Pre-check drafts with a free or full plagiarism report, then fix any real match the legitimate way: cite, quote or genuinely rewrite.

Frequently Asked Questions

What does Turnitin check your work against?

Turnitin compares your submission against three databases: billions of current and archived open-web pages (via its own crawler), a licensed collection of academic journals, books and publications (accessed through partnerships such as Crossref), and a large repository of previously submitted student papers. Any text that overlaps with these sources is highlighted in your Similarity Report.

What is an acceptable Turnitin similarity score?

There is no universal pass or fail percentage. Turnitin deliberately does not set one, because context matters more than the number. A well-cited literature review might legitimately sit around 30%, while a low score could still hide a copied paragraph. Many UK departments suggest informal guidance in the 10–20% range, but you should always follow your own institution’s policy and read the match breakdown rather than chasing a figure.

Does Turnitin detect plagiarism?

Turnitin detects similarity, not plagiarism. It identifies where your wording matches text it has seen before and reports the overlap as a percentage. Whether a match is acceptable (a cited quote), careless (poor paraphrasing) or misconduct is a judgement made by a human marker reading the report in context; the software never decides that on its own.

Can Turnitin detect paraphrasing?

Turnitin reliably catches light paraphrasing where you keep the original sentence structure and only swap a few words. It generally does not flag genuine paraphrasing where you have understood the source and restated it in your own words and structure. Even so, patchwork paraphrasing that slips past the software still reads as weak scholarship to a marker, so the right approach is to paraphrase properly and cite the source.

What can you exclude from a Turnitin similarity report?

You can exclude quoted material (text marked as a quotation), your bibliography and reference list, and small matches below a chosen word or percentage threshold. These filters strip out legitimate, repeating text so you can see the meaningful similarity. They are settings, not loopholes: a marker can switch them back on, and they cannot hide un-cited copying.

How can I check my work before submitting it to Turnitin?

Run your draft through a plagiarism checker first. ResearchProspect’s free plagiarism checker scans up to 3,000 words against billions of web pages for a quick similarity read, and the full plagiarism report adds a Turnitin-level similarity check plus AI-writing detection and a downloadable report. Then fix any real match the legitimate way: add a citation, quote it correctly, or rewrite it in your own words.

About Jamie Walker

Jamie is a content specialist who holds a master's degree from Stanford University. His research focuses on the Internet of Things, as well as politics, medicine, sociology and other areas of academic writing. Jamie is a member of the content management team at ResearchProspect.
