How winnowing fingerprints work

Every plagiarism detector that operates over a corpus larger than a few megabytes faces the same uncomfortable arithmetic. A document of n characters contains roughly n overlapping k-grams. Hashing all of them and storing the result produces an index larger than the corpus itself, which is fine for a homework folder and ruinous for the web. The whole problem of …
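The arithmetic above, and the way winnowing cuts it down, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's own code. The normalization (lowercase, keep only letters and digits), the hash (BLAKE2b truncated to 64 bits), and the function names `normalize`, `kgram_hashes`, `winnow` and `fingerprint` are all hypothetical choices made for the example. The selection rule is the winnowing one: in every window of w consecutive k-gram hashes, keep the smallest (the rightmost one on ties) and record each selected position only once.

```python
import hashlib


def normalize(text: str) -> str:
    """Illustrative normalization: lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every overlapping k-gram; n characters yield n - k + 1 hashes."""
    hashes = []
    for i in range(len(text) - k + 1):
        digest = hashlib.blake2b(text[i:i + k].encode("utf-8"), digest_size=8).digest()
        hashes.append(int.from_bytes(digest, "big"))
    return hashes


def winnow(hashes: list[int], w: int) -> list[tuple[int, int]]:
    """Return (hash, position) pairs chosen by winnowing with window size w."""
    if not hashes:
        return []
    w = min(w, len(hashes))  # a text shorter than one window still gets one fingerprint
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        lowest = min(window)
        # Rightmost position of the minimum inside this window.
        pos = start + (w - 1 - window[::-1].index(lowest))
        # Selected positions never move left, so comparing with the last one
        # is enough to record each selection exactly once.
        if pos != last_pos:
            fingerprints.append((lowest, pos))
            last_pos = pos
    return fingerprints


def fingerprint(text: str, k: int = 5, w: int = 4) -> list[tuple[int, int]]:
    """Hypothetical end-to-end helper: normalize, hash k-grams, winnow."""
    return winnow(kgram_hashes(normalize(text), k), w)


if __name__ == "__main__":
    doc = normalize("The quick brown fox jumps over the lazy dog")
    hs = kgram_hashes(doc, k=5)
    print(len(doc), "chars,", len(hs), "k-gram hashes,", len(winnow(hs, 4)), "fingerprints kept")
```

Because every window contributes its minimum, any shared run of at least w + k - 1 normalized characters produces at least one fingerprint common to both documents, while for random-looking hashes only about 2/(w + 1) of the n - k + 1 k-gram hashes are stored. The values k = 5 and w = 4 are chosen only to keep the example small.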

Support vector machines in suspicious passage detection

For roughly the decade between 2005 and 2015, almost every machine-learning paper in the plagiarism-detection literature converged on the same classifier. It was not a neural network. It was not a decision tree, an ensemble, or a Bayesian model. It was the support vector machine – a linear classifier with a particular mathematical trick – …

Why exact string matching still matters

There is a tendency in any technical field to assume that the newest method is the best one. In plagiarism detection – and the broader family of problems concerned with finding copied text within and across very large document collections – the discourse of the past decade has been dominated by semantic embeddings, transformer-based paraphrase …

Clone plagiarism and why it is still everywhere

Of the ten categories in Turnitin’s widely cited Plagiarism Spectrum, the first is also the bluntest: the clone, defined as “submitting another’s work, word-for-word, as one’s own” (Turnitin, 2015). It is the form of textual misappropriation that most readers picture when they hear the word plagiarism – a passage lifted intact, perhaps with the author’s name swapped at …

What counts as plagiarism when nothing is copied word for word?

Most people picture plagiarism as a side-by-side comparison: two paragraphs, the same sentences, perhaps a clumsily swapped synonym here and there. It is a tidy mental model, and it is the one that copy-detection software was built to enforce. It is also, unfortunately, wrong – or at least so incomplete that it misleads almost everyone …

Reframing academic integrity in the age of generative AI

The emergence of generative Artificial Intelligence (AI) tools like ChatGPT has sent shockwaves through higher education. Universities and educators are grappling with how to maintain academic honesty. Using AI, students can now produce essays or code at the click of a button, which challenges traditional notions of authorship. Plagiarism, traditionally understood as copying someone …

Plagiarism checks and PhD theses: How rigorous is rigorous enough?

Universities worldwide increasingly subject PhD theses to rigorous plagiarism checks to safeguard academic integrity. Many institutions now mandate that every doctoral thesis be screened using text-matching software before final submission (Karolinska Institutet 2025). For example, Karolinska Institutet in Sweden performs systematic plagiarism scans on all doctoral dissertations as part of standard research practice (Karolinska Institutet …