
The staggering extent of paper mills in cancer research

A revealing study presents one of the most comprehensive investigations to date into the prevalence of paper mill publications within peer-reviewed research literature.

February 19, 2026

A revealing study published in The BMJ on January 30th presents one of the most comprehensive investigations to date into the prevalence of paper mill publications within peer-reviewed research literature [1]. The study focuses on publications in cancer research and underscores the growing threat that paper mills pose to scientific integrity. Paper mills are unscrupulous organisations that produce ready-made or fabricated manuscripts for fee-paying authors, potentially flooding the literature with unreliable findings and eroding confidence in the research record.

In this methodological and cross-sectional analysis, Scancar and colleagues built a machine learning classifier on a BERT (Bidirectional Encoder Representations from Transformers) text model and trained it on more than 2,200 retracted paper mill papers sourced from the Retraction Watch database. The model was validated against independent datasets curated by image integrity specialists and achieved high classification performance. It was then applied to screen 2.6 million original cancer research articles indexed in PubMed from 1999 to 2024, using titles and abstracts to identify textual patterns associated with known paper mill products.
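The train-then-screen workflow described above can be sketched in a few lines. This is only an illustration of the general idea, not the study's method: it swaps the BERT-based model for a toy bag-of-words naive Bayes classifier so that it runs with the standard library alone, and every training string, the class name `NaiveBayesScreen` and the 0.5 threshold are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesScreen:
    """Toy stand-in (NOT the study's BERT model): multinomial naive Bayes."""

    def fit(self, texts, labels):
        # Label 1 = retracted paper mill example, label 0 = control example.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # class prior
        for token in tokens:
            if token not in self.vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def flag_probability(self, text):
        """Estimated probability that the text resembles label-1 examples."""
        tokens = tokenize(text)
        diff = self._log_score(tokens, 0) - self._log_score(tokens, 1)
        return 1.0 / (1.0 + math.exp(diff))


# Invented toy strings, not real titles; they say nothing about real papers.
train_texts = [
    "gene a promotes proliferation and invasion of tumour cells via pathway b",
    "gene c promotes migration and invasion of tumour cells via pathway d",
    "randomised trial of adjuvant chemotherapy in early breast cancer",
    "cohort study of screening uptake and colorectal cancer mortality",
]
train_labels = [1, 1, 0, 0]

screen = NaiveBayesScreen().fit(train_texts, train_labels)

THRESHOLD = 0.5  # hypothetical cut-off; a flag prompts human review only
candidate = "gene e promotes proliferation of tumour cells via pathway f"
is_flagged = screen.flag_probability(candidate) >= THRESHOLD
```

A real screening tool would replace the word counts with a fine-tuned language model and validate against independent data, as the study did, but the overall shape is similar: learn from labelled examples, score unseen titles and abstracts, and pass anything above a threshold to a human reviewer.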

The results are striking: 261,245 papers (9.87%) were flagged as textually similar to previously retracted paper mill publications, more than three times the previous estimate that 3% of biomedical research papers originate from paper mills. This proportion has substantially increased over time, including in the top 10% of journals by impact factor. Scancar and colleagues point out that “the increase in flagged papers in high impact factor journals highlights an important limitation of using impact factors as proxies for research quality.”
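As a quick sanity check on these figures, the short calculation below works backwards from the flagged count and percentage. It uses only the numbers quoted in this article; the variable names are ours.

```python
flagged_papers = 261_245   # papers flagged by the classifier
flagged_share = 0.0987     # 9.87% of screened articles
earlier_estimate = 0.03    # previous 3% prevalence estimate

# Number of screened articles implied by the count and the percentage.
implied_total = flagged_papers / flagged_share

# How many times larger the flagged share is than the earlier estimate.
ratio_to_earlier = flagged_share / earlier_estimate

print(f"Implied articles screened: {implied_total:,.0f}")
print(f"Flagged share vs earlier estimate: {ratio_to_earlier:.1f}x")
```

The implied denominator is about 2.65 million articles, consistent with the rounded figure of 2.6 million, and the flagged share is roughly 3.3 times the earlier estimate.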

Flagged papers were distributed across publishers, countries and cancer research subfields. More than 170,000 of the flagged papers were affiliated with Chinese institutions.

The authors emphasise that a flagged classification does not confirm fraud. They state that “to prevent unfair paper mill attributions, final decisions are always made by humans, using their expertise and a multitool detection approach to support decision making.”
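One way to see why a flag cannot stand in for a verdict is Bayes' rule: the share of flagged papers that are genuinely problematic depends on how common such papers are, as well as on how accurate the classifier is. The sketch below is illustrative only. The sensitivity and specificity values are hypothetical and are not the study's reported performance; the prevalence simply reuses the earlier 3% estimate as an example input.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged items that are true positives, by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs chosen for illustration, NOT taken from the study.
ppv = positive_predictive_value(
    sensitivity=0.90,   # assumed: 90% of paper mill papers get flagged
    specificity=0.95,   # assumed: 95% of genuine papers are not flagged
    prevalence=0.03,    # example input: the earlier 3% estimate
)
print(f"Share of flags that are true positives: {ppv:.0%}")
```

Under these assumed inputs, most flags would be false alarms even though the classifier is right about most papers, which is the general reason a screening result needs human follow-up.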

Yet the scale and breadth of the findings highlight how vulnerable the current editorial and peer-review system is. As paper mills continue to exploit the pressures of the publish-or-perish culture, scalable tools such as this machine learning approach hold promise for augmenting editorial screening and post-publication surveillance, and could form part of a coordinated effort to safeguard research integrity and credibility.

Recent commentary in the scientific literature has begun to explore the potential systemic consequences of large-scale paper mill infiltration. Several authors warn that if the prevalence suggested by Scancar and colleagues is even partially accurate, downstream effects may include distortion of meta-analyses, contamination of clinical guidelines, and misdirection of translational funding streams. Bik has argued that industrialised image and data fabrication risks “polluting the evidentiary substrate” upon which cumulative science depends, particularly in biomedicine where reproducibility challenges are already well documented [2].

Else has suggested that the combination of generative AI tools and paper mill business models may further accelerate manuscript production, overwhelming traditional peer review and increasing the burden on post-publication correction mechanisms [3]. Meanwhile, Ngatuvai et al. cautioned that widespread low-quality or fabricated publications could amplify citation cascades and create self-reinforcing false research narratives that are difficult to unwind [4].

Collectively, these analyses speculate that science may face a bifurcation: either the entrenchment of automated integrity screening, cross-publisher data sharing, and stronger incentive reform, or a gradual erosion of trust in the biomedical literature, with implications for public confidence, policy formation, and clinical care.

References

  1. Scancar B, et al. Machine learning based screening of potential paper mill publications in cancer research: methodological and cross-sectional study. BMJ. 2026 Jan;392:e087581.
  2. Bik EM, Casadevall A, Fang FC. The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications. mBio. 2016 Jun 7;7(3):e00809-16.
  3. Mazzoleni S, Ambrosino N. How Artificial Intelligence is changing scientific publishing? Unrequested advices for young researchers II. Pulmonology. 2024 Sep-Oct;30(5):413-415.
  4. Ngatuvai M, Autrey C, McKenny M, Elkbuli A. Significance and implications of accurate and proper citations in clinical research studies. Ann Med Surg (Lond). 2021 Sep 11;72:102841.

About the author

Gareth Hardy
Scientific Publications Lead
Dr Gareth Hardy is Scientific Publications Lead in the Medical Writing Department at Niche Science & Technology Ltd, where he supports regulatory and scientific documentation across the clinical development lifecycle. With extensive experience in scientific communication and technical writing, Gareth plays a key role in ensuring high-quality interpretation, presentation, and reporting of complex clinical data, contributing to regulatory submissions, study reports, and peer-reviewed publications.

His work bridges scientific rigour and clear communication, enabling multidisciplinary teams to articulate evidence with precision — a critical asset in regulated environments such as early-phase and late-stage clinical development. Gareth frequently shares insights on scientific writing practice and professional development through thought leadership on LinkedIn and in industry forums. 

Dr Hardy’s leadership in publication strategy and content quality has supported contributions to scientific literature where professional writing support is acknowledged, reflecting his commitment to excellence in scientific communication.
