For years, as a university lecturer in health data analytics, I would reassure students who asked, “Can we still trust academic research?” My answer was always yes, without hesitation.
But lately, I’ve started to pause.
It began innocently enough. Our research team at the University of Surrey in the UK has long been fascinated by the power of big data to drive innovation in healthcare. We’re also interested in how policymakers promote data sharing through Open Science. With the rise of machine learning and generative AI, the potential to unlock insights from massive datasets has never been greater. We were excited—until we saw something strange.
Over the past couple of years, there’s been a huge spike in research papers using public health datasets—especially two-sample Mendelian randomisation studies and simple association analyses that link a single lifestyle factor or biomarker to a single disease. On the surface, it looks like progress. But the more we read, the more we noticed a pattern.
These papers were formulaic. Fast. Identical in structure. Often lacking context. They read like they had been churned out by an algorithm. And that’s when we started asking: What if they were?
Together with colleagues from Aberystwyth University, we launched an investigation. Our focus was the NHANES database, the National Health and Nutrition Examination Survey: an extensive dataset maintained by the U.S. Centers for Disease Control and Prevention. NHANES is entirely open-access, with APIs that make it easy to plug the data directly into Python or R, apply machine learning models, and, if you’re inclined, generate a paper at the push of a button.
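To see just how low the barrier is, here is a minimal Python sketch of the kind of pipeline we mean (the survey cycle and file name below are illustrative examples, not part of our study): a single pandas call pulls an entire NHANES component off the CDC server and into a DataFrame.

```python
# Minimal sketch (illustrative): load one NHANES component directly from the CDC server.
# NHANES distributes each component as a SAS transport (.XPT) file; the 2017-2018
# demographics file "DEMO_J" is used here purely as an example.
import pandas as pd

url = "https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DEMO_J.XPT"

# pandas reads SAS transport files natively, including straight from a URL
demo = pd.read_sas(url, format="xport")

print(demo.shape)        # thousands of survey respondents, ready for modelling
print(demo.columns[:5])  # variable codes as documented by NHANES
```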
The numbers were shocking. Publications referencing NHANES jumped from about 4,600 in 2022 to more than 8,000 in 2024. It was like mushrooms after rain: sudden, everywhere, and suspiciously uniform.
We reviewed a sample of these studies and uncovered some serious problems. Many papers used narrow, cherry-picked timeframes without justification. Others failed to apply false discovery rate corrections—a key step when testing thousands of potential correlations. Worse still, most of them reduced complex, multi-factorial diseases to single-variable explanations: “Eat more of X, and you’ll avoid Y.” It’s a seductive message, but scientifically shaky.
It reminded me of a student once saying, “If I just test every variable and publish the ones that look interesting, I’m bound to find something.” That’s not science. That’s data dredging.
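The problem is easy to demonstrate. The toy simulation below (not our actual analysis) tests 1,000 pure-noise “exposures” against a pure-noise outcome: by chance alone, roughly fifty come out nominally significant at p < 0.05, and nearly all of them vanish once a Benjamini–Hochberg false discovery rate correction is applied.

```python
# Illustrative simulation of data dredging: test many pure-noise exposures
# against one outcome, then apply the false discovery rate correction that
# many of the papers we reviewed skipped.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_people, n_exposures = 2_000, 1_000

outcome = rng.normal(size=n_people)                    # "disease score": pure noise
exposures = rng.normal(size=(n_people, n_exposures))   # 1,000 unrelated "lifestyle factors"

# One correlation test per exposure
pvals = np.array([stats.pearsonr(exposures[:, j], outcome)[1]
                  for j in range(n_exposures)])

naive_hits = (pvals < 0.05).sum()                              # ~50 spurious "discoveries"
fdr_hits = multipletests(pvals, method="fdr_bh")[0].sum()      # Benjamini-Hochberg correction

print(f"Nominally significant: {naive_hits}, after FDR correction: {fdr_hits}")
```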
When we statistically corrected the papers’ findings ourselves, more than half of them lost their significance. In other words, what had been presented as “groundbreaking discoveries” were often just noise. And yet, these papers are entering the scientific record. They’re being indexed, cited, and—most worryingly—used to train the next generation of AI tools.
Because yes, AI learns from these papers. When generative models are trained on open-access research, they absorb the patterns and conclusions—flawed or not. If AI tools are being fed a steady diet of fast-food science, what kind of nutrition are we giving the future of research?
Of course, AI can be transformative. It speeds up workflows, removes barriers, and helps us standardise complex analyses. But let’s not kid ourselves: dishonest actors can move much faster than thoughtful scientists. The result? A flood of low-quality, misleading content masquerading as peer-reviewed research.
Some of these papers, we suspect, are being produced by paper mills—covert businesses that sell fabricated or low-effort manuscripts to paying customers. Others are likely written by individuals who copy-paste templates, tweak the variables, and let ChatGPT do the writing.
We’re not here to vilify technology. But we do need to change the incentive structure that rewards volume over value. Right now, academia is trapped in a "publish or perish" cycle where quantity often trumps quality. Even past reforms—like the move to Open Access journals funded by article processing charges—have introduced new perverse incentives.
There’s no quick fix. But journal editors and peer reviewers must be willing to draw the line. Low-effort, copy-paste studies with no scientific contribution should be rejected. Tools like COSIG (Collection of Open Science Integrity Guides) can help identify red flags in submissions.
And the rest of us, whether students, researchers, or everyday readers, need to stay critical. Not every study is what it seems. Not every author is who (or what) they claim to be.
We’re entering an age where some of the most “prolific” scientists might not be human at all.