Clinicians struggled to discern whether research abstracts were written by a human or by an AI chatbot, according to a research letter in JAMA Pediatrics.
A survey study of more than 100 healthcare professionals revealed just 43% accuracy for identifying when research abstracts were written by OpenAI's ChatGPT 3.5 or by a human, according to Dennis Ren, MD, of the Children's National Hospital in Washington, D.C., and colleagues.
Two-thirds of respondents reported having prior experience with reviewing abstracts (66.7%). However, this experienced group was less accurate than those who reported having no experience reviewing abstracts (39.7% vs 49.3%), the authors found.
"We (healthcare professionals) cannot reliably distinguish between human vs AI-generated research abstracts," Ren said.
Ren and colleagues recruited 102 participants using a web-based survey that was distributed to healthcare professionals between August 1 and November 30, 2023. Most participants were attendings or consultant physicians (59), while 23 were fellows, 7 were nurses, 5 were residents or hospital medical officers, 2 were medical students, 2 were advanced practitioners, and 4 were "other."
Participants were given four research abstracts to analyze: two were written by human researchers and drawn from the 2020 Pediatric Academic Societies Meeting, and two were generated by ChatGPT 3.5.
Overall, 55% and 40% of participants correctly identified the two human-written abstracts, respectively, while 20% and 57% correctly identified the two AI-generated abstracts.
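The per-abstract figures can be checked against the 43% overall accuracy with simple arithmetic. The sketch below is illustrative only and is not taken from the research letter. It assumes every respondent rated all four abstracts (so each abstract carries equal weight) and that the 66.7% with prior review experience corresponds to 68 of the 102 participants; both are assumptions, and the inputs are the rounded percentages reported here.

```python
# Rounded per-abstract percentages of correct identifications, as reported.
human_correct = [55, 40]
ai_correct = [20, 57]

# Assumption: every respondent rated all four abstracts, so the overall
# accuracy is the unweighted mean of the four per-abstract rates.
rates = human_correct + ai_correct
overall = sum(rates) / len(rates)
print(f"Overall accuracy from per-abstract rates: {overall:.1f}%")

# Cross-check using the experience subgroups.
# Assumption: 66.7% of 102 respondents means 68 had prior review experience.
total_n = 102
experienced_n = 68
inexperienced_n = total_n - experienced_n
experienced_acc = 39.7    # reported accuracy, prior review experience
inexperienced_acc = 49.3  # reported accuracy, no prior review experience

weighted = (
    experienced_n * experienced_acc + inexperienced_n * inexperienced_acc
) / total_n
print(f"Overall accuracy from experience subgroups: {weighted:.1f}%")
```

Under these assumptions, both routes land at roughly 43%, which suggests the reported figures are internally consistent.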
When stratified by healthcare professional type, overall accuracy ranged from 25% among medical students to 44.6% among fellows, they found. However, they cautioned the study wasn't powered to perform subgroup analyses by training level, and it was also potentially limited by selection bias.
They also found that the majority of respondents (72.5%) thought that using AI to write research abstracts was ethical.
Ren believes there are benefits to using AI in scientific writing, especially as a resource for clinical research. However, he noted there are still no standards for what constitutes "appropriate use" of AI in clinical research.
"We may get to a point where the use of AI in writing and research becomes a default assumption," he said. "We don't report when we use Microsoft Word to write a manuscript."
Ren said he believes AI can become a catalyst for , if a measured standard can be established among authors and publishers. Still, he cautioned that all scientific writing needs to be critically appraised, regardless of the involvement of AI chatbots.
"We have to sift through a lot of papers and research as physicians to stay updated on the evidence and research that is coming out," Ren said. "I encourage everyone to be skeptical whenever they read any research article. Critically appraise everything regardless of origin."
John Ayers, PhD, MA, of the University of California San Diego, who wasn't involved in the study, agreed that the results highlight the need for improved critical appraisal skills among healthcare professionals as AI is increasingly incorporated into clinical research.
He argued that medical schools spend a lot of time teaching students about clinical medicine without spending enough time teaching the critical appraisal skills needed to evaluate new research.
While identifying AI-generated writing appears to be difficult, he said, there are some simple tips to consider when reviewing research writing. For example, AI chatbots have a tendency to overuse uncommon synonyms in research writing, Ayers said.
Disclosures
Ren reported serving as an AI Clinical Fellow for Glass Health.
No other authors reported any relevant financial conflicts of interest.
Primary Source
JAMA Pediatrics
Ren D, et al. 'Identification of human-generated vs AI-generated research abstracts by health care professionals' JAMA Pediatr 2024; DOI: 10.1001/jamapediatrics.2024.0760.