New Delhi, Oct 8, 2024
While ChatGPT has shown its potential in interacting with patients and acing medical examinations, the popular generative artificial intelligence (AI) platform by OpenAI can recommend unnecessary x-rays and antibiotics in emergency care, according to a study published on Tuesday.
The study, led by researchers from the University of California-San Francisco (UCSF), showed that ChatGPT even recommended admitting patients who did not require hospital treatment.
In the paper published in the journal Nature Communications, the researchers said that, while the model could be prompted in ways that make its responses more accurate, it’s still no match for the clinical judgement of a human doctor.
“This is a valuable message to clinicians not to blindly trust these models,” said lead author Chris Williams, a postdoctoral scholar at UCSF.
“ChatGPT can answer medical exam questions and help draft clinical notes, but it’s not currently designed for situations that call for multiple considerations, like the situations in an emergency department,” he added.
A recent study by Williams showed that ChatGPT, a large language model (LLM), was slightly better than humans at determining which of two emergency patients was more acutely unwell: a straightforward choice between patient A and patient B.
In the current study, he challenged the AI model to perform a more complex task: providing the recommendations a physician makes after initially examining a patient in the emergency department, namely whether to admit the patient, order x-rays or other scans, or prescribe antibiotics.
For each of the three decisions, the team compiled a set of 1,000 emergency department visits for analysis, drawn from an archive of more than 251,000 visits.
The sets had the same ratio of “yes” to “no” responses for decisions on admission, radiology, and antibiotics.
The team entered doctors’ notes on each patient’s symptoms and examination findings into ChatGPT-3.5 and ChatGPT-4. The accuracy of the models’ recommendations on each set was then tested with increasingly detailed prompts.
The results showed that the AI models recommended services more often than was necessary. Compared with resident physicians, ChatGPT-4 was 8 per cent less accurate and ChatGPT-3.5 was 24 per cent less accurate.
“AI models tend to overprescribe because they are trained on the internet. To date, legitimate medical advice sites have not been designed to answer emergency medical questions.” (Agency)