In Harvard study, AI offered more accurate diagnoses than emergency room doctors
Key takeaways
- Researchers tested large language models across a variety of medical contexts, including real emergency room cases, where at least one model appeared to be more accurate than human doctors.
- The study, published this week in Science, comes from a team of physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center.
- In one experiment, diagnoses generated by OpenAI's o1 and GPT-4o models were compared against those of two attending physicians for 76 patients seen in the Beth Israel emergency room.
Why this matters: evidence that AI models can approach, and in some cases exceed, expert-level accuracy on high-stakes clinical decisions has implications for how medicine is practiced.
A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI's models compared with human physicians.
In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two attending physicians with those generated by OpenAI's o1 and GPT-4o models. These diagnoses were then assessed by two other attending physicians, who did not know which came from the humans and which came from the AI.