LLMs and Diagnostic Reasoning: A Randomized Clinical Vignette Study [pdf]

(medrxiv.org)

2 points | by trott 9 hours ago

1 comment

trott 9 hours ago
TLDR:
Physicians scored 73.7. Physicians armed with GPT-4 scored 76.3. But GPT-4 alone scored 89.2.
The authors think it's unlikely that the materials are in the GPT-4 training data, because the cases have never been publicly released.