Researchers Say ‘Natural Decision-Making’ Prompt Strategy Boosts AI Accuracy in Healthcare Advice

Insider Brief

  • Researchers at Technische Universität Berlin found that prompting large language models to reason more like humans significantly improved their ability to provide medical care-seeking advice, according to a study published in JMIR Biomedical Engineering.
  • The study tested 10 ChatGPT models using psychological decision-making frameworks based on Naturalistic Decision-Making and found self-care recommendation accuracy improved from about 13% with standard prompts to nearly 30% using human reasoning strategies.
  • Researchers said the prompting approach reduced the models’ tendency toward excessive caution while maintaining strong performance identifying true emergencies, though they cautioned additional research is needed in real-world settings.

Researchers at Technische Universität Berlin report that prompting large language models to reason more like humans significantly improved their ability to provide medical care-seeking advice, according to a study published in JMIR Biomedical Engineering.

The study focused on a growing problem surrounding AI health tools such as ChatGPT: the tendency to recommend emergency or professional medical care too often, even for relatively minor conditions. Researchers said this “over-triage” can drive up healthcare costs and cause unnecessary patient anxiety.

The research team tested 10 ChatGPT models, including GPT-4o and GPT-5 systems, using prompts based on psychological decision-making frameworks rather than traditional computer-style instructions.

The study centered on a concept known as Naturalistic Decision-Making, which examines how experienced professionals make decisions in uncertain, high-pressure situations. Researchers adapted two human reasoning frameworks for the AI systems (a brief prompt sketch follows the list):

  • Recognition-Primed Decision-Making, which encourages matching symptoms to familiar situations and mentally simulating outcomes.
  • Data-Frame Theory, which asks the model to build and continuously reevaluate its understanding of a situation as new information appears.
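The paper’s exact prompt wording is not reproduced in this article. As a purely illustrative sketch of the first idea, a Recognition-Primed Decision-Making prompt for a chat model might be structured as follows; the prompt text, the triage labels, and the `gpt-4o` call are assumptions for demonstration, not the authors’ materials:

```python
# Illustrative sketch only: the study's actual prompts are not public here.
# Prompt wording, model name, and triage labels are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Recognition-Primed Decision-Making: match the case to familiar
# patterns, then mentally simulate the outcome of a candidate action.
RPD_PROMPT = """You are assisting with care-seeking advice.
1. Compare the symptoms below to typical situations you recognize.
2. Pick the closest matching situation and name the cues that match.
3. Mentally simulate what would happen if the patient chose self-care,
   saw a doctor, or went to the emergency department.
4. Recommend exactly one option: EMERGENCY, DOCTOR, or SELF-CARE.

Symptoms: {case}"""

def triage(case_description: str, model: str = "gpt-4o") -> str:
    """Ask the model for a single care-seeking recommendation."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": RPD_PROMPT.format(case=case_description)}],
    )
    return response.choices[0].message.content

print(triage("Mild sore throat and runny nose for two days, no fever."))
```

A Data-Frame variant would instead instruct the model to state its current “frame” of the situation up front and explicitly revise that frame as each new symptom or detail is introduced.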

According to the study, the approach improved accuracy across all tested models. The largest gains appeared in self-care recommendations, where accuracy increased from about 13% using standard prompts to nearly 30% with the human reasoning prompts.
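For readers wondering how per-category figures like these are derived: each test vignette carries a gold-standard triage label, and accuracy is computed separately within each category. The minimal scoring sketch below uses invented placeholder vignettes and labels, not study data:

```python
# Hypothetical scoring sketch: per-category accuracy from labeled vignettes.
# The labels and predictions here are invented placeholders.
from collections import defaultdict

def category_accuracy(predictions, gold_labels):
    """Accuracy per triage category (EMERGENCY / DOCTOR / SELF-CARE)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold in zip(predictions, gold_labels):
        total[gold] += 1
        if pred == gold:
            correct[gold] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

gold = ["SELF-CARE", "EMERGENCY", "DOCTOR", "SELF-CARE"]
pred = ["DOCTOR", "EMERGENCY", "DOCTOR", "SELF-CARE"]  # one over-triage case
print(category_accuracy(pred, gold))
# {'SELF-CARE': 0.5, 'EMERGENCY': 1.0, 'DOCTOR': 1.0}
```

Over-triage of the kind the study describes shows up in exactly this way: emergency accuracy stays high while self-care accuracy drops, because borderline self-care cases get escalated.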

Researchers also found that simpler non-reasoning AI models, which previously struggled to recognize situations appropriate for self-care, became more capable when guided with structured “human reasoning blueprint” strategies. At the same time, the systems maintained strong performance identifying genuine emergencies.

The findings suggest that AI systems may perform better in messy, real-world medical situations when guided using models of human cognition rather than strict computational logic alone. Researchers said the prompts helped reduce the models’ tendency toward excessive caution by encouraging them to reassess assumptions and simulate possible outcomes before making recommendations.

Researchers cautioned that additional studies will be needed to determine whether the prompting approach improves medical decision support in everyday non-standardized settings outside controlled testing environments.

The study was conducted by Marvin Kopka and Markus A. Feufel at Technische Universität Berlin’s Division of Ergonomics, Department of Psychology & Ergonomics. Their research focuses on human decision-making and the safe integration of AI systems into real-world environments.

“When testing AI, we too often give it perfect information and then see that it performs extremely well,” Kopka noted. “But many problems in the real world are ill-defined. We have good models for how experts make decisions in such situations, so using them as prompts seemed like an obvious next step. I hope that applying human decision-making to LLMs will help us develop AI tools that are also useful in real-world decision-making.”

Image Credit: Marvin Kopka