Every week, more than 230 million people turn to ChatGPT for answers. They ask about everything from coding bugs to dinner recipes, but increasingly, they are asking about their health. According to OpenAI, users are checking if food is safe to eat, managing chronic allergies, or looking for ways to kick a stubborn cold.
However, a new study published in the journal Nature suggests that while ChatGPT is a brilliant conversationalist, it is a dangerously inconsistent triage nurse. Researchers from Mount Sinai in New York found that while the AI handles "textbook" medical emergencies with ease, it fails to recognize the gravity of more subtle, life-threatening situations more than half the time.
The study, led by Ashwin Ramaswamy, sought to answer a fundamental question: if a user is in the middle of a medical crisis, will ChatGPT tell them to go to the emergency room? To test this, researchers presented the AI with various clinical scenarios.
The results revealed a striking dichotomy. When faced with classic, unmistakable emergencies—such as the sudden facial drooping of a stroke or the hives and wheezing of a severe allergic reaction—ChatGPT performed admirably. It recognized the patterns it had been trained on and correctly advised immediate medical intervention.
But medicine is rarely just a series of textbook definitions. The study found that ChatGPT struggled significantly when the danger was not immediately obvious. In cases where symptoms were more nuanced or required a higher level of clinical suspicion, the AI underestimated the severity of the situation in over 50% of the trials.
To understand why an advanced large language model (LLM) fails here, it helps to use an analogy. Think of ChatGPT as a world-class librarian who has read every medical textbook ever written but has never actually seen a patient. The librarian can recite the symptoms of a rare disease perfectly, but they lack the "clinical intuition" to notice the subtle grayness in a patient’s complexion or the specific way a person describes a "dull ache" that might actually signify internal bleeding.
AI operates on pattern recognition and probability. In a textbook emergency, the patterns are loud and clear. In a subtle emergency, the patterns are muffled. Because the AI cannot ask clarifying physical questions or observe the patient's demeanor, it often defaults to a more conservative, less urgent interpretation of the data provided.
The primary concern for health professionals is the "false green light." When a person asks an AI about a symptom and the AI suggests a home remedy or a "wait and see" approach, the user feels a sense of relief. This cognitive reassurance can lead to dangerous delays in seeking professional help.
| Scenario Type | AI Performance | Typical Example |
|---|---|---|
| Textbook Emergency | High Accuracy | Chest pain radiating to the left arm (Heart Attack) |
| Clear-Cut Trauma | High Accuracy | Deep arterial bleeding or obvious bone fracture |
| Subtle Emergency | Low Accuracy | Ectopic pregnancy symptoms or early-stage sepsis |
| Chronic Management | Moderate Accuracy | Adjusting diet for known Type 2 Diabetes |
As the table suggests, the risk lies in the middle ground. A user might describe a "bad stomach ache" that is actually appendicitis. If the AI focuses on indigestion rather than the risk of rupture, the window for a safe, routine surgery could close.
OpenAI has never claimed that ChatGPT is a medical device. In fact, the platform’s terms of service explicitly state that the tool is not intended for medical advice, diagnosis, or treatment. Most medical queries now trigger a standard disclaimer: "I am an AI, not a doctor. Please consult a healthcare professional."
However, as the Mount Sinai study highlights, these disclaimers are often buried beneath paragraphs of seemingly authoritative advice. When a user is in pain or panicking, they are likely to skip the disclaimer and head straight for the suggested remedy. The study suggests that the current guardrails may not be enough to prevent users from relying on the AI during a critical window of time.
Does this mean you should never use AI for health-related questions? Not necessarily. AI can be an excellent tool for health literacy—explaining complex medical terms, helping you prepare questions for your doctor, or finding healthy recipes. But when it comes to diagnosis and triage, a different approach is required.
Practical Steps for Users:
The integration of AI into healthcare is inevitable and, in many ways, desirable. In the future, specialized medical AI models trained on verified clinical data—rather than the general internet—may become incredibly accurate triage tools.
Until then, the Mount Sinai study serves as a vital reminder: ChatGPT is a powerful mirror of human knowledge, but it lacks the life-saving judgment of a human doctor. When your health is on the line, the best "algorithm" is still a trip to the emergency room.



Our end-to-end encrypted email and cloud storage solution provides the most powerful means of secure data exchange, ensuring the safety and privacy of your data.
/ Create a free account