OpenAI updates ChatGPT safety systems to track risk across sensitive conversations

18 May

The changes use limited safety summaries to help ChatGPT respond with more caution when signs of self-harm or harm to others emerge over time.

Image: Laptop screen showing an AI chat interface with digital safety icons in a modern workspace. Caption: OpenAI has updated ChatGPT safety systems to better recognize risk in sensitive conversations.

OpenAI has updated ChatGPT to better recognize safety risks that develop across sensitive conversations, as AI systems face closer scrutiny over how they respond to users in distress.

The company says the changes are designed to help ChatGPT identify subtle or evolving cues linked to acute risks, including suicide, self-harm, and harm to others. The update includes changes to model policies and training, alongside new "safety summaries": short, factual notes that capture safety-relevant context from earlier conversations and are created only in rare, high-risk situations.

The announcement was also shared on LinkedIn by Declan Grabb, who works on safety at OpenAI and is a mental health provider. He wrote that risk is not always clear from a single message: "As a mental health provider, I am deeply familiar with how important context can be when understanding the gravity or acuity of what someone is saying."

He added: "In a very similar way, safety risks won't always be evident from a single, isolated message. They may emerge gradually, through subtle shifts in context, intent, or behavior over time. And in sensitive conversations, this context can matter as much as a single message."

Safety summaries focus on rare high-risk situations

OpenAI says the new safety summaries are not designed for general personalization or long-term memory. Instead, they are short, narrowly scoped notes created by a model trained for safety reasoning tasks, kept for a limited time, and used only when relevant to a serious safety concern.

The company says the summaries can help ChatGPT connect warning signs when a later request appears ordinary on its own but becomes more concerning when viewed alongside earlier context. In those situations, ChatGPT may respond by de-escalating, refusing harmful details, or redirecting users toward safer alternatives.
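As a rough illustration only, the behavior described above (short notes, kept for a limited time, consulted only when relevant to a serious safety concern) could be sketched as follows. This is not OpenAI's implementation. The `SafetySummary` type, the `relevant_summaries` function, the category labels, and the 30-day retention window are all hypothetical names and values chosen for the example; the company has not published these details.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention window. OpenAI says only that summaries are
# "kept for a limited time"; 30 days is an assumption for illustration.
RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class SafetySummary:
    """A short, narrowly scoped note (hypothetical structure)."""
    note: str
    category: str  # assumed labels, e.g. "self_harm" or "harm_to_others"
    created_at: datetime


def relevant_summaries(summaries, flagged_categories, now):
    """Return only notes that are unexpired AND match a category flagged
    for the current request. If nothing is flagged, nothing is returned,
    so ordinary conversations see no safety context at all."""
    return [
        s for s in summaries
        if now - s.created_at <= RETENTION and s.category in flagged_categories
    ]
```

In this sketch, the set of flagged categories would come from some upstream risk check on the current request, which is also an assumption. The point of the gate is that an expired note, or a note in an unrelated category, never reaches the model.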

OpenAI says the work builds on more than two years of collaboration with mental health and safety experts. The company worked with psychiatrists and psychologists through its Global Physicians Network, including specialists in forensic psychology, suicide prevention, and self-harm.

Grabb continued: "This work was developed with input from mental health professionals, including psychiatrists and psychologists with expertise in suicide prevention, self-harm, and forensic psychology.

"Their insight helped inform how ChatGPT should recognize warning signs that may emerge over the course of a conversation, and how it should use limited, safety-relevant context when responding."

OpenAI reports stronger safe-response performance

OpenAI says internal evaluations showed improvements in how ChatGPT responds when risk becomes clearer over time.

In long single-conversation scenarios, the company says safe-response performance improved by 50 percent in suicide and self-harm cases, and by 16 percent in harm-to-others cases. OpenAI says this means the model was more likely to recognize when earlier parts of a conversation changed the meaning of a later request.

The company also tested performance across multiple conversations and models. On GPT-5.5 Instant, which OpenAI describes as the current default model in ChatGPT, safe-response performance improved by 52 percent in harm-to-others cases and by 39 percent in suicide and self-harm cases.

OpenAI also evaluated the safety summaries themselves across more than 4,000 assessments. The summaries received an average safety relevance score of 4.93 out of 5 and a factuality score of 4.34 out of 5.

The company says it tested whether adding safety context affected ordinary conversations and found responses remained broadly comparable, with no meaningful user preference between responses with or without safety summaries.

Sensitive AI use remains under scrutiny

The update comes as schools, colleges, universities, and families continue to assess how AI tools should be used in settings where users may raise personal, emotional, or safeguarding-related issues.

For education providers, the announcement feeds into the wider debate over AI in education, where adoption is moving faster than many policies on student wellbeing, safeguarding, and responsible use.

OpenAI says the work currently focuses on self-harm and harm-to-others scenarios. It says it may explore similar methods in other high-risk areas, including biology or cyber safety, with safeguards in place.

Grabb wrote: "This is hard, ongoing work. No system will be perfect. But helping AI better recognize when context matters is an important step toward building systems that are useful in everyday moments and take additional care in the moments that matter most."

Anyone in the United States experiencing emotional distress or thoughts of suicide can call or text the 988 Suicide & Crisis Lifeline on 988 for free, 24/7 support. Anyone in the UK or Ireland can contact Samaritans for free on 116 123, 24 hours a day. In an immediate emergency, call local emergency services.