After teen suicide, OpenAI claims it is “helping people when they need it most”

ChatGPT allegedly provided suicide encouragement to teen after moderation safeguards failed.


Credit: Benj Edwards / OpenAI

OpenAI published a blog post on Tuesday titled "Helping people when they need it most" that addresses how its ChatGPT AI assistant handles mental health crises, following what the company calls "recent heartbreaking cases of people using ChatGPT in the midst of acute crises."

The post arrives after The New York Times reported on a lawsuit filed by Matt and Maria Raine, whose 16-year-old son Adam died by suicide in April after extensive interactions with ChatGPT; Ars covered the case in depth in a previous post. According to the lawsuit, ChatGPT provided detailed instructions, romanticized suicide methods, and discouraged the teen from seeking help from his family, while OpenAI's system tracked 377 messages flagged for self-harm content without intervening.

ChatGPT is not a single AI model but an application built from multiple interacting components. In addition to a main AI model like GPT-4o or GPT-5 that produces the bulk of the output, the application includes components that are typically invisible to the user, such as a moderation layer (another AI model or classifier) that reads the text of the ongoing chat session. That layer detects potentially harmful outputs and can cut off the conversation if it veers into unhelpful territory.
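For a sense of how such a gate works in practice, here is a minimal sketch in Python that uses OpenAI's public Moderation endpoint as a stand-in classifier. The function names, the canned crisis message, and the simple short-circuit behavior are illustrative assumptions; ChatGPT's actual internal pipeline is not public.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRISIS_MESSAGE = (
    "It sounds like you are going through something serious. "
    "If you are in the US, you can call or text 988 to reach the "
    "Suicide and Crisis Lifeline."
)

def guarded_reply(conversation: list[dict]) -> str:
    """Run the latest user message past a moderation classifier before
    letting the main model answer."""
    latest_text = conversation[-1]["content"]

    # The moderation model scores the text for categories such as self-harm
    # and sets a top-level `flagged` boolean.
    verdict = client.moderations.create(input=latest_text).results[0]
    if verdict.flagged:
        # A real product might redirect to resources, end the session, or
        # escalate; this sketch simply short-circuits the main model.
        return CRISIS_MESSAGE

    # Otherwise the main model (GPT-4o here) produces the actual reply.
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=conversation,
    )
    return completion.choices[0].message.content
```

In this arrangement, the quality of the overall system's safety behavior depends as much on the classifier's judgment as on the main model's.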

OpenAI eased these content safeguards in February following user complaints about overly restrictive ChatGPT moderation that prevented the discussion of topics like sex and violence in some contexts. At the time, Sam Altman wrote on X that he'd like to see ChatGPT with a "grown-up mode" that would relax content safety guardrails. With 700 million active users, what seem like small policy changes can have a large impact over time.

There’s no one home: The illusion of understanding


OpenAI's language throughout Tuesday's blog post reveals a potential problem with how it promotes its AI assistant. The company consistently describes ChatGPT as if it possesses human qualities, a practice called anthropomorphism. The post is full of this framing, claiming that ChatGPT can "recognize" distress and "respond with empathy" and that it "nudges people to take a break"—language that obscures what's actually happening under the hood.


ChatGPT is not a person. ChatGPT is a pattern-matching system that generates statistically likely text responses to a user-provided prompt. It doesn't "empathize"—it outputs text statistically associated with empathetic responses in its training corpus, not expressions of humanlike concern. This anthropomorphic framing isn't just misleading; it's potentially hazardous when vulnerable users believe they're interacting with something that understands their pain the way a human therapist would.

The lawsuit reveals the alleged consequences of this illusion. ChatGPT mentioned suicide 1,275 times in conversations with Adam—six times more often than the teen himself.

Safety measures that fail precisely when needed


OpenAI acknowledges a particularly troubling drawback of ChatGPT's current design: Its safety measures may completely break down during extended conversations—exactly when vulnerable users might need them most.

"As the back-and-forth grows, parts of the model's safety training may degrade," the company wrote in its blog post. "For example, ChatGPT may correctly point to a suicide hotline when someone first mentions intent, but after many messages over a long period of time, it might eventually offer an answer that goes against our safeguards."

This degradation reflects a fundamental limitation of the Transformer AI architecture, as we previously reported. These models use an "attention mechanism" that compares every new text fragment (token) to every fragment in the entire conversation history, with computational cost growing quadratically. A 10,000-token conversation requires 100 times more attention operations than a 1,000-token one. As conversations lengthen, the model's ability to maintain consistent behavior—including safety measures—becomes increasingly strained, and it begins making associative mistakes.
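A back-of-the-envelope illustration of that scaling, counting raw pairwise token comparisons as a stand-in for the real attention computation (actual implementations batch and optimize this work, so the numbers are illustrative only):

```python
# Self-attention compares every token with every other token in the context,
# so the number of pairwise comparisons grows with the square of the length.
def attention_pairs(num_tokens: int) -> int:
    return num_tokens * num_tokens

short_chat = attention_pairs(1_000)    # 1,000,000 comparisons
long_chat = attention_pairs(10_000)    # 100,000,000 comparisons

print(long_chat // short_chat)  # prints 100: 10x more tokens, 100x more work
```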


Additionally, once a chat grows longer than the AI model can process, the system "forgets" the oldest parts of the conversation history to stay within the context window limit, dropping earlier messages and potentially losing important context or instructions from the beginning of the conversation.
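Below is a minimal sketch of that kind of history truncation. The character-based token estimate and the drop-oldest-first policy are simplifying assumptions rather than a description of OpenAI's implementation (production systems use a real tokenizer and the model's actual context window size), but the effect is the same: the earliest messages are the first to disappear.

```python
def trim_to_context_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest messages until the conversation fits the token budget."""

    def estimate_tokens(message: dict) -> int:
        # Rough heuristic: roughly four characters per token of English text.
        return max(1, len(message["content"]) // 4)

    trimmed = list(messages)
    while len(trimmed) > 1 and sum(estimate_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the earliest message, and whatever it contained, is lost
    return trimmed
```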

This breakdown of safeguards isn’t just a technical limitation—it creates exploitable vulnerabilities called "jailbreaks." In Adam’s case, the lawsuit alleges that once the system’s protective tendencies weakened from conversation steering, he was able to manipulate ChatGPT into providing harmful guidance.


Adam Raine learned to bypass these safeguards by claiming he was writing a story—a technique the lawsuit says ChatGPT itself suggested. This vulnerability partly stems from the eased safeguards regarding fantasy roleplay and fictional scenarios implemented in February. In its Tuesday blog post, OpenAI admitted its content blocking systems have gaps where "the classifier underestimates the severity of what it's seeing."

OpenAI states it is "currently not referring self-harm cases to law enforcement to respect people's privacy given the uniquely private nature of ChatGPT interactions." The company prioritizes user privacy even in life-threatening situations, despite its moderation technology detecting self-harm content with up to 99.8 percent accuracy, according to the lawsuit. The reality, however, is that these detection systems identify statistical patterns associated with self-harm language; they do not comprehend a crisis the way a human would.

OpenAI’s safety plan for the future


In response to these failures, OpenAI describes ongoing refinements and future plans in its blog post. For example, the company says it's consulting with "90+ physicians across 30+ countries" and plans to introduce parental controls "soon," though no timeline has yet been provided.

OpenAI also described plans for "connecting people to certified therapists" through ChatGPT—essentially positioning its chatbot as a mental health platform despite alleged failures like Raine's case. The company wants to build "a network of licensed professionals people could reach directly through ChatGPT," potentially furthering the idea that an AI system should be mediating mental health crises.

Raine reportedly used GPT-4o to generate the suicide assistance instructions; the model is well-known for troublesome tendencies like sycophancy, where an AI model tells users pleasing things even if they are not true. OpenAI claims its recently released model, GPT-5, reduces "non-ideal model responses in mental health emergencies by more than 25% compared to 4o." Yet this seemingly marginal improvement hasn't stopped the company from planning to embed ChatGPT even deeper into mental health services as a gateway to therapists.

As Ars previously explored, breaking free from an AI chatbot's influence when stuck in a deceptive chat spiral often requires outside intervention. Starting a new chat session with no conversation history and with the memory feature turned off can reveal how responses change without the buildup of previous exchanges—a reality check that becomes impossible in long, isolated conversations where safeguards deteriorate.

However, "breaking free" of that context is very difficult to do when the user actively wishes to continue to engage in the potentially harmful behavior—while using a system that increasingly monetizes their attention and intimacy.
 