OpenAI doesn't really want you to know what its latest AI model is “thinking.” Ever since the company launched its “strawberry” AI model family last week, with o1-preview and o1-mini touting so-called reasoning capabilities, OpenAI has been sending warning emails and ban threats to any user who tries to figure out how the model works.
Unlike OpenAI's previous AI models, such as GPT-4o, the company specifically trained o1 to work through a step-by-step problem-solving process before generating an answer. When users ask the o1 model a question in ChatGPT, they have the option to view this chain-of-thought process written out in the ChatGPT interface. However, by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by another AI model.
There's nothing more fascinating to enthusiasts than hidden information, so the race is on among hackers and red-teamers to uncover o1's raw chain of thought, using jailbreaking or prompt injection techniques that attempt to trick the model into revealing its secrets. There have been early reports of some successes, but nothing has been confirmed yet.
Meanwhile, OpenAI is watching through the ChatGPT interface, and the company is reportedly taking a tough stance against any attempts to probe o1's reasoning, even among those who are simply curious.
One X user reported (with confirmation from others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term “reasoning trace” in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model's “reasoning” at all.
The warning email from OpenAI said that specific user requests had been flagged for violating policies against circumventing safeguards or safety measures. “Please halt this activity and ensure you are using ChatGPT in accordance with our Terms of Use and our Usage Policies,” it read. “Additional violations of this policy may result in loss of access to GPT-4o with Reasoning,” referring to an internal name for the o1 model.
Marco Figueroa, who manages Mozilla's GenAI bug bounty program, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model. “I was so lost in focusing on #AIRedTeaming that I didn't realize I got this email from @OpenAI yesterday after all my jailbreaks,” he wrote. “I am now on the banned list!!!”
Hidden chains of thought
In a post on OpenAI's blog titled “Learning to Reason with LLMs,” the company says that hidden chains of thought in AI models offer a unique monitoring opportunity, allowing them to “read the mind” of the model and understand its so-called thought process. Those processes are most useful to the company if they are left raw and uncensored, but that may not align with the company's best commercial interests for several reasons.
“For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user,” the company writes. “However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.”
OpenAI decided against showing these raw chains of thought to users, citing factors such as the need to retain a raw feed for its own use, user experience, and “competitive advantage.” The company acknowledges that the decision has disadvantages. “We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer,” the company writes.
On the issue of “competitive advantage,” independent AI researcher Simon Willison expressed frustration in a write-up on his personal blog. “I interpret [this] as wanting to avoid other models being able to train against the reasoning work that they have invested in,” he writes.
It is an open secret in the AI industry that researchers routinely use the output from OpenAI's GPT-4 (and GPT-3 before that) as training data for AI models that often later become competitors, even though this practice violates OpenAI's terms of service. Exposing o1's raw chain of thought would provide a trove of training data for competitors to train o1-like “reasoning” models on.
Willison believes it's a loss for community transparency that OpenAI is keeping such a tight lid on o1's inner workings. “I'm not at all happy about this policy decision,” Willison wrote. “As someone who develops against LLMs, interpretability and transparency are everything to me – the idea that I can run a complex prompt and have key details of how that prompt was evaluated hidden from me feels like a big step backwards.”