
OpenAI's most capable and intuitive frontier model to date, built for agentic coding, real computer use, and knowledge work that actually gets done without hand-holding.
GPT-5.5 is a frontier-scale multimodal language model engineered to handle complex reasoning, long-context understanding, and tool-driven execution with high reliability. It improves on previous generations by delivering more consistent outputs, stronger logical coherence, and better alignment with user intent.
Unlike earlier models that focused primarily on generating responses, GPT-5.5 is designed to support entire workflows, from initial analysis and planning through execution and refinement, without losing context or structure along the way.
GPT-5.5 is OpenAI's strongest coding model to date. It can implement features, refactor large codebases, debug production issues, and write tests — all in a single long-horizon session without losing context. It improves on GPT-5.4 across every coding benchmark while using fewer tokens to get there.
The model can operate software directly: navigate interfaces, fill spreadsheets, submit forms, and move across applications. This isn't screen-reading theater: it interprets intent and translates it into real computer actions, scoring 78.7% on OSWorld-Verified. Think of it as a highly capable digital worker.
GPT-5.5 extends meaningfully into scientific reasoning — a new frontier for this model family. It's designed for intelligence-bottlenecked tasks where the work requires drawing connections across large bodies of information and reasoning through uncertainty rather than just retrieving facts.
For knowledge work such as sales presentations, financial models, legal analysis, scheduling, and operational documents, GPT-5.5 scores 84.9% on GDPval, which evaluates AI performance across 44 real-world professional occupations. For many tasks, it matches or exceeds what industry professionals produce.
GPT-5.5 outperforms its predecessor across every major benchmark. Here's how the numbers look.
GPT-5.5 functions as both a coding assistant and a systems-level collaborator. It can generate production-ready code, analyze complex architectures, and identify inefficiencies in existing systems. Developers benefit from reduced iteration cycles, as the model produces more accurate outputs from the first pass and maintains consistency across large codebases.
GPT-5.5 transforms large volumes of information into clear insights. It can interpret datasets, summarize complex materials, and generate detailed analytical reports. The model is especially effective in scenarios that require connecting multiple sources of information into a coherent output.
For operational use, GPT-5.5 supports workflow automation, internal knowledge systems, and decision-support tools. It enables organizations to streamline repetitive processes while maintaining accuracy and contextual awareness, effectively bridging the gap between raw data and actionable outcomes.
How does GPT-5.5 differ from GPT-5.4?
The biggest differences are in agentic capability, long-context performance, and coding efficiency. On long-context retrieval at 1M tokens, the score rises from GPT-5.4's 36.6% to 74.0%. On Terminal-Bench, it moves from 75.1% to 82.7%. Crucially, GPT-5.5 achieves this while using fewer tokens per task than GPT-5.4, meaning it's both smarter and cheaper to run per job. GPT-5.5 is also described as significantly more intuitive, requiring less guidance to take on ambiguous work.
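To put the two quoted score changes on the same footing, here is a minimal Python sketch that computes the absolute gain (in percentage points) and the relative gain for each. The scores come from the answer above; the `gains` helper and the dictionary layout are illustrative assumptions, not part of any OpenAI tooling.

```python
# Benchmark scores quoted in the text above (percent), as (GPT-5.4, GPT-5.5).
scores = {
    "Long-context retrieval at 1M tokens": (36.6, 74.0),
    "Terminal-Bench": (75.1, 82.7),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only.
    """
    absolute = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return absolute, relative


for name, (old, new) in scores.items():
    absolute, relative = gains(old, new)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

By this arithmetic the long-context score roughly doubles (+37.4 points), while the Terminal-Bench gain is smaller in both absolute and relative terms (+7.6 points, about 10%).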
What is Codex, and why does it matter for GPT-5.5?
Codex is OpenAI's agentic coding environment — a platform where developers can hand off engineering tasks to an AI agent that works through them autonomously. GPT-5.5 is the new default model inside Codex, and the gains show up clearly there: better context retention across large codebases, smarter handling of ambiguous failures, and improved performance on long-horizon engineering tasks. Over 4 million active users are now on Codex, and over 85% of OpenAI employees use it weekly.
Will GPT-5.4 still be available?
Yes, for now. Paid users can access GPT-5.4 under Legacy Models in the model picker. OpenAI has not announced a specific retirement date for GPT-5.4. This follows the pattern of previous transitions, in which older models remained available for several months after a major release.
How does GPT-5.5 compare to the latest models from Anthropic and Google?
According to OpenAI's benchmark data, GPT-5.5 scores higher than Gemini 3.1 Pro and Claude Opus 4.5 across the evaluations OpenAI published. On Artificial Analysis's Coding Agent Index specifically, GPT-5.5 is reported to deliver state-of-the-art coding intelligence at roughly half the cost of competing frontier models. The competition is fierce: Anthropic's Claude Mythos preview has also been drawing significant attention in enterprise circles, particularly around cybersecurity.