OpenAI releases GPT-5.4 mini and nano to target high-volume AI workloads
New smaller models focus on speed, cost, and scalability as demand grows for AI systems that can operate in real time across coding and multimodal tasks.
OpenAI has released GPT-5.4 mini and GPT-5.4 nano, two smaller AI models designed for high-volume workloads, as organizations look to balance performance, cost, and latency in production environments.
Announced via a LinkedIn post from OpenAI for Business, the models extend the capabilities of GPT-5.4 into more efficient formats, supporting use cases such as coding assistants, subagents, and real-time multimodal applications. The release reflects a broader shift in AI deployment, where speed and responsiveness are becoming as important as raw model capability.
The company said in its LinkedIn post: “Today we’re releasing GPT-5.4 mini and GPT-5.4 nano, our most capable small models yet.”
It added: “They bring many of the strengths of GPT-5.4 to faster, more efficient models built for high-volume workloads.”
Performance gains focus on speed and cost efficiency
GPT-5.4 mini improves on GPT-5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than twice as fast. Benchmarks show it approaches the performance of the larger GPT-5.4 model on evaluations such as SWE-Bench Pro and OSWorld-Verified.
GPT-5.4 nano, positioned as the smallest and lowest-cost model in the series, is designed for tasks including classification, data extraction, ranking, and supporting coding workflows through subagents.
The company said: “GPT-5.4 mini improves on GPT-5 mini across coding, reasoning, multimodal understanding, and tool use while running more than 2x faster.”
It added: “GPT-5.4 nano is our smallest, cheapest GPT-5.4 model, optimized for classification, data extraction, ranking, and coding subagents.”
The models are designed for environments where latency directly affects user experience, including systems that require rapid iteration or real-time interaction.
Shift toward multi-model systems and subagents
OpenAI is positioning the new models as part of a broader architecture where multiple models operate together. In this setup, larger models handle planning and decision-making, while smaller models execute specific tasks quickly at scale.
This approach is particularly relevant for coding workflows and enterprise AI systems, where different levels of reasoning are required across tasks. Smaller models such as GPT-5.4 mini can handle subtasks like navigating codebases or processing documents, while larger models manage coordination and final outputs.
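The planner/executor split described above can be sketched as a simple model router. This is an illustrative sketch only: the model names come from OpenAI's announcement, but the task categories and routing rules are assumptions for illustration, not a documented OpenAI API.

```python
# Minimal sketch of a multi-model setup: a larger model handles planning
# and coordination, while smaller models execute scoped subtasks at scale.
# Model names are from the announcement; the routing logic is hypothetical.

PLANNER_MODEL = "gpt-5.4"        # planning, coordination, final outputs
EXECUTOR_MODEL = "gpt-5.4-mini"  # fast subtasks such as codebase navigation
UTILITY_MODEL = "gpt-5.4-nano"   # classification, extraction, ranking

def route(task_kind: str) -> str:
    """Pick a model tier for a subtask based on how much reasoning it needs."""
    if task_kind in {"plan", "coordinate", "final_answer"}:
        return PLANNER_MODEL
    if task_kind in {"navigate_codebase", "process_document", "edit_file"}:
        return EXECUTOR_MODEL
    if task_kind in {"classify", "extract", "rank"}:
        return UTILITY_MODEL
    return PLANNER_MODEL  # default unknown tasks to the most capable tier

# Example: dispatching the mixed subtasks of a coding workflow.
assignments = {t: route(t) for t in ["plan", "navigate_codebase", "classify"]}
```

In this pattern, each `route` call would precede an API request with the chosen model, keeping latency-sensitive subtasks on the cheaper, faster tiers.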
The company said in its LinkedIn post: “These models are designed for responsive coding assistants, subagents, and multimodal applications that need low latency without giving up strong performance.”
Availability across API, Codex, and ChatGPT
GPT-5.4 mini is available across OpenAI’s API, Codex, and ChatGPT, where it can be used for text and image inputs, tool use, and computer-based workflows. GPT-5.4 nano is currently available through the API.
The pricing structure reflects the focus on scalability, with lower costs per token compared to larger models, making them suitable for applications that require frequent or continuous use.
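For frequent or continuous workloads, the savings compound per token, so spend can be estimated directly from per-million-token prices. The helper below is a generic sketch; the prices are parameters, not actual OpenAI rates.

```python
# Illustrative sketch: estimating spend for a high-volume workload from
# per-million-token prices. Prices here are inputs, not real OpenAI pricing.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Return the dollar cost of a workload given per-1M-token prices."""
    return (input_tokens / 1_000_000) * price_in_per_m \
         + (output_tokens / 1_000_000) * price_out_per_m

# Example with hypothetical prices: 1M input tokens at $2/1M plus
# 500k output tokens at $8/1M.
cost = estimate_cost(1_000_000, 500_000, 2.0, 8.0)  # 2.0 + 4.0 = 6.0
```

Because the formula is linear in token volume, halving the per-token price of a model halves the cost of an always-on workload, which is why smaller tiers matter most for continuous use.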
The release highlights a shift in how AI is being deployed across sectors including education and EdTech. Rather than relying solely on large, general-purpose models, organizations are increasingly building systems that combine different model sizes to optimize both performance and cost.