OpenAI's GPT-5.3-Codex-Spark Achieves 1,000+ Tokens Per Second on Cerebras Hardware

OpenAI has launched GPT-5.3-Codex-Spark, a new AI model optimized for real-time coding, which delivers over 1,000 tokens per second by running exclusively on Cerebras Systems' Wafer-Scale Engine 3 (WSE-3) hardware. This development marks the first public result of a strategic partnership between OpenAI and Cerebras, aimed at enhancing low-latency AI inference. The model, a smaller variant of GPT-5.3-Codex, was released as a research preview for ChatGPT Pro users on February 12, 2026.

The model's speed also forced changes to OpenAI's serving stack, as OpenAI's Sherwin Wu highlighted in a recent post: "@cerebras made GPT‑5.3‑Codex‑Spark so fast that we had to rethink how Codex uses the Responses API – leading us to build WebSocket support for ultra fast latency. Cerebras speed is just 🤯" The optimizations include persistent WebSocket connections and targeted improvements to the Responses API, which together reduce client/server roundtrip overhead by 80% and time-to-first-token by 50%.
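To see why per-request overhead matters more as generation gets faster, consider a minimal sketch of a toy latency model. The baseline figures (100 ms of roundtrip overhead, 200 ms to first token) and the `LatencyModel` class are hypothetical, chosen only to make the arithmetic visible. The 80% and 50% reductions and the 1,000 tokens-per-second rate are the only values taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyModel:
    """Toy model of one streamed response; all inputs are illustrative."""

    roundtrip_overhead_s: float   # per-request client/server overhead
    time_to_first_token_s: float  # delay before the first token arrives
    tokens_per_second: float      # steady-state generation speed

    def total_seconds(self, output_tokens: int) -> float:
        """Fixed costs plus the time spent generating the output tokens."""
        generation = output_tokens / self.tokens_per_second
        return self.roundtrip_overhead_s + self.time_to_first_token_s + generation


# Hypothetical baseline, NOT measured values.
baseline = LatencyModel(
    roundtrip_overhead_s=0.100,
    time_to_first_token_s=0.200,
    tokens_per_second=1000.0,
)

# Apply the reductions reported in the article: 80% and 50%.
optimized = LatencyModel(
    roundtrip_overhead_s=baseline.roundtrip_overhead_s * (1 - 0.80),
    time_to_first_token_s=baseline.time_to_first_token_s * (1 - 0.50),
    tokens_per_second=baseline.tokens_per_second,
)

for tokens in (50, 200, 1000):
    before = baseline.total_seconds(tokens)
    after = optimized.total_seconds(tokens)
    print(f"{tokens:>5} tokens: {before:.3f}s -> {after:.3f}s")
```

Under these assumed numbers, a short 50-token edit spends most of its time on fixed costs rather than generation, so trimming overhead roughly halves the wait. A 1,000-token response benefits far less, which is consistent with the article's framing of these optimizations as targeting short, interactive edits.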

GPT-5.3-Codex-Spark is designed for interactive coding workflows, letting developers make targeted edits and refine logic with near-instant feedback. On benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0, it performs strongly while completing tasks in a fraction of the time its predecessors need. The model trades deep reasoning for speed, addressing the need for immediate responsiveness in developer tools.
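Responsiveness claims like these usually come down to two numbers: time-to-first-token and tokens per second. The sketch below shows one way a developer might measure both for any token stream. It is a generic illustration, not OpenAI's or Cerebras' methodology; `measure_stream` and `fake_stream` are hypothetical helpers, and the simulated stream stands in for a real model response.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (time_to_first_token_s, tokens_per_second, token_count).

    The rate is measured from the first token to the last, so it reflects
    steady-state generation speed rather than startup delay.
    """
    start = clock()
    first_at = None
    last_at = start
    count = 0
    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now
        last_at = now
        count += 1
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    elapsed = last_at - first_at
    # n tokens have n - 1 gaps after the first; report 0.0 if undefined.
    rate = (count - 1) / elapsed if elapsed > 0 else 0.0
    return ttft, rate, count


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: n tokens, fixed delay."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


ttft, rate, n = measure_stream(fake_stream(20, 0.001))
print(f"TTFT {ttft * 1000:.1f} ms, {rate:.0f} tokens/s over {n} tokens")
```

Passing the clock as a parameter keeps the function testable with a deterministic fake clock. Separating time-to-first-token from the steady-state rate matters because the two respond to different optimizations: connection and API changes affect the former, while hardware speed drives the latter.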

The underlying technology, Cerebras' WSE-3, is the world's largest AI chip, with 4 trillion transistors and 900,000 AI-optimized cores delivering 125 petaflops of AI compute. Its wafer-scale architecture keeps computation and memory on a single piece of silicon, minimizing data movement, a common bottleneck in interactive inference workloads, and thereby reducing latency. Cerebras Systems introduced the WSE-3 in March 2024, emphasizing its capability to power scalable AI supercomputers.

OpenAI's partnership with Cerebras, initially announced in January 2026, signifies a broader strategy to diversify its hardware infrastructure beyond Nvidia. The agreement includes a $1 billion loan from OpenAI to Cerebras and plans for Cerebras to provide up to 750 megawatts of computing power through 2028. This collaboration positions Cerebras as a key provider for latency-sensitive AI workloads, complementing OpenAI's continued reliance on Nvidia for training and general inference tasks.