NVIDIA has unveiled the “Rubin CPX” GPU at the AI Infra Summit, a specialized accelerator in the upcoming “Rubin” family designed for massive-context AI models, with availability expected by late 2026.
The Rubin CPX is designed to deliver 30 PetaFLOPS of NVFP4 compute from a monolithic die paired with 128 GB of GDDR7 memory. The single-die configuration is a departure from the dual-die packages of NVIDIA’s current Blackwell and Blackwell Ultra architectures, a design the rest of the Rubin family will retain. The Rubin CPX targets computational bottlenecks in extended-context inference, where applications like comprehensive software codebase analysis and hour-long video processing can require context windows of up to one million tokens.
The processor integrates four NVENC and four NVDEC video engines on-chip, enabling streamlined multimedia workflows. NVIDIA states that the Rubin CPX delivers three times the attention processing speed of its current GB300 Blackwell Ultra systems. The architecture uses a cost-optimized single-die approach to reduce manufacturing complexity while maintaining computational density. Although NVIDIA has not disclosed memory bandwidth, a 512-bit interface with 30 Gbps GDDR7 chips would yield roughly 1.9 TB/s.
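The bandwidth estimate above is simple arithmetic: bus width times per-pin data rate, converted from gigabits to gigabytes. A quick sketch, noting that the 512-bit interface width is speculation rather than a disclosed specification:

```python
# Back-of-the-envelope GDDR7 bandwidth estimate. The 512-bit bus width is
# an assumption (NVIDIA has not disclosed it); 30 Gbps is a plausible
# per-pin GDDR7 data rate.
bus_width_bits = 512          # assumed memory interface width
data_rate_gbps = 30           # per-pin data rate, Gbit/s

bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8  # Gbit/s -> GB/s
print(f"{bandwidth_gb_s:.0f} GB/s = {bandwidth_gb_s / 1000:.1f} TB/s")
# prints "1920 GB/s = 1.9 TB/s"
```

A narrower bus or lower-binned chips (e.g. 28 Gbps) would land closer to 1.8 TB/s, which is why the figure remains an estimate until NVIDIA publishes the specification.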
NVIDIA plans to deploy Rubin CPX processors in the Vera Rubin NVL144 CPX platform, which combines standard Rubin GPUs with the specialized CPX variants. This hybrid setup targets 8 ExaFLOPS of aggregate compute and 1.7 PB/s of memory bandwidth per rack. The “Kyber” rack will include ConnectX-9 network adapters with 1,600 Gb/s networking, Spectrum-6 switches with 102.4 Tb/s of switching capacity, and co-packaged optics.
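As a rough sanity check on the rack-level figure, the quoted 8 ExaFLOPS can be divided by the per-chip CPX rating; the resulting chip count is my inference, not an NVIDIA-published configuration, and the real rack mixes CPX with standard Rubin GPUs:

```python
# Rough sanity check of the quoted rack aggregate (inputs from the article;
# the implied chip count is an illustrative inference, not NVIDIA's spec).
rack_exaflops = 8.0           # aggregate NVFP4 compute per NVL144 CPX rack
cpx_petaflops = 30.0          # NVFP4 compute per Rubin CPX GPU

# If the full 8 EF came from CPX chips alone (it does not; standard Rubin
# GPUs also contribute), the rack would need about this many:
cpx_equivalents = rack_exaflops * 1000 / cpx_petaflops
print(f"~{cpx_equivalents:.0f} CPX-equivalents per rack")
# prints "~267 CPX-equivalents per rack"
```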
NVIDIA positions the Rubin CPX as a specialized member of the Rubin family built for test-time scaling AI workloads. As models evolve into sophisticated reasoning agents, inference splits into a compute-intensive context (prefill) phase and a memory-bandwidth-bound token-generation phase. The CPX is optimized for the former, handling context prefill for enterprise chatbots with 256,000-token windows or code analysis spanning more than 100,000 lines. This specialization matters for AI systems that need persistent memory across extended interactions, which NVIDIA aims to enable with this hardware.
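To see how a 100,000-line codebase lands in million-token territory, a quick estimate helps; the tokens-per-line ratio below is an illustrative assumption, as real tokenizer output varies by language and style:

```python
# Hypothetical estimate of context size for whole-codebase analysis.
# tokens_per_line is an assumed average, not a measured tokenizer figure.
lines_of_code = 100_000
tokens_per_line = 10          # rough assumption for typical source code

context_tokens = lines_of_code * tokens_per_line
print(f"{context_tokens:,} tokens")
# prints "1,000,000 tokens"
```

At that scale, the prefill phase dominates inference cost, which is the workload the CPX is built to accelerate.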
NVIDIA’s rapid development cycle has boosted its financial performance, with the company reporting $41.1 billion in data center sales in its most recent quarter.