ARM has unveiled its next-generation mobile processor technologies, with consumer devices expected by the end of the year, bringing significant updates to its branding, architecture, and AI capabilities.
ARM is rebranding its CPU line, replacing the Cortex-X and Cortex-A cores with a new C1 series comprising Ultra, Premium, Pro, and Nano cores. The Mali GPUs are also being renamed, with the Immortalis line giving way to G1-Ultra, G1-Premium, and G1-Pro branding.
The new C1 cores are based on the ARMv9.3 architecture, simplifying the previous multi-tier Cortex-X lineup. The C1-Ultra and C1-Premium cores succeed the Cortex-X925, the C1-Pro replaces the Cortex-A725, and the C1-Nano is a revamp of the Cortex-A520. Notably, the C1-Premium is a 35% smaller variant of the C1-Ultra, targeting upper-mid-tier chipsets with a slight performance compromise.
The C1-Ultra shows a 12% IPC gain over the Cortex-X925, with an overall performance increase of around 25% when factoring in a 3nm process and a higher clock speed of 4.1GHz, compared to the Cortex-X925’s 3.6GHz. At the same performance as its predecessor, it consumes 28% less power. ARM attributes these gains to a larger out-of-order window, handling around 2,000 instructions in flight versus the X925’s 1,500, and a 33% increase in L1 instruction-cache bandwidth.
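As a rough sanity check on those figures, the sketch below multiplies the quoted IPC gain by the clock-speed ratio. It assumes performance scales as IPC times frequency, which is an idealization and not ARM's stated methodology. The function name `combined_speedup` is illustrative only.

```python
# Figures quoted in the article; the helper below is a hypothetical illustration.
IPC_GAIN = 0.12          # C1-Ultra IPC gain over Cortex-X925
OLD_CLOCK_GHZ = 3.6      # Cortex-X925 clock
NEW_CLOCK_GHZ = 4.1      # C1-Ultra clock


def combined_speedup(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Idealized speedup, assuming performance = IPC x clock frequency."""
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)


clock_gain = NEW_CLOCK_GHZ / OLD_CLOCK_GHZ - 1.0
speedup = combined_speedup(IPC_GAIN, OLD_CLOCK_GHZ, NEW_CLOCK_GHZ)

print(f"clock gain: {clock_gain:.1%}")                     # clock gain: 13.9%
print(f"idealized combined gain: {speedup - 1.0:.1%}")     # idealized combined gain: 27.6%
```

The idealized product lands slightly above the quoted figure of around 25%, which is unsurprising since real workloads rarely scale linearly with clock speed.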
The C1-Pro focuses on front-end improvements, including a larger branch predictor and branch target buffer, higher L1 data bandwidth, and lower L2 TLB latency, contributing to power savings. ARM claims the C1-Pro offers the same performance as the Cortex-A725 with a 26% power reduction or 11% more performance for the same power. The C1-Nano offers a 26% boost in power efficiency over the Cortex-A520, with modest performance gains of 5-8%, as it’s intended for background tasks.
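The two C1-Pro claims describe different points on the power curve, so they imply different efficiency gains. A minimal sketch, using only the percentages quoted above and a hypothetical helper named `perf_per_watt_ratio`:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Performance-per-watt of the new core relative to the old one.

    Both arguments are new/old ratios (1.0 means unchanged).
    """
    return perf_ratio / power_ratio


# Same performance at 26% less power (iso-performance point).
iso_performance = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=1.0 - 0.26)

# 11% more performance at the same power (iso-power point).
iso_power = perf_per_watt_ratio(perf_ratio=1.11, power_ratio=1.0)

print(f"iso-performance efficiency gain: {iso_performance - 1.0:.1%}")  # 35.1%
print(f"iso-power efficiency gain: {iso_power - 1.0:.1%}")              # 11.0%
```

The gap between the two results reflects that power typically rises faster than performance as a core is pushed harder, so savings look larger when the core is held at its old performance level.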
A key addition to the new CPUs is SME2 (Scalable Matrix Extension 2), ARM’s latest extension to accelerate machine learning workloads. SME2 builds on the original SME with multi-vector instructions, weight compression, and support for binary networks, and it sits outside the core as a shared execution unit. Each C1 series core can decode SME2 instructions, and the unit can shut down when not in use. ARM claims a 4.7x latency reduction in speech recognition, 4.7x faster token encoding, and an average 3.7x performance jump across various workloads compared to the same C1-Pro CPU core without SME2.
The new Mali G1-Ultra GPU offers 20% better performance for games and machine learning inference, 9% less energy per frame, and up to 2x faster ray tracing compared to last year’s Immortalis G925. The ray tracing speedup comes from hardware support for BVH traversal and a single-ray algorithm. The ray tracing unit (RTU) can be power-gated when not in use. The G1 GPU is branded according to its core count: 10 or more cores with ray tracing is a G1-Ultra, 6-9 cores is a G1-Premium, and 1-5 cores is a G1-Pro.
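The branding rule is a simple lookup on core count. The sketch below uses a hypothetical function, `g1_tier`, that keys only on the core count. It does not model ray tracing support, since the text does not say how ray tracing affects the lower tiers.

```python
def g1_tier(core_count: int) -> str:
    """Map a Mali G1 core count to its brand name, per the ranges in the article."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    if core_count >= 10:
        return "G1-Ultra"      # 10+ cores (with ray tracing, per the article)
    if core_count >= 6:
        return "G1-Premium"    # 6-9 cores
    return "G1-Pro"            # 1-5 cores


for cores in (14, 8, 4):
    print(cores, g1_tier(cores))
```

For example, a 14-core configuration maps to G1-Ultra.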
ARM’s Lumex platform aims to speed up time-to-market with complete platform solutions, including designs ready for chip integration and closer collaboration with foundries like TSMC. The company’s internal Lumex Reference FPGA platform hints at a top-end mobile configuration featuring two 4.1GHz C1-Ultra cores paired with six 3.5GHz C1-Pro cores, two SME2 units, a 16MB L3 cache, a 14-core Mali G1-Ultra, and 16MB of system-level cache, all on 3nm. For near-flagship chipsets, ARM suggests swapping the C1-Ultra for the C1-Premium. Mid-tier chipsets could feature a single Ultra or Premium core paired with three Pro cores and four Nano cores.
The company anticipates the MediaTek Dimensity 9500 will be the first flagship SoC to feature ARM’s new C1 CPU cores and the G1-Ultra GPU, with a possibility that next year’s Google Tensor G6 will also adopt the new C1 series.




