ARM unveiled its next-generation CPU and GPU designs today, and it is also introducing the Lumex compute subsystem (CSS). Previously, the likes of MediaTek and Samsung would license the CPU and GPU designs separately and piece together a chipset themselves. The Lumex CSS, by contrast, is a turnkey solution.
To be clear, ARM isn’t about to start selling chips. Instead, it has crafted production-ready implementations for the 3nm semiconductor nodes of multiple foundries.
In ARM’s own words, its silicon and OEM partners will be able to “use the implementations as flexible building blocks, so they can focus on differentiation at the CPU and GPU cluster level.”
| CPU | Key benefit | Performance and efficiency gains | Ideal use cases |
|---|---|---|---|
| C1-Ultra | Flagship peak performance | +25% single-thread performance, double-digit IPC gain year-on-year | Large-model inference, computational photography, content creation, generative AI |
| C1-Premium | C1-Ultra performance with greater area efficiency | 35% smaller area than C1-Ultra | Sub-flagship mobile segments, voice assistants, multitasking |
| C1-Pro | Sustained efficiency | +16% sustained performance | Video playback, streaming inference |
| C1-Nano | Extreme power efficiency | +26% efficiency, using less area | Wearables, smallest form factors |
So there is no single fixed design – the new parts are highly customizable. The new C1-DSU enables designs with 1 to 14 CPU cores, mixing up to three core types chosen from the four options covered above: the C1-Ultra, C1-Premium, C1-Pro and C1-Nano. As for the GPU, the Mali-G1 scales from 1 to 24 shader cores.
ARM also has some “secret sauce” that should make the Lumex better than the custom chipset designs that have come before. The new System Interconnect L1 houses the chipset’s system-level cache (SLC) and achieves a 71% reduction in leakage compared to standard RAM designs, which minimizes idle power consumption.
Additionally, the new Memory Management Unit (MMU) L1 enables secure and cost-efficient virtualization (typically used to run multiple operating systems simultaneously on the same device).
ARM has power efficient and secure “connective tissue” for the Lumex CSS
Compared to ARM’s previous designs, a C1 CPU compute cluster achieves an average of 30% higher performance across six industry benchmarks. For applications like gaming and video streaming, it is 15% faster on average, while for other workloads – things like video playback, web browsing and social media – it is 12% more efficient on average.
Focusing on the top-end hardware, the ARM C1-Ultra CPU offers double-digit Instructions Per Cycle (IPC) improvements over the Cortex-X925. The Mali-G1 Ultra GPU is 20% faster at rasterization and twice as fast at ray tracing compared to the Immortalis-G925.
The newly introduced Scalable Matrix Extension 2 (SME2) is at the heart of ARM’s push for higher on-device AI performance – on AI workloads, the new CPUs are up to 5x faster and up to 3x more efficient than previous designs. Additionally, the G1 GPU is 20% faster at inference compared to the previous generation.
ARM’s new hardware brings massive on-device AI improvements
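Apps don’t necessarily need hand-written SME2 code to benefit, but they can probe for the extension at runtime before picking an optimized code path. Below is a minimal sketch in C, assuming a Linux or Android device whose kernel reports the SME2 capability via AT_HWCAP2; the fallback #define for the bit position is an assumption for older headers:

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Recent arm64 kernel headers define HWCAP2_SME2; the value below is an
 * assumed fallback (bit 37 of AT_HWCAP2) for toolchains with older headers. */
#ifndef HWCAP2_SME2
#define HWCAP2_SME2 (1UL << 37)
#endif

int main(void) {
    unsigned long caps = getauxval(AT_HWCAP2);

    if (caps & HWCAP2_SME2) {
        /* Safe to dispatch to an SME2-optimized matrix kernel. */
        printf("SME2 reported by the kernel\n");
    } else {
        /* Fall back to NEON/SVE code paths on older cores. */
        printf("SME2 not available, using fallback kernels\n");
    }
    return 0;
}
```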
Here are some quotes from key industry partners, featuring the likes of Samsung, Honor and Google:
“At Samsung, we’re excited to continue our collaboration with Arm by leveraging Arm’s compute subsystem platform to develop the next generation of flagship mobile products. This partnership enables us to push the boundaries of on-device AI, delivering smarter, faster, and more efficient experiences for our users.” – Nak Hee Seong, Vice President and Head of SOC IP Development Team at Samsung Electronics

“At Honor, our mission is to bring premium experiences to more users, especially through our upper mid-range smartphones. By leveraging the Arm Lumex CSS platform, we’re able to deliver smooth performance, intelligent AI features, and outstanding power efficiency that elevate everyday mobile experiences.” – Honor

“SME2-enhanced hardware enables more advanced AI models, like Gemma 3, to run directly on a wide range of devices. As SME2 continues to scale, it will enable mobile developers to seamlessly deploy the next generation of AI features across ecosystems. This will ultimately benefit end-users with low-latency experiences that are widely available on their smartphones.” – Iliyan Malchev, Distinguished Software Engineer, Android at Google