ARM V3, N3 CSS for custom chip and chiplet designs

By Nick Flaherty

ARM has launched two high performance compute sub-systems (CSS) to help companies develop custom chips and chiplets for datacentres and AI.

The CSS are based on up to 64 high performance Neoverse V3 Poseidon cores for the first time and up to 32 N3 Hermes cores for a better balance between power consumption, performance and cost. These will almost certainly fall under the US trade restrictions on AI chips.

ARM is also planning a CSS codenamed Vega based on its next generation Adonis core, as well as a CSS codenamed Ranger with the Dionysus core that is under development.

These are the first CSS implementations based on the V3 and N3 cores, which use the ARMv9.2 instruction set architecture (ISA), and follow the first CSS, which used the Neoverse N2. Microsoft used that N2 CSS for its recent Cobalt 100 custom data centre chip and Amazon used the V2 for its Graviton4 chips.

Both are supported by a framework that adds high speed interconnect IP to build large custom chips as well as chiplets. ARM says it has customers at all three major foundries – Intel, Samsung and TSMC – for both custom chips and chiplets using the V3 and N3 CSS designs.

The cores have been optimised for specific workloads in the data centre for high performance computing (HPC), says Dermot O’Driscoll, vice president of product solutions at ARM.  

The optimisations include larger 2MB private L2 caches for the N3 and improved branch prediction and cache management for particular workloads in the data centre.

“Companies want the microarchitecture optimised for specific workloads and ARM engages at a fundamentally deeper level than any other company in the industry so this co-design goes beyond the chip and requires looking at the entire platform with memory and IO,” he said.

The smallest version of the N3 CSS has eight cores and a thermal envelope of 40W, a small fraction of that of most datacentre chips.

The V3 CSS allows 128 cores per socket using two 64-core chips, with private L2 caches of up to 3MB, high speed HBM3 memory and PCI Express Gen5 interconnect. It also improves the mean time between failures (MTBF) and overall reliability by providing telemetry technology such as the Branch Record Buffer Extension (BRBE).

The CSS N3 is built on top of Neoverse S3 system IP, which includes the CMN S3 coherent mesh network, the MMU S3 system memory management unit, an interrupt controller and the NoC S3 network-on-chip interconnect. CSS N3 also includes system management and local control processors, with the CPU and system IP co-designed and co-developed to optimise power, performance and area (PPA) and to enable system-level features.

The Neoverse CSS N3 also enables chiplet-based designs with support for the UCIe die-to-die connection standard, together with ARM’s new AMBA CHI C2C protocol. This is important as Nvidia’s Grace Hopper chiplet uses the previous generation Neoverse V2 core.

The N3 offers a wide range of cache configurations catering to different compute scenarios. Many scale-out cloud data analytics and database applications benefit from a larger cache closer to the core, so ARM is adding the 2MB L2 cache option. The 1MB L2 cache option offers a good tradeoff between performance and area for general-purpose compute in a variety of tasks ranging from 5G/6G wireless infrastructure, enterprise networking, DPUs and SmartNICs to hyperscale servers. The minimal option, with 32KB of L1 and 128KB of L2 cache, suits workloads that are not cache sensitive yet still demand respectable computing power in a small footprint.

The Neoverse S3 chassis is the third generation of infrastructure-specific system IP for SoCs and chiplets. This includes ARM’s Realm Management Extension (RME) for Confidential Compute with device assignment, and is aligned with industry standard DPE for ‘in use’ data protection. It also supports PCIe Gen6, CXL 3.1, DDR5 and HBM3, as well as standardised chiplet interfaces using AMBA CHI C2C over UCIe, with a defined chiplet development kit.

ARM has performed interoperability testing with key third-party IP such as controllers and PHYs for the interconnect. It is also looking to sell its Cortex-X CPU into custom datacentre designs, but without the CSS infrastructure.
