
5nm Telum II processor for IBM AI
IBM has developed its next-generation Telum II Processor and Spyre Accelerator for large language models and generative AI on Samsung's 5nm technology.
IBM has developed a scalable I/O sub-system designed to reduce energy consumption and data centre footprint for the chips, which follow the original Telum AI inference chip launched in 2021.
The eight-core Telum II will be used in IBM Z mainframes running at up to 5.5GHz, with 36MB of L2 cache per core and a 40% increase in on-chip cache capacity, for a total of 360MB. The virtual level-4 cache of 2.88GB per processor drawer is likewise a 40% increase over the previous generation. The integrated AI accelerator allows for 24TOPS of low-latency AI inferencing, for example enhancing fraud detection during financial transactions, and provides a fourfold increase in compute capacity per chip over the previous generation.
The processor also includes a coherently attached Data Processing Unit (DPU) to handle I/O.
The Spyre accelerator chip is attached to the Telum II via a 75W PCIe adapter and is based on technology developed in collaboration with IBM Research. Each card can access up to 1TB of memory and is built to work in tandem across the eight cards of a regular I/O drawer to support AI model workloads across the mainframe, while consuming no more than 75W per card. Each chip has 32 compute cores supporting int4, int8, fp8 and fp16 datatypes for both low-latency and high-throughput AI applications.
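To illustrate why low-precision datatypes such as int8 matter for inference throughput, here is a minimal sketch of symmetric int8 quantization, the general technique behind running model weights at reduced precision. This is illustrative Python, not IBM code, and the function names are assumptions for the example.

```python
# Illustrative sketch of symmetric per-tensor int8 quantization.
# Not IBM code; names and approach are generic examples of the technique.

def quantize_int8(values):
    """Map floats onto int8 range [-127, 127] using a single scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is bounded by half the scale step per element.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing and multiplying int8 values instead of fp32 cuts memory traffic by 4x, which is the kind of trade-off that lets an accelerator card deliver high throughput within a fixed 75W power envelope.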
With many generative AI projects leveraging Large Language Models (LLMs) moving from proof-of-concept to production, the demands for power-efficient, secured and scalable solutions have emerged as key priorities.
“Our robust, multi-generation roadmap positions us to remain ahead of the curve on technology trends, including escalating demands of AI,” said Tina Tarquinio, VP, Product Management, IBM Z and LinuxONE. “The Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secured, and more power efficient enterprise computing solutions. After years in development, these innovations will be introduced in our next generation IBM Z platform so clients can leverage LLMs and generative AI at scale.”
Both chips will be manufactured by IBM's long-standing fabrication partner, Samsung Foundry, on a 5nm process, and systems using the chips are expected in 2025.
