Intel launches its chiplet-based CPU and GPU
Intel has finally launched its long-delayed chiplet-based processor, codenamed Sapphire Rapids, and its Ponte Vecchio graphics processor (GPU).
The CPU, built on the Intel 7 process technology and now renamed the 4th generation Xeon, adds AI accelerator blocks (see below) and is being used by data centre operators and board and server makers, as well as supercomputer operators. There is even one version for the Internet of Things at a fraction of the price of the high-end CPUs.
The CPUs all support the latest PCI Express 5.0 interconnect and focus on higher power efficiency, saving up to 70W in power-saving mode in the data centre. This helps reduce the cost of cooling and air conditioning.
Range
The range runs from parts optimised for HPC at $12,000, with 64 gigabytes of high bandwidth memory (HBM2e), to 5G parts at $1,200 and the 4410T for the IoT at $624, which has a power envelope of 150W and a long lifetime.
The launch is important for upgrading the installed base of over 100m Xeon processors already out in the market. These are used in everything from on-site servers running IT services, including new as-a-service business models, to networking equipment managing Internet traffic, wireless base station computing at the edge and cloud services.
Intel has also teamed up with European chip designer SiPearl to use the GPU, now called the Data Center GPU Max Series, in supercomputer designs. This has over 100bn transistors across 47 different tiles in a single package and up to 128Gbytes of memory. It is not available as an individual chip but as PCIe cards and modules.
The 1100 GPU is a 300-watt double-wide PCIe card with 56 Xe cores and 48 GB of HBM2e memory. Multiple cards can be connected via Intel Xe Link bridges. The 1350 is a 450-watt OAM module with 112 Xe cores and 96 GB of HBM, while the 1550 is the maximum performance 600-watt OAM module with 128 Xe cores and 128 GB of HBM. There is also a subsystem with an x4 GPU OAM carrier board and Intel Xe Link to enable multi-GPU communication within the subsystem.
Xeon customers
Customers for the CPU include AWS, Cisco, Cloudera, CoreWeave, Dell Technologies, Dropbox, Ericsson, Fujitsu, Google Cloud, Hewlett Packard Enterprise, IBM Cloud, Inspur Information, IONOS, Lenovo, Los Alamos National Laboratory, Microsoft Azure, Oracle Cloud, OVHcloud, phoenixNAP, Red Hat, SAP, Supermicro, Telefonica and VMware.
Nvidia is also working with Intel to use the CPU alongside its Hopper H100 GPU for exascale AI processing in data centres.
Chiplets are a key technology for Intel in its development of devices with a trillion transistors in a package. Intel is also planning to build a plant in Italy to assemble the technology.
“The launch of 4th Gen Xeon Scalable processors and the Max Series product family is a pivotal moment in fueling Intel’s turnaround, reigniting our path to leadership in the data center and growing our footprint in new arenas,” said Sandra Rivera, Intel executive vice president and general manager of the Data Centre and AI Group.
“Intel’s 4th Gen Xeon and the Max Series product family deliver what customers truly want – leadership performance and reliability within a secure environment for their real-world requirements – driving faster time to value and powering their pace of innovation.”
The 4th Gen Xeon and the Intel Max Series product family were the first designed for the oneAPI architecture that integrates CPU and GPU with an open software ecosystem.
AI accelerators
Advanced Matrix Extensions (AMX) targets fine-tuning and the training of small and medium deep learning models. Intel AMX is a built-in accelerator that improves the performance of deep learning training and inference. It is aimed at workloads like natural language processing, recommendation systems and image recognition.
QuickAssist Technology (QAT) offloads encryption, decryption and compression to a built-in accelerator to help free up processor cores so systems can serve a larger number of clients or use less power.
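As a rough illustration of the kind of work AMX accelerates, the sketch below multiplies two 8-bit integer matrices block by block, accumulating into wider integers. It is plain Python rather than Intel's programming interface; the function name and the block sizes (16 rows of up to 64 bytes, mirroring the maximum AMX tile shape) are assumptions chosen for illustration.

```python
TILE_ROWS = 16   # assumed block height, mirroring the 16-row AMX tile limit
TILE_BYTES = 64  # assumed block depth, mirroring the 64-byte AMX tile row


def tile_matmul_int8(a, b):
    """Multiply a (m x k) by b (k x n), one tile-sized block at a time.

    Inputs are expected to hold 8-bit integer values; the products are
    accumulated in ordinary Python integers, standing in for the wider
    accumulators a matrix engine would use.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, TILE_ROWS):            # block of output rows
        for j0 in range(0, n, TILE_ROWS):        # block of output columns
            for k0 in range(0, k, TILE_BYTES):   # block of the shared dimension
                for i in range(i0, min(i0 + TILE_ROWS, m)):
                    for j in range(j0, min(j0 + TILE_ROWS, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + TILE_BYTES, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


product = tile_matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The point of the blocking is that each inner step touches only a small, fixed-size piece of each matrix, which is the pattern a hardware tile multiplier can execute as a single operation.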
Data Streaming Accelerator (DSA) drives high performance for storage, networking and data-intensive workloads by improving streaming data movement and transformation operations. Designed to offload the most common data movement tasks that cause overhead in data center-scale deployments, DSA helps speed up data movement across the CPU, memory and caches, as well as all attached memory, storage and network devices.
Dynamic Load Balancer (DLB) helps improve system performance related to handling network data on multicore Xeon Scalable processors. It distributes network processing efficiently across multiple CPU cores and threads, adjusting the distribution dynamically as the system load varies. DLB also restores the order of networking data packets processed simultaneously on CPU cores.
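The two jobs described for DLB, spreading packets across cores and putting the results back in order, can be sketched in software as follows. This is a conceptual model only, not the DLB interface: the function names, the packet layout of (sequence number, cost, payload) and the least-loaded scheduling rule are all assumptions made for the example.

```python
import heapq


def distribute(packets, num_cores):
    """Assign each (seq, cost, payload) packet to the least-loaded core."""
    queues = [[] for _ in range(num_cores)]
    load = [0] * num_cores
    for seq, cost, payload in packets:
        core = load.index(min(load))  # core with the least queued work
        queues[core].append((seq, payload))
        load[core] += cost
    return queues


def restore_order(completed):
    """Release processed packets strictly in sequence-number order."""
    pending = []   # min-heap of packets that arrived ahead of their turn
    next_seq = 0
    ordered = []
    for seq, payload in completed:
        heapq.heappush(pending, (seq, payload))
        while pending and pending[0][0] == next_seq:
            ordered.append(heapq.heappop(pending)[1])
            next_seq += 1
    return ordered


packets = [(0, 5, "a"), (1, 1, "b"), (2, 1, "c"), (3, 2, "d")]
queues = distribute(packets, num_cores=2)

# Simulate cores finishing out of order: the second core completes first.
completed = [(seq, payload.upper()) for q in reversed(queues) for seq, payload in q]
result = restore_order(completed)
```

Here the expensive first packet occupies one core while the other three go to the second core, so the results come back out of sequence and the reorder step holds them until packet 0 arrives.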
In-Memory Analytics Accelerator (IAA) helps run database and analytics workloads faster, with potentially greater power efficiency. This built-in accelerator increases query throughput and decreases the memory footprint for in-memory database and big data analytics workloads.
Advanced Vector Extensions 512 (AVX-512) is the latest x86 vector instruction set, with up to two fused multiply-add (FMA) units and other optimizations to accelerate performance for demanding computational tasks, including scientific simulations, financial analytics, and 3D modeling and analysis.
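To put the FMA figure in context, the arithmetic can be sketched as below: a 512-bit vector holds 16 single-precision or 8 double-precision values, and each FMA counts as two floating-point operations. The helper names are hypothetical, and the per-cycle figure is a theoretical peak assuming both FMA units issue every cycle, not a measured result.

```python
VECTOR_BITS = 512  # width of an AVX-512 register


def lanes(element_bits):
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def fma(a, b, c):
    """Element-wise a * b + c over whole vectors.

    A hardware FMA rounds once; this Python version rounds after the
    multiply and again after the add, so it only models the data flow.
    """
    return [x * y + z for x, y, z in zip(a, b, c)]


def peak_flops_per_cycle(element_bits, fma_units=2):
    """Theoretical peak: lanes x 2 operations per FMA x number of FMA units."""
    return lanes(element_bits) * 2 * fma_units


single = peak_flops_per_cycle(32)  # 16 lanes * 2 * 2
double = peak_flops_per_cycle(64)  # 8 lanes * 2 * 2
sample = fma([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
```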
There is also a variant of AVX-512 for virtualized radio access network (vRAN) to provide greater capacity at the same power envelope for vRAN workloads. This helps communications service providers increase their performance per watt to meet critical performance, scaling and energy efficiency requirements.
Crypto Acceleration reduces the impact of implementing pervasive data encryption and increases the performance of encryption-sensitive workloads, like secure sockets layer (SSL) web servers, 5G infrastructure and VPNs/firewalls.
Speed Select Technology (SST) grants more active and expansive control over CPU performance to improve server utilization and reduce qualification costs by allowing customers to configure a single server to match fluctuating workloads.
Data Direct I/O Technology (DDIO) enables direct communication between Intel Ethernet controllers and adapters and host processor cache. Eliminating frequent visits to main memory can help reduce power consumption, provide greater I/O bandwidth scalability and reduce latency.
Software Guard Extensions (SGX) supports confidential computing to improve the isolation of sensitive data with enhanced hardware-based memory protections. Support for Intel SGX on the Xeon CPU Max Series is on DDR flat mode only.
Trust Domain Extensions (TDX) is a new capability available through select cloud providers in 2023 that offers increased confidentiality at the virtual machine (VM) level, enhancing privacy and control over data. Within an Intel TDX confidential VM, the guest OS and VM applications are isolated from access by the cloud host, hypervisor and other VMs on the platform.
Control-Flow Enforcement Technology (CET) provides enhanced hardware-based protections against return-oriented and jump/call-oriented programming attacks, two of the most common software-based attack techniques. Using this technology helps shut down an entire class of system memory attacks that long evaded software-only solutions.
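One of the mechanisms CET uses against return-oriented attacks is a shadow stack: a protected second copy of each return address that is compared with the ordinary stack on every return. The toy model below shows the idea in Python; the class and method names are invented for illustration and do not correspond to any real interface.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its protected copy."""


class Machine:
    def __init__(self):
        self.stack = []         # ordinary stack, writable by buggy or hostile code
        self.shadow_stack = []  # protected copy of the return addresses

    def call(self, return_address):
        # A call records the return address in both places.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # A return is allowed only if both copies agree.
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackViolation(
                f"return to {address:#x} blocked, expected {expected:#x}"
            )
        return address


machine = Machine()
machine.call(0x1000)
normal_return = machine.ret()  # matches, so the return proceeds

machine.call(0x2000)
machine.stack[-1] = 0xBAD      # simulated overwrite of the return address
try:
    machine.ret()
    blocked = False
except ShadowStackViolation:
    blocked = True
```

Because the attacker in this model can only write to the ordinary stack, the tampered return address no longer matches the protected copy and the return is refused.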