Samsung has developed in-memory processing to boost the performance of AI systems in data centres using the latest interconnect standards.
The HBM-PIM (high bandwidth memory, processing in memory) chips are being used in AMD’s Instinct MI100 AI accelerator. Samsung then developed an HBM-PIM Cluster with 96 MI100 cards and applied it to various large-scale AI and high-performance computing (HPC) applications, connected via 200Gbit/s InfiniBand switches.
Compared to existing GPU accelerators, tests showed that adding HBM-PIM more than doubled performance on average and cut energy consumption by more than 50%.
For the latest AI models, accuracy tends to correlate directly with model size, and this points to a major hurdle: with existing memory solutions, computation can be bottlenecked if DRAM capacity and bandwidth cannot keep up with the data transfers that hyperscale AI models require.
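The bandwidth side of that bottleneck can be seen with a rough back-of-envelope sketch. The figures below (model size, precision, per-stack bandwidth) are illustrative assumptions, not Samsung's numbers:

```python
# Rough lower bound on the time DRAM bandwidth imposes on a large model.
# All numbers are illustrative assumptions, not figures from the article.

def min_step_time_s(params: float, bytes_per_param: float, bandwidth_gbs: float) -> float:
    """Minimum time to stream every weight once through memory."""
    total_bytes = params * bytes_per_param
    return total_bytes / (bandwidth_gbs * 1e9)

# A hypothetical 100-billion-parameter model in FP16 (2 bytes per weight),
# read through memory offering roughly 400 GB/s of bandwidth.
t = min_step_time_s(params=100e9, bytes_per_param=2, bandwidth_gbs=400)
print(f"{t:.2f} s per full weight pass")  # 0.50 s
```

However fast the compute is, no pass over the weights can finish sooner than this, which is why moving computation into the memory itself is attractive.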
If a large language model proposed by Google is trained on a cluster of eight accelerators, using GPU accelerators equipped with HBM-PIM can save 2,100GWh of energy per year and cut carbon emissions by 960,000 tons.
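As a quick sanity check on those two figures, the implied carbon intensity works out to roughly 0.46kg of CO2 per kWh, which is in line with typical grid averages:

```python
# Carbon intensity implied by the quoted savings:
# 2,100 GWh of energy per year and 960,000 tons of CO2.
energy_saved_kwh = 2_100 * 1e6      # 2,100 GWh expressed in kWh
co2_avoided_kg = 960_000 * 1_000    # 960,000 tons expressed in kg

intensity = co2_avoided_kg / energy_saved_kwh  # kg CO2 per kWh
print(f"{intensity:.3f} kg CO2 per kWh")
```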
With software integration, pairing commercially available GPUs with HBM-PIM can reduce the bottleneck caused by memory capacity and bandwidth limitations in Hyperscale AI data centres.
Samsung has developed software using SYCL, an open software standard that defines how programs can use GPU accelerators. With this software, customers will be able to use PIM memory solutions in an integrated software environment. Codeplay, recently acquired by Intel, is a key developer of SYCL.
“Codeplay is proud to have been deeply involved in defining the SYCL standard and playing a role in creating the first conformant product,” said Charles Macfarlane, Chief Business Officer at Codeplay Software, who led the collaboration on SYCL standardization. “Our work with Samsung in simplifying software development via Samsung’s PIM systems opens up a much greater ecosystem of tools for scientists, allowing them to focus on algorithm development rather than hardware-level details.”
- The drivers behind Intel’s Scottish software deal
- Software defined memory drives IP maker into chips
The other strand in Samsung’s development is the CXL (Compute Express Link) open standard, a high-speed processor-to-device and processor-to-memory interface that allows more efficient use of memory and accelerators attached to processors.
CXL can be used in conjunction with other technologies such as processing-near-memory (PNM) to help expand memory capacity.
Like PIM, PNM reduces data movement between the CPU and memory by performing calculations within the memory subsystem. In the case of PNM, the compute functions sit close to the memory, easing the bottleneck in data transfers between the CPU and memory.
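The data-movement saving can be illustrated with a toy traffic model: summing a million values conventionally drags every element across the memory bus, while a near-memory operation only moves a command and a result. The packet sizes below are arbitrary illustrative assumptions:

```python
# Toy model of CPU<->memory bus traffic for summing N 4-byte values.
# Illustrative only: real PNM command/result overheads differ.

def host_traffic_bytes(n: int, elem_size: int = 4) -> int:
    """Conventional path: every element crosses the bus to the CPU."""
    return n * elem_size

def pnm_traffic_bytes(n: int, elem_size: int = 4) -> int:
    """Near-memory path: only a command and the final result cross the bus."""
    command = 64                      # assume one 64-byte command packet
    result = elem_size                # one value comes back
    return command + result

n = 1_000_000
print(host_traffic_bytes(n))  # 4000000 bytes moved
print(pnm_traffic_bytes(n))   # 68 bytes moved
```

The gap widens linearly with the data size, which is why the benefit shows up most in bandwidth-hungry workloads such as recommendation systems.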
Samsung launched the PNM technology with CXL earlier this month for high-capacity AI model processing. In testing, CXL interface-based PNM systems doubled performance in applications such as recommendation systems and in-memory databases that require high memory bandwidth.
Data used in AI models is classified as dense or sparse according to its characteristics. Dense data has a high ratio of valid values within the overall data set, while sparse data has a low ratio of valid values.
AI applications such as autonomous driving and voice recognition fall into the dense data category, while user-based recommendation algorithms (such as Facebook friend recommendations) are examples of sparse data. Each model requires use-specific memory solutions matched to the application.
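The dense/sparse distinction above can be sketched as a simple classifier over the ratio of valid (non-zero) entries; the 0.5 threshold here is an arbitrary illustrative choice, not a figure from the article:

```python
# Classify a data vector as dense or sparse by its ratio of valid entries.
# The 0.5 threshold is an arbitrary illustrative choice.

def valid_ratio(values: list[float]) -> float:
    """Fraction of entries that carry a valid (non-zero) value."""
    return sum(1 for v in values if v != 0) / len(values)

def classify(values: list[float], threshold: float = 0.5) -> str:
    return "dense" if valid_ratio(values) >= threshold else "sparse"

print(classify([1.0, 2.0, 3.0, 0.0]))        # dense
print(classify([0.0, 0.0, 7.0, 0.0, 0.0]))   # sparse
```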
Samsung has applied PIM technology to AI models based on dense data and PNM technology for AI models based on sparse data.
“HBM-PIM Cluster technology is the industry’s first customized memory solution for large-scale artificial intelligence,” said Cheolmin Park, head of the New Business Planning Team at Samsung Electronics’ Memory Business Division.
“By integrating CXL-PNM solutions with HBM-PIM through a comprehensive software standardization process, we can offer a new standard of high-efficiency, high-performance memory solutions that can contribute to eco-conscious data management by reducing and optimizing the movement of massive data volumes needed for AI applications.”
“We are very interested in applying computational memory techniques to address the memory bandwidth and power efficiency challenges common in many of our high-performance computing and AI applications,” added Jeffrey Vetter, Corporate Fellow and Section Head of the Computer Science and Mathematics Division at Oak Ridge National Laboratory.
“We look forward to working with Samsung to evaluate how these developing technologies can be applied to Oak Ridge National Laboratory systems to enhance efficiency.”
Related CXL articles
- Chip makers back CXL 3.0 for data centre memory
- CXL smart memory controllers for data centres
- Marvell boosts CXL business with Tanzanite buy