Technology conglomerate Meta Platforms has announced its next-generation GPU-powered AI platform for data-center-scale machine learning training and inference. Compared with the company's previous-generation Zion-EX platform, the Grand Teton system packs in more memory, network bandwidth, and compute capacity, the company says.
The company announced the platform at this year’s Open Compute Project (OCP) Global Summit.
“Today, some of the greatest challenges our industry is facing at scale are around AI,” says Alexis Bjorlin, VP, Engineering at Meta. “How can we continue to facilitate and run the models that drive the experiences behind today’s innovative products and services? And what will it take to enable the AI behind the innovative products and services of the future? As we move into the next computing platform, the metaverse, the need for new open innovations to power AI becomes even clearer.”
AI models are used extensively across Facebook for services such as the news feed, content recommendations, and hate-speech identification, among many other applications. Named after the 13,000-foot mountain that crowns one of Wyoming's two national parks, Grand Teton uses Nvidia H100 Tensor Core GPUs to train and run AI models that are rapidly growing in size and capability, demanding ever more compute.
Grand Teton offers twice the network bandwidth and four times the host-to-GPU bandwidth of Meta's prior Zion-EX system, the company says. The added network bandwidth lets Meta build larger clusters of systems for training AI models, and Grand Teton also packs more memory than Zion-EX to store and run larger models.
While the Zion-EX platform consisted of multiple connected subsystems, Grand Teton unifies those components in a single hardware chassis: one box with integrated power, compute, and fabric interfaces, yielding better overall performance, signal integrity, and thermal behavior. The design, the company says, simplifies data-center integration and enhances reliability.
Grand Teton has been engineered to better handle memory-bandwidth-bound workloads such as deep learning recommendation models (DLRMs), which can require a zettaflop of compute just to train. It is also optimized for compute-bound workloads such as content understanding.
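To illustrate why recommendation models stress memory bandwidth rather than raw compute, the toy sketch below mimics the general DLRM structure: large sparse embedding tables whose scattered row lookups are dominated by memory traffic, feeding a small dense layer whose matrix math is compute-bound. All names, sizes, and the single-layer "MLP" here are illustrative assumptions, not Meta's actual model or code.

```python
import numpy as np

# Toy DLRM-style forward pass (illustrative only; not Meta's model).
# Recommendation models pair huge sparse embedding tables (memory-bound
# gathers) with a comparatively small dense network (compute-bound matmuls).

rng = np.random.default_rng(0)

NUM_TABLES = 8           # sparse categorical features (hypothetical sizes)
ROWS_PER_TABLE = 100_000
EMB_DIM = 64
BATCH = 256

# Embedding tables: lookups touch scattered rows, stressing memory bandwidth.
tables = [rng.standard_normal((ROWS_PER_TABLE, EMB_DIM), dtype=np.float32)
          for _ in range(NUM_TABLES)]

def forward(dense_features, sparse_ids):
    # Sparse path: one gather per table -> memory-bandwidth bound.
    emb = [t[ids] for t, ids in zip(tables, sparse_ids)]  # each (BATCH, EMB_DIM)
    # Feature interaction: concatenate dense and sparse representations.
    x = np.concatenate([dense_features] + emb, axis=1)
    # Dense path: a single small linear layer stands in for the MLP.
    w = rng.standard_normal((x.shape[1], 1), dtype=np.float32)
    return 1.0 / (1.0 + np.exp(-(x @ w)))  # sigmoid "click probability"

dense = rng.standard_normal((BATCH, EMB_DIM), dtype=np.float32)
ids = [rng.integers(0, ROWS_PER_TABLE, size=BATCH) for _ in range(NUM_TABLES)]
scores = forward(dense, ids)
print(scores.shape)  # (256, 1)
```

At production scale the tables hold billions of rows, so the gather step, not the arithmetic, sets the pace; this is the workload class the extra memory and host-to-GPU bandwidth in Grand Teton is aimed at.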