Accelerated compute servers, equipped with accelerators such as GPUs, FPGAs, or custom ASICs, can generally handle AI workloads far more efficiently than servers built around general-purpose CPUs, but they still represent only a fraction of cloud service providers’ overall server footprint.
These accelerators can cost ten times as much as a general-purpose server, but they are becoming a substantial portion of data centre capex, says analyst Baron Fung.
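The arithmetic behind that observation can be sketched roughly as follows. This is an illustrative calculation only: the tenfold cost multiple comes from the figure quoted above, while the unit fractions (1, 5, and 10 percent of servers) and the function name `accelerated_capex_share` are hypothetical and are not taken from Fung's data.

```python
def accelerated_capex_share(unit_fraction: float, cost_multiple: float = 10.0) -> float:
    """Return the share of server spending that goes to accelerated servers.

    unit_fraction: fraction of all servers that are accelerated (0 to 1).
    cost_multiple: cost of an accelerated server relative to a
                   general-purpose server (10x, per the article).
    """
    if not 0.0 <= unit_fraction <= 1.0:
        raise ValueError("unit_fraction must be between 0 and 1")
    # Spending is expressed in units of "one general-purpose server".
    accelerated_spend = unit_fraction * cost_multiple
    general_spend = 1.0 - unit_fraction
    return accelerated_spend / (accelerated_spend + general_spend)


# Hypothetical unit fractions, chosen only to illustrate the effect.
for fraction in (0.01, 0.05, 0.10):
    share = accelerated_capex_share(fraction)
    print(f"{fraction:.0%} of servers -> {share:.1%} of server spending")
```

Under these assumptions, a fleet in which only one server in ten is accelerated would already direct a little over half of its server spending to accelerated systems, which is how a small footprint can turn into a large capex line.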
Leading cloud service providers are increasing their spending on new infrastructure tailored for AI workloads. For example, Facebook is planning to increase capex by more than 50 percent in 2022 with investment in AI and machine learning to improve ranking and recommendations. In the longer term, as the company shifts its business model to the metaverse, capex investments will be driven by video and compute-intensive applications such as AR and VR.
Cloud service providers such as Amazon, Google, and Microsoft are also planning to increase spending on AI-focused infrastructure to enable their enterprise customers to deploy applications with enhanced intelligence and automation, says Fung.
New architectures are key to the plans.
Intel is scheduled to launch its next-generation Sapphire Rapids processor next year. With its AMX (Advanced Matrix Extensions) instruction set, Sapphire Rapids is optimized for AI and ML workloads. The CXL interconnect, which will be offered with Sapphire Rapids for the first time, will establish a memory-coherent, high-speed link over the PCIe Gen 5 interface between the host CPU and accelerators. This, in turn, will reduce system bottlenecks by enabling lower latencies and more efficient sharing of resources across devices. Intel also has deals to use its Ponte Vecchio discrete GPU for data centre applications.
AMD is also expected to offer CXL on its EPYC Genoa processor. For Arm-based processors, competing coherent interfaces will also be offered, such as CCIX with Ampere’s Altra processor and NVLink on Nvidia’s upcoming Grace processor.
AI applications are bandwidth hungry. For this reason, the fastest networks available need to be deployed to connect host servers to accelerated servers. These networks move large volumes of unstructured data and training models between the host CPU and accelerators, and among accelerators in a high-performance computing cluster.
Some Tier 1 cloud service providers are deploying 400 Gbps Ethernet networks and beyond. The network interface card (NIC) must also evolve to ensure that server connectivity is not inhibited as data sets become larger. Until now, 100 Gbps NICs have been the standard server access speed for most accelerated compute servers.
More recently, however, 200 Gbps NICs are increasingly being used with these high-end workloads, especially by Tier 1 cloud service providers. Some vendors have added a further layer of performance by integrating accelerated compute servers with Smart NICs or Data Processing Units (DPUs). For example, Nvidia’s DGX system can be configured with two BlueField-2 DPUs to facilitate packet processing of large datasets and provide multi-tenant isolation.
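To give a sense of why the step from 100 Gbps to 200 Gbps and 400 Gbps matters, the sketch below estimates the ideal time to move a dataset over a single link at each speed. The 10 TB dataset size is a hypothetical figure chosen for illustration, and the calculation assumes the link runs at full line rate with no protocol overhead or congestion, so real transfers would take longer.

```python
def transfer_seconds(dataset_terabytes: float, link_gbps: float) -> float:
    """Ideal time, in seconds, to move a dataset over one link at line rate.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second)
    and ignores protocol overhead, congestion, and storage limits.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    total_bits = dataset_terabytes * 1e12 * 8
    return total_bits / (link_gbps * 1e9)


DATASET_TB = 10.0  # hypothetical dataset size, not a figure from the article

# The three link speeds mentioned in the article.
for gbps in (100, 200, 400):
    seconds = transfer_seconds(DATASET_TB, gbps)
    print(f"{gbps} Gbps: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Because the ideal transfer time scales inversely with link speed, each doubling of NIC bandwidth halves the time that accelerators may spend waiting on data, provided the rest of the path can keep up.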
Accelerated compute servers, generally equipped with four or more GPUs, tend to be power hungry. For example, an Nvidia DGX system with eight A100 GPUs has a maximum system power usage rated at 6.5 kW.
As rack power densities rise to support accelerated computing hardware, the limits of air-cooling efficiency are being reached. Novel liquid-based thermal management solutions, including immersion cooling, are under development to further improve the thermal efficiency of accelerated compute servers.
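A rough way to see the pressure on rack power density is to ask how many such systems fit within a given rack power budget. In the sketch below, the 6.5 kW figure is the maximum system power quoted above for the eight-GPU DGX system; the rack budgets of 10, 20, and 40 kW are hypothetical values chosen for illustration, and the calculation ignores networking gear, power-supply headroom, and cooling overhead.

```python
SYSTEM_MAX_KW = 6.5  # maximum system power quoted in the article


def systems_per_rack(rack_budget_kw: float, system_kw: float = SYSTEM_MAX_KW) -> int:
    """Number of whole systems that fit within a rack's power budget.

    Sizes against maximum rated power and ignores everything else in the
    rack (switches, PDUs, headroom), so it is an upper bound at best.
    """
    if rack_budget_kw < 0 or system_kw <= 0:
        raise ValueError("power values must be positive")
    return int(rack_budget_kw // system_kw)


# Hypothetical rack power budgets in kW, for illustration only.
for budget_kw in (10, 20, 40):
    count = systems_per_rack(budget_kw)
    print(f"{budget_kw} kW rack: {count} system(s), {count * SYSTEM_MAX_KW:.1f} kW drawn at maximum")
```

Under these assumptions, a rack limited to 10 kW could host only one such system, which illustrates why operators deploying accelerated hardware are pushed toward higher rack densities and, in turn, toward liquid cooling.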
Power consumption has also been a key consideration in the design of Graphcore’s Colossus AI chip and the IPU-POD racks that are being adopted in data centres.