Nvidia details GPU roadmap with Rubin HBM4 memory
Nvidia has revealed a little more of its chip roadmap ahead of disclosing details of its Blackwell GPU architecture at the Hot Chips conference next week in the US.
The company is planning a version with 288Gbytes of HBM3e memory, called Blackwell Ultra, for 2025, followed in 2026 by a new CPU called Vera and a new GPU architecture called Rubin using next generation HBM4 memory; a version with even more memory, called Rubin Ultra, will come later.
“Of course, Blackwell Ultra will also be increasing the amount of compute. We’re not going to disclose how much, but we do have more compute capability coming in Blackwell Ultra,” said Dave Salvator, director of accelerated computing at Nvidia.
“And then as we move into 2026, we’ll be getting into our Rubin architecture, and then of course Rubin Ultra a little further out in time. On the CPU front, we’ll be making our way from Grace, which is our current generation, towards the Vera CPU architecture in the 2026 time frame. And then you see some of the cadencing on our network products as well, which is also important, because we are at the end of the day a data centre platform company and it takes all these components to make that platform happen.”
The JEDEC Solid State Technology Association is nearing completion of the HBM4 standard for high performance memory. HBM4 is set to introduce a doubled channel count per stack compared to HBM3, with a larger physical footprint. To support device compatibility, the standard ensures that a single controller can work with both HBM3 and HBM4 if needed. Different configurations will require various interposers to accommodate the differing footprints. HBM4 will specify 24 Gb and 32 Gb layers, with options for supporting 4-high, 8-high, 12-high and 16-high TSV stacks. The committee has initial agreement on speed bins up to 6.4 Gbps, with discussion ongoing on higher frequencies.
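Those figures set the envelope for per-stack capacity and bandwidth. As a rough illustration, the sketch below works through the arithmetic, assuming the doubled channel count corresponds to a 2048-bit data interface per stack (double HBM3's 1024 bits); the stack heights, layer densities and speed bin are the draft figures above.

```python
# Back-of-envelope HBM4 per-stack capacity and bandwidth from the
# draft JEDEC figures quoted above.
# Assumption: the doubled channel count gives a 2048-bit data
# interface per stack (HBM3 uses 1024 bits).

INTERFACE_BITS = 2048      # assumed data width per HBM4 stack
SPEED_GBPS = 6.4           # initial speed bin, Gbit/s per pin

def stack_capacity_gbytes(layer_gbits: int, layers: int) -> float:
    """Raw capacity of one TSV stack in Gbytes."""
    return layer_gbits * layers / 8

def stack_bandwidth_gbytes_s(pins: int = INTERFACE_BITS,
                             gbps: float = SPEED_GBPS) -> float:
    """Peak bandwidth of one stack in Gbytes/s."""
    return pins * gbps / 8

if __name__ == "__main__":
    for layer in (24, 32):              # Gbit layer densities in the spec
        for high in (4, 8, 12, 16):     # supported TSV stack heights
            cap = stack_capacity_gbytes(layer, high)
            print(f"{layer}Gb x {high}-high: {cap:g} Gbytes")
    # 32Gb x 16-high gives 64 Gbytes per stack; at 6.4 Gbps over
    # 2048 pins, peak bandwidth is ~1.6 Tbytes/s per stack.
    print(f"peak bandwidth: {stack_bandwidth_gbytes_s():g} Gbytes/s")
```

Under those assumptions the largest configuration, 32 Gb layers in a 16-high stack, reaches 64 Gbytes per stack with around 1.6 Tbytes/s of peak bandwidth.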
Senior engineers at Nvidia will present the latest advancements powering the Blackwell platform, plus research on liquid cooling for data centres and AI agents for chip design.
The GB200 NVL72 liquid-cooled rack system connects 72 Blackwell GPUs and 36 Grace CPUs.
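The 72/36 split follows from the rack's building blocks. As a minimal sketch, assuming the widely reported layout of 18 compute trays, each carrying two GB200 superchips (one Grace CPU paired with two Blackwell GPUs per superchip), the totals work out as follows:

```python
# Rough composition of a GB200 NVL72 rack.
# Assumption: 18 compute trays x 2 GB200 superchips per tray, each
# superchip pairing 1 Grace CPU with 2 Blackwell GPUs, per Nvidia's
# public descriptions of the system.

TRAYS = 18
SUPERCHIPS_PER_TRAY = 2
CPUS_PER_SUPERCHIP = 1
GPUS_PER_SUPERCHIP = 2

superchips = TRAYS * SUPERCHIPS_PER_TRAY    # 36 superchips
cpus = superchips * CPUS_PER_SUPERCHIP      # 36 Grace CPUs
gpus = superchips * GPUS_PER_SUPERCHIP      # 72 Blackwell GPUs

print(f"{superchips} superchips -> {cpus} Grace CPUs, {gpus} Blackwell GPUs")
```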
Ajay Tirumala and Raymond Wong, directors of architecture at Nvidia, will provide a first look at the platform and explain how these technologies work together to deliver a new standard for AI and accelerated computing performance while improving energy efficiency.
Ali Heydari, director of data centre cooling and infrastructure at Nvidia, will present several designs for hybrid-cooled data centres.
Some designs retrofit existing air-cooled data centres with liquid-cooling units, offering a quick and easy way to add liquid-cooling capability to existing racks. Other designs require installing piping for direct-to-chip liquid cooling using cooling distribution units, or entirely submerging servers in immersion cooling tanks. Although these options demand a larger upfront investment, they lead to substantial savings in both energy consumption and operational costs.
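To make the energy argument concrete, here is a minimal sketch of the trade-off, assuming illustrative PUE (power usage effectiveness) figures of 1.5 for a conventional air-cooled facility and 1.15 after a liquid-cooling retrofit, a 1 MW IT load, and an electricity price of $0.10/kWh; none of these numbers come from Nvidia's presentation.

```python
# Illustrative facility-level energy comparison for air vs. liquid
# cooling. All inputs are assumptions for the sake of the example,
# not figures from Nvidia or COOLERCHIPS.

HOURS_PER_YEAR = 8760
IT_LOAD_KW = 1000          # assumed 1 MW of IT equipment
PRICE_PER_KWH = 0.10       # assumed electricity price, USD

def annual_cost(pue: float) -> float:
    """Total facility energy cost for a year at the given PUE."""
    return IT_LOAD_KW * pue * HOURS_PER_YEAR * PRICE_PER_KWH

air = annual_cost(1.5)     # assumed PUE for air cooling
liquid = annual_cost(1.15) # assumed PUE after liquid retrofit

print(f"air-cooled:    ${air:,.0f}/year")
print(f"liquid-cooled: ${liquid:,.0f}/year")
print(f"saving:        ${air - liquid:,.0f}/year")
```

Even under these hypothetical figures, the retrofit recovers roughly $300,000 a year on a 1 MW load, which is why the larger upfront plumbing investment can pay back quickly.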
Heydari will also share his team’s work as part of COOLERCHIPS, a U.S. Department of Energy program to develop advanced data centre cooling technologies. As part of the project, the team is using the Omniverse platform to create physics-informed digital twins that will help them model energy consumption and cooling efficiency to optimise their data centre designs.