Following the mishap with the production of the latest Blackwell chip, Nvidia’s chief executive says it is on track to ship the Blackwell Ultra chip later this year, followed by its next-generation processor and GPU, codenamed Vera and Rubin.
The slip in production did not hit the company’s results, which saw a record $130bn in revenue in 2024.
“We successfully and incredibly ramped up Grace Blackwell, delivering some $11 billion of revenues last quarter. We’re going to have to continue to scale as demand is quite high, and customers are anxious and impatient to get their Blackwell systems. We have a fairly large installation of Grace Blackwell for our own engineering and our own design teams and software teams,” said Jensen Huang, CEO of Nvidia.
A key part of the delivery is the NVL72 full rack systems with a choice of cooling. “There are 350 plants manufacturing the 1.5 million components that go into each of the Blackwell racks, the Grace Blackwell racks. It’s extremely complicated,” he said.
“Blackwell Ultra is second half. As you know, with the first Blackwell we had a hiccup that probably cost us a couple of months. We’re fully recovered, of course,” said Huang. Blackwell Ultra uses 288Gbytes of HBM3e high-speed memory.
Blackwell Ultra will have new networking, new memory and new processors, he says, and is coming online in the second half of 2025. Details of this, as well as of Vera and Rubin and the latest Drive Thor chips, are expected next month.
“We have been working with all of our partners and customers, laying this out,” he said. “They have all of the necessary information, and we’ll work with everybody to do the proper transition. This time between Blackwell and Blackwell Ultra, the system architecture is exactly the same. It’s a lot harder going from Hopper to Blackwell because we went from an NVLink 8 system to an NVLink 72-based system.
“So, the chassis, the architecture of the system, the hardware, the power delivery, all of that had to change. This was quite a challenging transition. But the next transition will slot right in: Blackwell Ultra will slot right in. We’ve also already revealed and been working very closely with all of our partners on the click after that, which is called Vera Rubin, and all of our partners are getting up to speed on that transition and preparing for it. And again, we’re going to provide a big, huge step-up.”
New models that cost less to train, such as DeepSeek, are driving the demand for inference
“Customers are racing to scale infrastructure to train the next generation of cutting-edge models and unlock the next level of AI capabilities. With Blackwell, it will be common for these clusters to start with 100,000 GPUs or more. Shipments have already started for multiple infrastructures of this size,” said Colette Kress, CFO at Nvidia.
“Blackwell was architected for reasoning AI inference. Blackwell supercharges reasoning AI models with up to 25x higher token throughput and 20x lower cost versus the Hopper H100. NVLink delivers 14x the throughput of PCIe Gen 5, ensuring the response time, throughput, and cost efficiency needed to tackle the growing complexity of inference at scale.”
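The 14x figure can be sanity-checked from published link speeds. A back-of-the-envelope sketch in Python, using the commonly cited 1.8TB/s of fifth-generation NVLink bandwidth per Blackwell GPU and 128GB/s bidirectional for a PCIe Gen 5 x16 slot (datasheet figures, not numbers from the earnings call):

```python
# Back-of-the-envelope check of the "NVLink is 14x PCIe Gen 5" claim.
# Both figures are public datasheet numbers, not from the earnings call.

PCIE_GEN5_X16_GBS = 128      # 32 GT/s per lane x 16 lanes, bidirectional, in GB/s
NVLINK5_PER_GPU_GBS = 1800   # 5th-gen NVLink: 1.8 TB/s per Blackwell GPU, in GB/s

ratio = NVLINK5_PER_GPU_GBS / PCIE_GEN5_X16_GBS
print(f"NVLink 5 vs PCIe Gen 5 x16: {ratio:.1f}x")  # ~14.1x
```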
This range of inference needs is boosting the drive for programmable GPUs and the specification of the next-generation Vera and Rubin chips, says Huang. “We designed Blackwell with the idea of reasoning models in mind.”
“There are now multiple scaling laws. There’s the pre-training scaling law, and that’s going to continue to scale because we have multimodality, we have data that came from reasoning that are now used to do pre-training. And then the second is post-training scaling, using reinforcement learning from human feedback, reinforcement learning from AI feedback and reinforcement learning with verifiable rewards.”
“The amount of computation you use for post-training is actually higher than pre-training. And it’s kind of sensible in the sense that, while you’re using reinforcement learning, you could generate an enormous amount of synthetic data or synthetically generated tokens. AI models are basically generating tokens to train AI models. And that’s post-training.”
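A rough FLOPs estimate shows why post-training can overtake pre-training. Using the usual rules of thumb (roughly 2N FLOPs to generate a token with an N-parameter model, roughly 6N FLOPs to train on one), every synthetic token must be generated and then trained on, so it costs more than a plain pre-training token. The model size and token counts below are illustrative assumptions, not Nvidia’s figures:

```python
# Why RL post-training can out-cost pre-training: each synthetic token is
# first *generated* (~2N FLOPs, a forward pass) and then *trained on*
# (~6N FLOPs), versus ~6N FLOPs for a plain pre-training token.
# Model size and token counts are illustrative assumptions, not Nvidia figures.

N = 70e9                   # hypothetical 70B-parameter model
pretrain_tokens = 10e12    # assumed 10T-token pre-training corpus
synthetic_tokens = 10e12   # assumed 10T RL-generated tokens

pretrain_flops = 6 * N * pretrain_tokens
posttrain_flops = (2 + 6) * N * synthetic_tokens  # generate, then train on them

print(f"pre-training:  {pretrain_flops:.2e} FLOPs")   # 4.2e+24
print(f"post-training: {posttrain_flops:.2e} FLOPs")  # 5.6e+24
```

Under these assumptions post-training already exceeds pre-training, and every additional round of synthetic generation widens the gap.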
“And the third part, this is the part that you mentioned, is test-time compute or reasoning, long thinking, inference scaling. They’re all basically the same ideas. And there, you have a chain of thought, you have search. The amount of tokens generated, the amount of inference compute needed, is already 100x more than the one-shot examples and the one-shot capabilities of large language models in the beginning.”
“And that’s just the beginning. The idea that the next generation could use thousands of times more compute, and even, hopefully, extremely thoughtful, simulation-based and search-based models that could use hundreds of thousands or millions of times more compute than today, is in our future. And so, the question is, how do you design such an architecture?”
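Huang’s 100x falls straight out of token arithmetic: inference compute scales linearly with the number of tokens generated, and a long chain-of-thought answer can emit a couple of orders of magnitude more tokens than a one-shot reply. A minimal sketch, assuming a hypothetical 70B-parameter model and illustrative token counts:

```python
# Token arithmetic behind the "already 100x more" inference-compute claim.
# Rule of thumb: generating one token costs ~2 * N_params FLOPs.
# Token counts are illustrative assumptions, not Nvidia figures.

N_PARAMS = 70e9                  # hypothetical 70B-parameter model
FLOPS_PER_TOKEN = 2 * N_PARAMS

one_shot_tokens = 1_000          # short, direct answer
reasoning_tokens = 100_000       # long chain of thought with search/backtracking

print(f"one-shot:  {one_shot_tokens * FLOPS_PER_TOKEN:.1e} FLOPs")
print(f"reasoning: {reasoning_tokens * FLOPS_PER_TOKEN:.1e} FLOPs")
print(f"ratio: {reasoning_tokens / one_shot_tokens:.0f}x")  # 100x, matching Huang's figure
```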
“So, it’s hard to figure out what is the best configuration of a data center, which is the reason why Nvidia’s architecture is so popular. We run every model. We are great at training. The vast majority of our compute today is actually inference, and Blackwell takes all of that to a new level.”
“So, we’re seeing, in fact, much, much more concentration of a unified architecture than ever before.”
