We Heard You Like HBM – Nvidia's Blackwell Ultra GPUs Will Have 288 GB Of It

GTC Nvidia's Blackwell GPU architecture is barely out of the cradle – and the graphics chip giant is already looking to extend its lead over rival AMD with an Ultra-themed refresh of the technology.
Announced on stage at Nvidia's GPU Technology Conference (GTC) in San Jose, California, on Tuesday by CEO and leather jacket aficionado Jensen Huang, the Blackwell Ultra family of accelerators boasts up to 15 petaFLOPS of dense 4-bit floating-point performance and up to 288 GB of HBM3e memory per chip.
And if you're primarily interested in deploying GPUs for AI inference, that's a bigger deal than you might think. While training is generally limited by how much compute you can throw at the problem, inference is primarily a memory-bound workload. The more memory you have, the bigger the model you can serve.
According to Ian Buck, Nvidia veep of hyperscale and HPC, the Blackwell Ultra will enable reasoning models, including DeepSeek-R1, to be served at 10x the throughput of the Hopper generation, meaning questions that previously took more than a minute to answer can now be handled in as little as ten seconds.
With 288 GB of capacity across eight stacks of HBM3e memory onboard, a single Blackwell Ultra GPU can now run substantially larger models. At FP4, Meta's Llama 405B could fit on a single GPU with plenty of vRAM left over for key-value caches.
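That claim is easy to sanity-check with a bit of arithmetic. The sketch below assumes four bits (half a byte) per parameter and ignores activation and runtime overheads, so treat it as a rough estimate rather than a deployment guide:

```python
# Back-of-envelope check: does a 405B-parameter model fit in 288 GB at FP4?
params = 405e9                      # Llama 405B parameter count
bytes_per_param = 0.5               # FP4 = 4 bits = half a byte per weight
weights_gb = params * bytes_per_param / 1e9
hbm_gb = 288                        # Blackwell Ultra's claimed HBM3e capacity

print(f"weights: {weights_gb:.1f} GB")                 # ~202.5 GB
print(f"headroom: {hbm_gb - weights_gb:.1f} GB")       # ~85.5 GB
```

Roughly 85 GB left over is what makes room for the key-value caches that grow with context length and concurrent users.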
To achieve this higher capacity, Nvidia's Blackwell Ultra swapped last-gen's eight-high HBM3e stacks for fatter 12-high modules, boosting capacity by 50 percent. However, we're told that memory bandwidth remains the same at a still class-leading 8 TB/s.
If any of this sounds familiar, that's because this isn't the first time we've seen Nvidia employ this strategy. In fact, Nv is following a similar playbook to its H200, which was essentially just an H100 with faster, higher-capacity HBM3e onboard. This time around, however, Nvidia isn't just strapping more memory onto its latest Blackwells; it has also juiced peak floating-point performance by 50 percent – at least for FP4.
Nvidia tells us that FP8 and FP16/BF16 performance is unchanged from last gen.
More memory, more compute, more 'GPUs'
While many have fixated on Nvidia's $30,000 or $40,000 chips, it's worth remembering that Hopper, Blackwell, and now its Ultra refresh aren't one chip so much as a family of products running the gamut from PCIe add-in cards and servers to rack-scale systems and even entire supercomputing clusters.
In the datacenter, Nvidia will offer Blackwell Ultra in both its more traditional HGX servers and its rack-scale NVL72 offerings.
Nvidia's HGX form factor has, at least for the past few generations, featured up to eight air-cooled GPUs stitched together by a high-speed NVLink switch fabric. This time around, however, there's a new config called the B300 NVL16, a name that might suggest Nvidia is cramming twice as many GPUs into a box, something it has previously done with the HGX V100. In reality, Nvidia has simply changed its mind and decided to count the individual compute dies on the package as GPUs.
According to Nvidia, the Blackwell-based B300 NVL16 will deliver 7x the compute and 4x the memory capacity of its Hopper-generation equivalent, which, we've learned, refers to its 80 GB H100s and not the higher-capacity H200 systems. By our calculation, that works out to 112 petaFLOPS of dense FP4 compute, or about 7 petaFLOPS of dense FP4 per GPU die, or 14 petaFLOPS per SXM module. That's quite a performance uplift, with each B300 die performing on par with the Blackwell B100-series chips announced last year.
Nvidia does appear to have done a fair bit of rounding with its memory claims, though. By our calculation, the HGX B300 systems actually deliver closer to 3.6x more memory, at 2.3 TB versus the 640 GB of the HGX H100.
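Our per-die and memory figures fall out of straightforward division. The sketch below redoes that arithmetic, taking Nvidia's 112 petaFLOPS system-level claim and the eight 80 GB H100s of an HGX H100 as inputs:

```python
# Sanity-check the HGX B300 NVL16 figures quoted above
b300_fp4_dense_pflops = 112            # claimed dense FP4 for the full system
gpu_dies = 16                          # Nvidia now counts dies as GPUs
per_die = b300_fp4_dense_pflops / gpu_dies

b300_mem_tb = 2.3                      # HGX B300 total HBM3e
h100_mem_tb = 8 * 80 / 1000            # HGX H100: eight 80 GB GPUs = 0.64 TB
mem_ratio = b300_mem_tb / h100_mem_tb

print(per_die)                         # 7.0 petaFLOPS per die
print(round(mem_ratio, 1))             # ~3.6x, not the 4x Nvidia quotes
```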
For even larger workloads, Nvidia will also offer the accelerators in its Superchip form factor. Unlike last year's GB200, the GB300 Superchip will pair four Blackwell Ultra GPUs, packing a combined 1,152 GB of HBM3e memory, with two 72-core Arm-compatible Grace CPUs.
Up to 18 of these Superchips can be stitched together using Nvidia's NVLink switches to form an NVL72 rack-scale system. But rather than the 13.5 terabytes of HBM3e of last year's model, the Grace-Blackwell GB300-based systems will offer up to 20 terabytes of vRAM. What's more, Buck says the system has been redesigned for this generation with an eye toward improved energy efficiency and serviceability.
And if that's still not big enough, eight of these racks can be combined to form a GB300 SuperPOD system containing 576 Blackwell Ultra GPUs and 288 Grace CPUs.
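The rack and pod figures above follow directly from the Superchip counts. Here's the multiplication, using the per-GPU capacity and the configurations described in this piece:

```python
# Rack and pod math for the GB300 NVL72 and SuperPOD, per the figures above
gpus_per_rack = 18 * 4                 # 18 Superchips x 4 Blackwell Ultra GPUs
hbm_per_gpu_gb = 288
rack_hbm_tb = gpus_per_rack * hbm_per_gpu_gb / 1000

pod_gpus = 8 * gpus_per_rack           # eight racks per SuperPOD
pod_cpus = 8 * 18 * 2                  # two Grace CPUs per Superchip

print(gpus_per_rack, rack_hbm_tb)      # 72 GPUs, ~20.7 TB ("up to 20 TB")
print(pod_gpus, pod_cpus)              # 576 GPUs, 288 CPUs
```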
Where does this leave Blackwell?
Given its larger memory capacity, it'd be easy to look at Nvidia's line-up and question whether Blackwell Ultra will end up cannibalizing shipments of the non-Ultra variant.
However, the two platforms are clearly aimed at different markets, with Nvidia presumably charging a premium for its Ultra SKUs.
In a press briefing ahead of Huang's keynote address today, Nvidia's Buck described three distinct AI scaling laws – pre-training scaling, post-training scaling, and test-time scaling – each of which requires compute resources to be applied in different ways.
At least on paper, Blackwell Ultra's higher memory capacity should make it well suited to the third of these regimes, as it allows customers to serve up larger models – AKA inference – either faster or at higher volumes.
Meanwhile, for those building large clusters for compute-bound training workloads, we expect the standard Blackwell parts to continue to see strong demand. After all, there's little sense in paying extra for memory you don't necessarily need.
With that said, there's no reason why you wouldn't use a GB300 for training. Nvidia tells us the higher HBM capacity and faster 800G networking offered by its ConnectX-8 NICs will contribute to higher training performance.
Competition
With Nvidia's Blackwell Ultra processors expected to start trickling out sometime in the second half of 2025, they're set to land in contention with AMD's upcoming Instinct MI355X accelerators, putting the latter in an awkward spot. We would say the same about Intel's Gaudi3, but that was already true when it was announced.
Since launching its MI300-series GPUs in late 2023, AMD's main point of differentiation was that its accelerators had more memory (192 GB and later 256 GB) than Nvidia's (141 GB and later 192 GB), making them attractive to customers, such as Microsoft or Meta, deploying large multi-hundred- or even trillion-parameter-scale models.
MI355X will also see AMD juice memory capacities to 288 GB of HBM3e and bandwidth to 8 TB/s. What's more, AMD claims the chips will close the gap considerably, promising floating-point performance roughly on par with Nvidia's B200.
However, at a system level, Nvidia’s new HGX B300 NVL16 systems will offer the same amount of memory, and significantly higher FP4 floating-point performance. If that weren't enough, AMD's answer to Nvidia's NVL72 is still another generation away with its forthcoming MI400 platform.
This may explain why, during its last earnings call, AMD CEO Lisa Su revealed that her company planned to move up the release of its MI355X from late in the second half to the middle of the year. Team Red also has the potential to undercut its rival on pricing and availability, a strategy it's used to great effect in its ongoing effort to steal share from Intel. ®
Updated at 15.55 UTC on March 19, 2025, to add
This article was updated to clarify the memory configuration of the HGX B300 NVL16. We had asked Nvidia about this following our earlier chat with them, and it transpires the silicon goliath changed the definition of a GPU and its NVL naming convention for the B300 but, confusingly, not for the GB300.
Never mind that, our friends over at The Next Platform have more here on Nvidia's roadmap to 2028