Oracle Boasts Zettascale 'AI Supercomputer,' Just Don't Ask About Precision
Comment Oracle says it's already taking orders on a 2.4 zettaFLOPS cluster with "three times as many GPUs as the Frontier supercomputer."
But let's get precise about precision: Oracle hasn't actually managed a 2,000x performance boost over the United States' top-ranked supercomputer — those are "AI zettaFLOPS" and the tiniest, sparsest ones Nvidia's chips can muster versus the standard 64-bit way of measuring super performance.
This might be common knowledge for most of our readers, but it seems Oracle needs a reminder: FLOPS are pretty much meaningless unless they're accompanied by a unit of measurement. In scientific computing, FLOPS are usually measured at double precision or 64-bit.
However, for the kind of fuzzy math that makes up modern AI algorithms, we can usually get away with far lower precision, down to 16-bit, 8-bit, or, in the case of Oracle's new Blackwell cluster, 4-bit floating point.
The 131,072 Blackwell accelerators that make up Oracle's "AI supercomputer" are in fact capable of churning out 2.4 zettaFLOPS of sparse FP4 compute, but the claim is more marketing than reality.
That's because most models today are still trained at 16-bit floating point or brain float precision. But at that precision, Oracle's cluster only manages about 459 exaFLOPS of AI compute, which just doesn't have the same ring to it as "zettascale".
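For the napkin-math inclined, here's a minimal sketch of how the same 131,072-GPU cluster produces such wildly different headline figures depending on precision and sparsity. The per-GPU numbers are simply inferred by dividing the aggregate figures quoted above, not measured results:

```python
# Back-of-the-envelope: the same GPU count, very different headline FLOPS.
# Per-GPU figures are assumptions inferred from the aggregates in the article.

GPUS = 131_072

# Implied per-GPU throughput if the cluster really hits 2.4 zettaFLOPS of sparse FP4.
sparse_fp4_per_gpu = 2.4e21 / GPUS          # ~18.3 petaFLOPS per GPU

# The ~459 exaFLOPS figure at 16-bit precision implies roughly:
bf16_per_gpu = 459e18 / GPUS                # ~3.5 petaFLOPS per GPU

print(f"Implied sparse FP4 per GPU : {sparse_fp4_per_gpu / 1e15:.1f} PFLOPS")
print(f"Implied BF16 per GPU       : {bf16_per_gpu / 1e15:.1f} PFLOPS")
print(f"Headline inflation factor  : {2.4e21 / 459e18:.1f}x")
```

Run it and the "zettascale" claim turns out to rest on roughly a 5x multiplier from dropping precision and assuming sparsity.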
Note that there's nothing technically stopping you from training models at FP8 or even FP4, but doing so comes at the cost of accuracy. Instead, these lower precisions are more commonly used to speed up the inferencing of quantized models, a scenario where you'll pretty much never need all 131,072 chips even when serving up a multi-trillion parameter model.
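To see why dropping to 4 bits is usually an inference trick rather than a training recipe, here's a toy sketch that quantizes a set of weights to a simple symmetric integer grid and measures the error. It simulates generic uniform quantization, not Nvidia's actual FP4 or FP8 formats, purely to illustrate the accuracy trade-off:

```python
# Toy demo: 4-bit quantization throws away far more information than 8-bit.
# Simulates plain symmetric uniform quantization, not Nvidia's FP4/FP8 formats.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=100_000).astype(np.float32)

def quantize(x, bits):
    # Scale values onto a symmetric integer grid, round, and scale back.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

for bits in (8, 4):
    err = np.abs(weights - quantize(weights, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.2e}")
```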
- Oracle to power 1GW datacenter with trio of tiny nuclear reactors
- SambaNova makes Llama gallop in inference cloud debut
- We're in the brute force phase of AI – once it ends, demand for GPUs will too
- DoE drops $23M in effort to reinvigorate supercomputing
What's funny is that if Oracle can actually network all those GPUs together with RoCEv2 or InfiniBand, we're still talking about a pretty beefy HPC cluster. At FP64, the peak performance of Oracle's Blackwell Supercluster is between 5.2 and 5.9 exaFLOPS, depending on whether we're talking about the B200 or GB200. That's more than three times the peak performance of AMD's Frontier system.
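A rough comparison, using the peak figures above and assuming Frontier's published FP64 peak of roughly 1.7 exaFLOPS (theoretical peaks in both cases, not sustained HPL results):

```python
# Rough FP64 comparison against Frontier, using the article's peak figures.
# Frontier's ~1.7 exaFLOPS FP64 peak is an assumption based on its published Rpeak.

blackwell_fp64_peak = {"B200 cluster": 5.2e18, "GB200 cluster": 5.9e18}
frontier_fp64_peak = 1.7e18

for name, flops in blackwell_fp64_peak.items():
    print(f"{name}: {flops / 1e18:.1f} EFLOPS peak, "
          f"{flops / frontier_fp64_peak:.1f}x Frontier's peak")
```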
Interconnect overheads being what they are, we'll note that even getting close to peak performance at these scales is next to impossible.
Oracle already offers H100 and H200 superclusters capable of scaling to 16,384 and 65,536 GPUs, respectively. Blackwell-based superclusters, including Nvidia's flagship GB200 NVL72 rack-scale systems, will be available beginning in the first half of 2025. ®