The 3.2x compute curve: what GB200 actually changes about training ROI
NVIDIA says 4x. A CFO needs the number closer to 3.2x, and only above 55 percent utilization.
NVIDIA positioned GB200 NVL72 as a 4x training improvement over H100. On a workload-weighted, all-in-cost basis the number is closer to 3.2x, and only if sustained utilization clears a 55 to 60 percent threshold. This brief unpacks the curve, where the gains come from in hardware terms, and the three places the ROI tends to break in real deployments.
The question behind the headline
NVIDIA positioned Blackwell and the GB200 NVL72 rack as a generational step up from H100. The GTC 2024 keynote put a specific number on it: up to 4x faster training. Multiple presentations, analyst notes, and press coverage have repeated that figure.
The headline is not wrong. It is also not the number a decision maker should use when sizing a capex budget, pricing a training run, or comparing a cloud reservation against owned infrastructure. That number, the one the finance partner actually needs, is effective FLOPs per dollar across the full life of the workload. Run the math honestly and the answer is closer to a 3.2x improvement over H100, and only if the cluster sustains 55 to 60 percent utilization or better.
This brief unpacks the curve: what the 3.2x number actually represents, where the gains come from in hardware terms, what assumptions have to hold for the ROI to clear, and the three places the math tends to break in real deployments.
What "up to 4x" actually measures
NVIDIA's published comparison is a specific apples-to-apples run: dense FP8 training on a 1.8 trillion parameter mixture-of-experts model, H100 SXM baseline versus GB200 NVL72 at rack scale. Under those conditions, published throughput improves by roughly 4x.
Three caveats matter.
First, this is dense FP8. Many production workloads run in BF16, FP32, or sparse formats where the gain is smaller. FP8 is the headline because NVIDIA gets the most improvement there.
Second, it is rack-scale, not per-chip. The GB200 NVL72 bundles 72 B200 GPUs and 36 Grace CPUs into a single NVLink5 domain. Much of the improvement is from eliminating the parallelism tax of crossing InfiniBand between H100 nodes. If the model fits on a single H100 node, the generational gain is smaller. If it spans tens of nodes, the gain is larger.
Third, "up to 4x" is the peak, not the average. MLPerf Training results from the 2024-2025 rounds put the cross-workload median closer to 2.5x to 3.2x on comparable setups. The 3.2x number is the one this brief builds against.
The four-generation curve
A fair comparison has to span at least three recent generations because many teams are choosing between refreshing H100 and jumping to Blackwell. The relevant reference points:
A100 (Ampere, 2020). 312 TFLOPS FP16 dense, 40 GB HBM2 or 80 GB HBM2e, 1.5 to 2 TB/s memory bandwidth. Still the workhorse for inference fleets built in 2021 to 2022.
H100 (Hopper, 2022). 989 TFLOPS FP16 dense, 80 GB HBM3, roughly 3.35 TB/s. FP8 native. Still the most-deployed training chip as of early 2026.
H200 (Hopper mid-cycle, 2024). Same compute as H100 but 141 GB HBM3e at roughly 4.8 TB/s. A memory-bound upgrade, not a compute upgrade.
B200 (Blackwell, 2024 to 2025). Roughly 2.25x H100 training throughput per chip on NVIDIA's own benchmarks, 192 GB HBM3e at 8 TB/s, native FP4.
GB200 NVL72 (Blackwell rack, 2025 to 2026). 72 B200s plus 36 Grace CPUs in a rack-scale NVLink5 domain. NVIDIA's published claim: 4x H100 training at rack scale, 30x inference.
Normalizing on effective FLOPs per dollar requires three adjustments: compute per chip, sustained utilization, and total cost of ownership per FLOP delivered. Each step loses some of the headline gain.
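One way to write that normalization down, as a sketch in notation that is this example's own rather than NVIDIA's: let P be peak FLOPS per chip, u the sustained utilization as a fraction of peak, and C the all-in cost per chip-hour. Effective FLOPs per dollar, E, and the generational gain then take the form:

```latex
E = \frac{P \cdot u}{C},
\qquad
\frac{E_{\text{new}}}{E_{\text{old}}}
  = \frac{P_{\text{new}}}{P_{\text{old}}}
    \cdot \frac{u_{\text{new}}}{u_{\text{old}}}
    \cdot \frac{C_{\text{old}}}{C_{\text{new}}}
```

The headline figure corresponds to the first ratio measured under favorable conditions. The utilization ratio and the cost ratio are where a workload-specific model moves the result up or down.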
Where the generational gains actually come from
The 2.25x to 3.2x improvement does not come from a single source.
Silicon density. Blackwell puts two dies on a single package connected by a 10 TB/s chip-to-chip interconnect. This roughly doubles the transistor budget per socket without waiting for another node shrink.
Memory bandwidth. HBM3e at 8 TB/s relaxes the bandwidth bottleneck that throttled H100 on large attention-heavy models. For inference with long context windows, this is often the single largest practical gain.
NVLink5 fabric. Rack-scale NVLink reduces the parallelism overhead that costs real percentage points of H100 utilization in multi-node training. A 100 billion parameter model trained across 32 H100 nodes typically spends 15 to 25 percent of compute time on inter-node collective operations. GB200 NVL72 keeps most of that traffic inside the rack at roughly 10x the bandwidth.
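A rough way to see what that communication share is worth is an Amdahl-style estimate. The sketch below is illustrative only: `fabric_speedup` is a hypothetical helper, and it assumes that all collective traffic moves to the faster fabric and that communication time scales inversely with bandwidth, ignoring latency and compute/communication overlap.

```python
def fabric_speedup(comm_fraction: float, bandwidth_ratio: float) -> float:
    """Amdahl-style speedup when only communication time shrinks.

    comm_fraction: share of step time spent on inter-node collectives (0 to <1).
    bandwidth_ratio: how many times faster the new fabric moves that traffic.
    Assumption: communication time scales as 1 / bandwidth_ratio.
    """
    if not 0.0 <= comm_fraction < 1.0:
        raise ValueError("comm_fraction must be in [0, 1)")
    if bandwidth_ratio <= 0.0:
        raise ValueError("bandwidth_ratio must be positive")
    # Compute time is unchanged; only the communication slice gets faster.
    new_time = (1.0 - comm_fraction) + comm_fraction / bandwidth_ratio
    return 1.0 / new_time


if __name__ == "__main__":
    # The 15 to 25 percent range and the 10x bandwidth figure come from the text.
    for fraction in (0.15, 0.25):
        gain = fabric_speedup(fraction, 10.0)
        print(f"{fraction:.0%} comm time -> {gain:.2f}x")
```

Under these simplifying assumptions the 15 to 25 percent range maps to roughly 1.16x to 1.29x, which brackets the 1.2x rack-scale factor this brief uses in its cost breakdown.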
FP4 support. Native FP4 matters more for inference than training, but some training stacks are starting to use FP4 for activations. When usable, FP4 roughly doubles throughput over FP8 at the same utilization.
The four sources compound unevenly across workloads. A fine-tuning job on an 8 billion parameter model sees relatively little gain from NVLink5 because the model already fits on a single H100 node. A pretraining job on a frontier MoE model sees all four gains stack.
Effective FLOPs per dollar: the actual curve
The comparable metric for finance and capacity planning is not peak TFLOPS. It is effective FLOPs delivered per dollar of total cost across the workload's lifecycle.
Three assumptions drive the number.
Utilization. Peak FLOPS is a spec sheet number. Effective FLOPS is what the model actually gets after kernel launch overhead, memory stalls, idle cycles during checkpointing, and inter-node communication. Honest sustained utilization on a large H100 training run is 35 to 55 percent of nominal. On GB200 NVL72 with the current driver and framework stack, published numbers cluster around 50 to 65 percent for comparable workloads. The gap matters.
All-in cost. Public cloud on-demand pricing is a ceiling. Committed contracts, spot, and owned infrastructure sit below. A fair comparison uses the same procurement model across generations. Mixing H100 reserved pricing against GB200 on-demand inflates the gain.
Useful life. H100 was released in late 2022 and is already on a 3-to-5 year depreciation schedule for many operators. GB200 NVL72 will face the same depreciation cycle, with the additional consideration that the Rubin generation is targeted for late 2026 or early 2027. If the training schedule runs only through 2027, GB200 has 18 to 24 months of full-throughput use before it becomes second-tier capacity.
Running the math with defensible assumptions, the effective FLOPs per dollar improvement from H100 to GB200 on a frontier training workload tops out around 3.2x. The median case is lower, and the breakdown shows why: roughly 2.25x from raw per-chip compute (Blackwell versus Hopper), 1.2x from reduced parallelism tax at rack scale (NVLink5), and 1.15x from higher sustained utilization on mature workloads, divided by roughly 1.4x higher all-in dollar cost per rack (owned) or per GPU-hour (cloud). 2.25 times 1.20 times 1.15 divided by 1.40 equals roughly 2.22.
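The multiplicative breakdown above is easy to check and to re-run with your own inputs. The sketch below uses the four factors quoted in the text; `net_gain` and its parameter names are illustrative, not part of any published model.

```python
def net_gain(per_chip: float, fabric: float, utilization: float,
             cost_ratio: float) -> float:
    """Effective FLOPs-per-dollar gain of the new generation over the old.

    per_chip:    raw per-chip throughput ratio (new / old)
    fabric:      gain from reduced parallelism overhead at rack scale
    utilization: ratio of sustained utilization (new / old)
    cost_ratio:  all-in cost ratio (new / old), same procurement model
    """
    if cost_ratio <= 0.0:
        raise ValueError("cost_ratio must be positive")
    return per_chip * fabric * utilization / cost_ratio


if __name__ == "__main__":
    # Factors as quoted in the text.
    throughput_only = 2.25 * 1.20 * 1.15
    per_dollar = net_gain(2.25, 1.20, 1.15, 1.40)
    print(f"throughput gain before cost: {throughput_only:.1f}x")
    print(f"gain per dollar after cost:  {per_dollar:.2f}x")
```

With these inputs the throughput gain before cost is about 3.1x and the per-dollar gain is about 2.22x, so the 1.4x cost ratio accounts for the gap between the two figures.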
The 3.2x headline holds at the high end, on frontier workloads that fully use NVLink5 and achieve top-decile utilization. The median case lands closer to 2.2x. The difference between these numbers is the difference between a cleanly positive ROI and a breakeven decision.
The utilization threshold
Across three workload archetypes, the break-even utilization where GB200 NVL72 pays back against a refreshed H100 cluster (both at equivalent scale and procurement model) sits at 55 to 60 percent of nominal.
Below that threshold, H100 depreciation schedules and cheaper reserved pricing win. Above it, GB200 wins, and the advantage grows quickly because compute is the dominant cost in the training budget.
Three workload types land above the threshold reliably: frontier pretraining on MoE architectures above 500 billion parameters, long-horizon reinforcement learning fine-tunes, and inference-heavy post-training (RLHF at scale). Everything else, including most mid-size pretraining and most fine-tuning work, sits at or below the threshold and is better served by extending H100 capacity than leapfrogging to GB200.
Cloud versus owned
The cloud question is separate from the generational question.
A rough decision matrix for a training-heavy workload:
Under 500 GPU-hours per month: cloud on-demand is almost always right.
500 to 5,000 GPU-hours per month: cloud reserved or committed spend.
5,000 to 50,000 GPU-hours per month: break-even between cloud committed and owned, depending on facilities access and engineering capacity.
Over 50,000 GPU-hours per month: owned or hyperscale colocation typically wins over a 3-year TCO.
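The matrix above can be encoded as a simple lookup for use in a planning script. This is a minimal sketch: `procurement_tier` is a hypothetical helper, and because the bands in the text share their endpoints, the choice to place exactly 5,000 in the break-even band and exactly 50,000 in the same band is an assumption.

```python
def procurement_tier(gpu_hours_per_month: float) -> str:
    """Map monthly training GPU-hours to the rough tier from the matrix.

    Boundary handling is an assumption: 500 falls in the reserved tier,
    and both 5,000 and 50,000 fall in the break-even tier.
    """
    if gpu_hours_per_month < 0:
        raise ValueError("gpu_hours_per_month must be non-negative")
    if gpu_hours_per_month < 500:
        return "cloud on-demand"
    if gpu_hours_per_month < 5_000:
        return "cloud reserved or committed"
    if gpu_hours_per_month <= 50_000:
        return "break-even: cloud committed vs owned"
    return "owned or hyperscale colocation"


if __name__ == "__main__":
    for hours in (200, 2_000, 20_000, 120_000):
        print(f"{hours:>7,} GPU-hours/month -> {procurement_tier(hours)}")
```

The break-even tier deliberately returns no winner, since the text makes that band depend on facilities access and engineering capacity rather than on volume alone.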
The 2025 to 2026 cloud market has a new wrinkle: specialist providers (CoreWeave, Lambda, Crusoe, Nscale) price GB200 hours 20 to 40 percent below hyperscaler on-demand rates. For workloads that can tolerate less mature tooling, this closes much of the owned-versus-cloud gap.
A decision framework before you buy
Five questions should be answered in writing before any GB200 purchase decision. None of them has a universal answer.
One. What is honest sustained utilization on the target workload? If below 45 percent, the generational gain is not worth the capex.
Two. What fraction of the compute is training versus inference? GB200's inference claim (30x) is larger than its training claim (4x), yet most operators underweight inference capacity when they size a purchase.
Three. What is the expected useful life, anchored to the Rubin release cadence? If the asset needs to earn for four years or more, the depreciation story changes.
Four. What is the procurement posture? Reserved, committed, or spot? The generational improvement under reserved pricing is different from on-demand.
Five. What does the parallelism map look like? If the workload already spans 16 or more H100 nodes, NVLink5 is a real unlock. If it fits on four to eight, it mostly is not.
Where the numbers get soft
Three places the math tends to miss.
Actual utilization on early GB200 deployments is softer than steady-state. The first six months of a new generation see driver regressions, framework compatibility issues, and kernel tuning gaps. Operators who deployed H100 early paid a 15 to 25 percent utilization penalty relative to the steady-state numbers their clusters reached by 2024. Expect a similar curve on GB200.
Cloud pricing trajectory is uncertain. GB200 capacity is supply-constrained through at least the second quarter of 2026. Pricing is likely to soften in the second half of 2026 and again when Rubin arrives. A purchase decision made against current cloud pricing will look worse against cloud pricing in 12 months.
Workload composition shifts faster than the hardware cycle. A buyer who specified H100 in 2022 for pretraining is probably running mostly inference and RLHF on the same hardware today. GB200 decisions anchored on today's workload may see the same drift. The hardware has to earn across a workload mix that will evolve.
What this means for enterprise decisions
The 3.2x headline is real. It is also conditional on assumptions that fail quietly in procurement forecasts. A rigorous capex decision needs a workload-specific model that treats utilization, procurement model, useful life, and workload drift as inputs, not constants.
For the workloads that clear the threshold, GB200 NVL72 is the largest single-generation ROI step since H100 itself. For the workloads that do not, an extended H100 plan is the conservative answer. The gap between the two matters to the tune of 25 to 50 percent of training budget.
Have a decision coming up? The consultancy's compute-curve model takes your workload mix, utilization history, and procurement constraints as inputs and returns break-even tables by chip generation and deployment model. Reach out.