Framework

Compute Cost Curve

Six-step GPU TCO decomposition across acquisition, power, cooling, network, depreciation, and idle.

Problem solved

Hyperscaler procurement teams, neocloud capex committees, and PE infrastructure investors need an apples-to-apples cost per effective FLOP across chip generations and deployment archetypes. Spec sheets do not produce this number. The framework produces it under named utilization, useful-life, and energy assumptions, with the sensitivity ranking that matters for procurement.

Inputs

Acquisition price per accelerator (H200, B200, GB200, MI300X, TPU v5p, custom), with a 12-month forward curve where observable
MLPerf training and inference results, normalized to dense FP8 throughput
Power draw at sustained load and idle, including NVLink and host CPU overhead
Cooling architecture (air, direct-to-chip, immersion) with PUE assumption by site type
Industrial electricity tariff at the site (EIA-861, ISO/RTO LMP, or a named PPA)
Network fabric (NVLink, NDR InfiniBand, RoCE) and the implied scaling efficiency from MLPerf strong-scaling tables
Useful-life and depreciation schedule (24, 36, or 48 month base cases)
Utilization profile by workload (training-heavy, inference-heavy, mixed) with idle fraction

Outputs

All-in dollars per effective FP8 PFLOP-hour at named utilization
Channel decomposition: capex per hour, power per hour, cooling per hour, network amortization per hour, idle drag
Tornado chart of TCO sensitivity to electricity price, utilization, useful life, and PUE
Dollars per million tokens for inference workloads at named batch sizes and quantization
Crossover utilization: the point at which a newer chip undercuts an older chip on TCO terms
Scenario tree for forward chip prices and electricity prices
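The tornado output can be illustrated with a minimal one-at-a-time sensitivity sketch. The cost function here is a deliberately simplified stand-in (capex plus PUE-weighted energy, divided by delivered throughput), and every default value, the function names, and the 20 percent swing are hypothetical placeholders, not figures from the framework.

```python
def unit_cost(price_usd_per_kwh, utilization, life_months, pue,
              capex_usd=30_000.0, load_kw=1.0, idle_kw=0.3, peak_pflops=2.0):
    """Simplified USD per effective PFLOP-hour. All defaults are placeholders."""
    hours = life_months * 730.0                      # average hours per month
    capex_per_hour = capex_usd / hours
    # Assumption: every hour not at sustained load is spent at idle draw.
    avg_kw = load_kw * utilization + idle_kw * (1.0 - utilization)
    energy_per_hour = avg_kw * pue * price_usd_per_kwh
    return (capex_per_hour + energy_per_hour) / (peak_pflops * utilization)


def tornado(base, swing=0.20):
    """Perturb each input by +/- swing, holding the others at base.

    Returns rows of (input name, cost at low, cost at high, absolute range),
    sorted so the widest bar comes first.
    """
    rows = []
    for name, value in base.items():
        low = dict(base, **{name: value * (1.0 - swing)})
        high = dict(base, **{name: value * (1.0 + swing)})
        c_low, c_high = unit_cost(**low), unit_cost(**high)
        rows.append((name, c_low, c_high, abs(c_high - c_low)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical base case; replace with engagement inputs.
base_case = {"price_usd_per_kwh": 0.12, "utilization": 0.65,
             "life_months": 36.0, "pue": 1.20}
for name, c_low, c_high, width in tornado(base_case):
    print(f"{name:20s} low={c_low:.4f} high={c_high:.4f} range={width:.4f}")
```

A symmetric percentage swing is only one possible convention. An engagement would more plausibly swing each input across its own observed range, which can change the ranking.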

Method

Step 1. Normalize throughput. Take MLPerf dense FP8 training and inference results, scale to the workload mix in the engagement scope, and convert to effective PFLOP-hours. Keep NVLink results, InfiniBand results, and the weak-scaling versus strong-scaling deltas separate rather than blending them into one figure.
Step 2. Build the capex line. Sum the acquisition price per accelerator, the in-rack network fabric, host servers, and rack-level integration. Net out salvage value (default 5 percent) and spread the remainder across the useful-life base case (default 36 months for training, 48 months for inference).
Step 3. Build the power line. Multiply sustained-load draw by utilization, add idle draw multiplied by the idle fraction, and price the total at the site electricity tariff. Behind-the-meter PPA prices override grid tariffs where contractually applicable.
Step 4. Build the cooling line. Apply PUE by cooling architecture: 1.40 for legacy air, 1.20 for direct-to-chip, 1.05 for immersion. Multiply IT load by (PUE minus one) to get cooling overhead. Water-side cooling adds a separate water-cost line for jurisdictions where water is constrained.
Step 5. Build the network line. In-rack NVLink is already amortized in the capex line; cross-rack NDR InfiniBand or RoCE carries its own capex amortization plus power. Above 1,024 GPUs, scaling-efficiency loss is applied as a tax that reduces effective FLOPs.
Step 6. Sum, divide, and run sensitivities. Add all lines, divide by effective PFLOP-hours at the named utilization, and run the tornado on electricity price, utilization, useful life, and PUE. Report the crossover utilization where the candidate undercuts the incumbent.
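The six steps can be sketched as a single per-accelerator calculation. This is a minimal illustration under stated assumptions, not the framework's implementation: the `Deployment` field names are hypothetical, salvage value is assumed to apply to fabric capex as well as accelerator capex, fabric power is assumed to count toward IT load for the cooling line, and the scaling tax is represented as a single `scaling_efficiency` multiplier.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730.0  # 8,760 hours per year / 12


@dataclass
class Deployment:
    """Per-accelerator inputs. Field names are illustrative only."""
    accel_capex_usd: float      # accelerator + host + rack integration + in-rack NVLink
    fabric_capex_usd: float     # cross-rack InfiniBand or RoCE share
    useful_life_months: float   # e.g. 36 for training, 48 for inference
    salvage_fraction: float     # source default is 0.05
    load_kw: float              # sustained draw, including host overhead
    idle_kw: float              # idle draw
    fabric_kw: float            # cross-rack fabric power share
    utilization: float          # fraction of hours at sustained load
    idle_fraction: float        # fraction of hours at idle draw
    pue: float                  # 1.40 air, 1.20 direct-to-chip, 1.05 immersion
    tariff_usd_per_kwh: float   # site tariff or PPA price
    peak_pflops: float          # normalized dense FP8 PFLOPS
    scaling_efficiency: float   # 1.0 means no scaling loss


def cost_stack(d: Deployment) -> dict:
    """Return hourly cost lines and the all-in USD per effective PFLOP-hour."""
    life_hours = d.useful_life_months * HOURS_PER_MONTH
    depreciable = 1.0 - d.salvage_fraction

    # Step 1: effective PFLOP-hours delivered per wall-clock hour.
    effective = d.peak_pflops * d.scaling_efficiency * d.utilization
    if effective <= 0:
        raise ValueError("effective throughput must be positive")

    # Step 2: capex per hour, net of salvage.
    capex = d.accel_capex_usd * depreciable / life_hours
    # Step 3: power per hour, split into productive load and idle drag.
    load_power = d.load_kw * d.utilization * d.tariff_usd_per_kwh
    idle_drag = d.idle_kw * d.idle_fraction * d.tariff_usd_per_kwh
    # Step 5: cross-rack fabric amortization plus its power.
    network = (d.fabric_capex_usd * depreciable / life_hours
               + d.fabric_kw * d.tariff_usd_per_kwh)
    # Step 4: cooling overhead = average IT load * (PUE - 1), priced at the tariff.
    it_kw = d.load_kw * d.utilization + d.idle_kw * d.idle_fraction + d.fabric_kw
    cooling = it_kw * (d.pue - 1.0) * d.tariff_usd_per_kwh

    lines = {"capex": capex, "power": load_power, "cooling": cooling,
             "network": network, "idle_drag": idle_drag}
    # Step 6: sum the lines and divide by effective throughput.
    total = sum(lines.values())
    return {**lines, "total_usd_per_hour": total,
            "usd_per_effective_pflop_hour": total / effective}
```

Returning the individual lines alongside the total keeps the channel decomposition visible, so the same function can feed both the headline number and a sensitivity run.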

Assumptions

MLPerf dense FP8 throughput is the comparison currency. Sparse and reduced-precision throughput are reported separately as a sensitivity case.
Useful life defaults to 36 months for training and 48 months for inference. Hyperscalers are increasingly running 60-month inference deployments; the framework reports both.
Idle draw is a real cost, not a rounding error. The framework requires an explicit idle-fraction input.
Electricity price is annualized and held constant within the depreciation window for the base case. A separate scenario tree handles forward-curve sensitivity.
NVLink and InfiniBand scaling losses follow the published MLPerf strong-scaling tables. Custom fabrics require an engagement-specific calibration.
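The separate scenario tree for forward chip and electricity prices can be sketched as a two-level grid. This is an illustrative sketch only: it assumes the chip-price and electricity-price branches are independent, and the function names, branch labels, and probability structure are hypothetical.

```python
from itertools import product


def scenario_tree(chip_prices, power_prices, cost_fn):
    """Enumerate every chip-price x electricity-price leaf.

    chip_prices and power_prices map a branch label to (value, probability).
    cost_fn(chip_value, power_value) returns the unit cost at that leaf.
    Assumption: the two sets of branches are statistically independent,
    so each leaf probability is the product of its branch probabilities.
    """
    leaves = []
    for (c_label, (c_val, c_prob)), (e_label, (e_val, e_prob)) in product(
            chip_prices.items(), power_prices.items()):
        leaves.append({
            "chip": c_label,
            "power": e_label,
            "probability": c_prob * e_prob,
            "cost": cost_fn(c_val, e_val),
        })
    return leaves


def expected_cost(leaves):
    """Probability-weighted unit cost across all leaves."""
    return sum(leaf["probability"] * leaf["cost"] for leaf in leaves)
```

In practice `cost_fn` would wrap the full cost stack with the other inputs held at base case. If chip and power prices are believed to move together, the independence assumption should be replaced with joint probabilities per leaf.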

Limitations

Real-world utilization at hyperscaler scale is rarely the spec-sheet utilization. The framework requires the operator's own utilization data; in the absence of it, results carry an explicit utilization-band warning.
Custom accelerator pricing is often opaque. The framework runs three pricing scenarios (low, mid, high) where contracts are confidential.
Software stack maturity (CUDA, ROCm, custom) does not appear directly in the cost stack but is captured implicitly through MLPerf throughput.
Inference token economics are workload-specific. Generic dollars-per-million-tokens figures are not portable across model class, context length, or batching strategy.
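The portability limit on token economics shows up directly in the arithmetic. A minimal sketch of the conversion from hourly cost to dollars per million tokens follows. The function name and parameters are hypothetical, and the measured tokens-per-second figure is assumed to come from the operator's own benchmark at a named batch size and quantization.

```python
def usd_per_million_tokens(cost_per_accel_hour_usd, tokens_per_second, utilization):
    """Convert an all-in hourly cost into USD per million generated tokens.

    tokens_per_second is the measured throughput while serving, at one
    specific model, context length, batch size, and quantization.
    utilization is the fraction of wall-clock time spent serving.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600.0 * utilization
    return cost_per_accel_hour_usd / tokens_per_hour * 1_000_000
```

Because `tokens_per_second` moves with batch size and context length, the same hardware at the same hourly cost yields a different figure for each serving configuration, which is why the output is only quoted at named settings.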

Example application

Applied to a 2026 hyperscaler procurement decision: H200 versus B200 versus GB200 NVL72 for a 50,000-accelerator training cluster on PJM with a blended power tariff of 120 dollars per MWh and 36-month depreciation. The framework runs the six steps, produces the dollars per effective FP8 PFLOP-hour for each candidate at 65 percent and 80 percent utilization, and identifies the utilization crossover where GB200 economics dominate H200 even at the higher acquisition price. See Hyperscaler GPU procurement 2026.
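The utilization crossover has a closed form when each chip's hourly cost is treated as a fixed part plus a part that scales with utilization. The sketch below makes that simplification and assumes both chips run at the same utilization. The function names and the tuple layout are hypothetical, and the numbers in the usage lines are placeholders, not H200 or GB200 figures.

```python
def unit_cost(chip, utilization):
    """USD per effective PFLOP-hour for chip = (fixed, marginal, pflops).

    fixed:    USD per hour incurred regardless of load (capex, idle energy)
    marginal: extra USD per hour at full load versus idle
    pflops:   effective PFLOPS at full load
    """
    fixed, marginal, pflops = chip
    return (fixed + marginal * utilization) / (pflops * utilization)


def crossover_utilization(incumbent, candidate):
    """Utilization at which the two unit costs are equal, or None.

    Setting (a_i + b_i*u) / t_i equal to (a_c + b_c*u) / t_c and solving for u
    gives u = (a_c/t_c - a_i/t_i) / (b_i/t_i - b_c/t_c).
    Returns None when the lines do not cross inside (0, 1].
    """
    a_i, b_i, t_i = incumbent
    a_c, b_c, t_c = candidate
    fixed_gap = a_c / t_c - a_i / t_i       # candidate's fixed-cost handicap
    marginal_gap = b_i / t_i - b_c / t_c    # candidate's load-cost advantage
    if marginal_gap == 0:
        return None
    u = fixed_gap / marginal_gap
    return u if 0.0 < u <= 1.0 else None


# Placeholder inputs only.
older = (1.0, 0.4, 1.0)
newer = (2.2, 0.2, 2.0)
u_star = crossover_utilization(older, newer)
print(u_star, unit_cost(older, 0.8), unit_cost(newer, 0.8))
```

A `None` result means one chip is cheaper across the whole utilization range. Which side of the crossover favors the candidate should be confirmed by evaluating `unit_cost` on either side, since the formula itself does not encode direction.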

Briefs that demonstrate this framework

Where the method has been applied.

2026-04-26

Hyperscaler GPU Procurement 2026: H200 vs B200 vs GB200 in Honest Deployment Math

Blackwell is no longer a roadmap promise, it is a procurement reality, and the only honest comparison runs on workload-weighted utilization rather than peak FLO...

2026-04-26

AI inference cost decline 2026: the trajectory and what it forces buyers to plan for

Token prices have fallen roughly 10x per year for equivalent capability since 2023, and the buyers who treat inference as a fixed line item are mispricing every...

2026-04-25

AI capex met the grid: when the megawatt curve breaks

Hyperscaler capital spending crossed 500 billion dollars across 2025 and 2026 while the average US interconnection wait sits above 4 years. The constraint is no...

2026-04-26

AI Talent Compensation 2026: Where Comp Is Going Across Labs, Hyperscalers, and Finance

Frontier labs, hyperscaler ML orgs, and quant funds are converging on a narrow pool of researchers, with equity scaling on private valuations and skill premiums...

2026-04-25

Quebec hydropower and the new gating of AI compute

Quebec spent two decades selling itself as the cheapest, greenest place on the continent to plug in a data center. In 2026 Hydro-Quebec is throttling new connec...

2026-04-26

Compute behind a fence: US AI export controls in 2026

Four years of BIS rules have built a tiered global compute regime. The October 2022 baseline, the October 2023 patch, the December 2024 HBM and tooling rules, a...

2026-04-26

Small modular reactors meet the hyperscaler load curve

Eighteen months after the Google Kairos and Amazon X-energy announcements, the SMR thesis has moved from PowerPoint to procurement. The binding constraints are ...

2026-04-26

AI inference economics in 2026: GPT, Claude, Gemini, and the pricing war that is rewriting the application stack

Token prices are falling roughly 10x per year at constant capability, the marginal frontier provider is now a Chinese open weight lab, and hyperscaler capex is ...

2026-04-26

Korea memory in 2026: Samsung versus SK Hynix, NVIDIA qualification, and the HBM share war

SK Hynix turned a two year qualification lead at NVIDIA into roughly half of the global HBM market and most of its profit pool, while Samsung is rebuilding its ...

2026-04-26

Frontier AI training cost trajectory 2026: the run rate, the deal stack, and the power-bound horizon

Frontier pretraining budgets crossed the half billion mark in 2025 and are heading toward one to three billion dollars per model by 2027, with cluster power, no...

2026-04-26

The Custom Silicon Insurgency Against Nvidia in 2026

AWS Trainium 2, Google TPU v5p and Trillium, Microsoft Maia, Meta MTIA, and a possible OpenAI ASIC are reshaping where AI compute margin lives, but the binding ...

2026-04-26

ASEAN Sovereign AI in 2026: Models, Compute, and the Regulatory Patchwork

Singapore is buying TPU access while building SEA-LION, Indonesia is shipping Sahabat-AI in five languages, Thailand is scaling Typhoon, Malaysia is funding chi...


Other frameworks

Related methods.

FEOC Stack

Foreign-entity-of-concern decomposition for IRA Section 30D and 45X bills of materials.

Pass-Through Decomp

Tariff Pass-Through Decomposition

Fiscal Multiplier Bench

Energy-AI Siting Score

Industrial Policy Maturity Index

Tariff-Substitution Elasticity Map

Sovereign Restructuring Stack

Critical Minerals Concentration Index

AI Capex Absorption Score

Sovereign Default Probability Stack

Election Outcome Translation Matrix

Cross-Border Payments Stack

Climate Adaptation Bond Score