Training Cost Economics 2026
TL;DR
- DeepSeek-V3 full training run: 2.788M H800-hours (2.664M pre-training + 0.119M context extension + 0.005M post-training) = $5.576M at $2/hr rental (paper, Table 1). Excludes R&D, salaries, ~$1B of owned H800 hardware, and ablations.
- Llama 3.1 405B: ~$170M est., 3.8e25 FLOP on 16K H100s over ~54 days.
- Grok 4: ~$490M median estimate (Epoch AI, 2025). Two independent methods (H100 rental pricing, and amortized hardware plus power) both yielded ~$490M.
- GPT-5 total compute: ~5e25 FLOP (Epoch AI estimate; less than GPT-4.5 at >1e26).
- OpenAI 2024 cloud spend: $7B ($5B R&D + ~$2B inference); 2025 projected ~$9B R&D.
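The DeepSeek-V3 headline is a plain rental-equivalent calculation: GPU-hours times an hourly price. The sketch below reproduces it from the two figures quoted above. The function name `rental_cost_usd` and the alternative prices in the sensitivity loop are illustrative assumptions, not figures from the paper.

```python
# Figures quoted in the TL;DR (DeepSeek-V3 paper, Table 1).
H800_HOURS = 2.788e6         # GPU-hours for the reported training run
RENTAL_USD_PER_HOUR = 2.0    # rental price assumed in the paper


def rental_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-equivalent cost: GPU-hours multiplied by an hourly price."""
    return gpu_hours * usd_per_gpu_hour


deepseek_v3_usd = rental_cost_usd(H800_HOURS, RENTAL_USD_PER_HOUR)
print(f"DeepSeek-V3 headline: ${deepseek_v3_usd / 1e6:.3f}M")

# Sensitivity check with hypothetical prices (assumptions, not quoted rates):
# the headline scales linearly with whatever hourly rate is assumed.
for price in (1.0, 2.0, 3.0):
    cost = rental_cost_usd(H800_HOURS, price)
    print(f"  at ${price:.2f}/hr: ${cost / 1e6:.3f}M")
```

Because the result is linear in the assumed hourly rate, the rental price is worth stating alongside any headline figure computed this way.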
What is and isn't in the headlines
| Cost category | DeepSeek-V3 ($5.576M) | Llama 3.1 405B (~$170M est.) | GPT-5 (Epoch est.) |
|---|---|---|---|
| Final pre-training run | ✅ | ✅ | ✅ |
| Hardware capex | ❌ (rental price) | partial | ❌ |
| Ablations / experiments | ❌ | ❌ | ❌ |
| R&D salaries | ❌ | ❌ | ❌ |
| Failed runs | ❌ | ❌ | ❌ |
The CFO interpretation
Because the headline numbers exclude ablations, R&D salaries, and failed runs, they understate true cost by roughly 5–20×. For "all-in" comparisons, use Epoch AI's amortized-hardware methodology.
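As a rough illustration, the 5–20× rule of thumb can be applied to a headline figure to get an all-in range. This is a minimal sketch, assuming the multiplier applies uniformly to a rental-only headline such as DeepSeek-V3's; the helper name `all_in_range_usd` is hypothetical, and the output is a planning range, not an estimate from any cited source.

```python
def all_in_range_usd(
    headline_usd: float,
    low_mult: float = 5.0,    # lower bound of the 5-20x rule of thumb
    high_mult: float = 20.0,  # upper bound of the 5-20x rule of thumb
) -> tuple[float, float]:
    """Scale a headline training cost into a (low, high) all-in range."""
    if headline_usd < 0 or low_mult > high_mult:
        raise ValueError("expected non-negative cost and low_mult <= high_mult")
    return headline_usd * low_mult, headline_usd * high_mult


# DeepSeek-V3 headline from the TL;DR: $5.576M (rental-equivalent only).
low, high = all_in_range_usd(5.576e6)
print(f"Implied all-in range: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```

Headlines that already include some hardware cost (the Llama 3.1 column is marked "partial") would need a smaller multiplier, so the same range should not be applied across models without adjustment.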
References
- DeepSeek-V3 Technical Report, arXiv:2412.19437
- The Llama 3 Herd of Models, arXiv:2407.21783
- epoch.ai/gradient-updates/why-gpt5-used-less-training-compute-than-gpt45-but-gpt6-probably-wont
- epoch.ai/data-insights/grok-4-training-resources
- epoch.ai/data-insights/openai-compute-spend