Tag: llm | 6AI6

Quality Filtering for LLM Pre-Training: FineWeb-Edu, DCLM, Nemotron-CC, Ultra-FineWeb

How frontier labs filter trillions of web tokens — heuristic, perplexity-based, and classifier-based filtering, with concrete recipes from FineWeb-Edu and DCLM.

May 25, 2026 · 2 minutes reading time

Training Cost Economics 2026

May 25, 2026 · 1 minute reading time

LLM Pre-Training in 2026: A Panoramic View of Frontier Data

May 24, 2026 · 2 minutes reading time

LLM Pre-Training in 2026: The Frontier in Numbers

The state of frontier LLM pre-training in 2026 — token counts, parameter counts, cluster sizes, costs, and what it all means for CTOs and ML leads.

May 24, 2026 · 3 minutes reading time

Multi-head Latent Attention (MLA): The KV-Cache Compression Behind DeepSeek-V3

How DeepSeek's Multi-head Latent Attention compresses the KV cache via low-rank projections and decoupled RoPE, achieving large memory reductions versus MHA at equal or better quality.

May 23, 2026 · 2 minutes reading time

Optimizers in 2026: AdamW, Muon, Shampoo/SOAP