3 posts with tag qwen

llm
pre-training
machine-learning
ai
deepseek
qwen
frontier

LLM Pre-Training in 2026: The Frontier in Numbers

The state of frontier LLM pre-training in 2026: token counts, parameter counts, cluster sizes, costs, and what they mean for CTOs and ML leads.

May 24, 2026 · 3 minutes reading time

llm
mixture-of-experts
architecture
deepseek-v3
qwen
llama

Mixture-of-Experts Design: DeepSeek-V3, Qwen 3, Llama 4 Compared

Fine-grained experts, shared experts, auxiliary-loss-free routing: the modern MoE recipe in 2026, with a side-by-side comparison of DeepSeek-V3, Qwen 3, and Llama 4.

May 22, 2026 · 1 minute reading time

llm
pre-training
datasets
common-crawl
fineweb
qwen
deepseek

Pre-Training Data Sources & Token Budgets: From Common Crawl to 36T Tokens

Where 2026 frontier LLMs get their pre-training data (Common Crawl, FineWeb, DCLM, StackV2, multilingual text, PDFs) and how Qwen 3, DeepSeek-V3, and Llama 3.1 sized their corpora.

May 20, 2026 · 2 minutes reading time