3 posts with tag deepseek-v3
·
2 minutes reading time
Mixture-of-Experts Design: DeepSeek-V3, Qwen 3, Llama 4 Compared
Fine-grained experts, shared experts, and auxiliary-loss-free routing: the modern MoE recipe in 2026, with a side-by-side comparison of DeepSeek-V3, Qwen 3, and Llama 4.
·
1 minute reading time
Multi-head Latent Attention (MLA): The KV-Cache Compression Behind DeepSeek-V3
How DeepSeek's Multi-head Latent Attention compresses the KV cache using low-rank projections and decoupled RoPE, achieving large memory reductions versus standard multi-head attention (MHA) at equal or better quality.
·
2 minutes reading time