1 post with tag mla
Multi-head Latent Attention (MLA): The KV-Cache Compression Behind DeepSeek-V3
How DeepSeek's Multi-head Latent Attention (MLA) compresses the KV cache with low-rank projections and decoupled RoPE, substantially reducing memory use versus standard multi-head attention (MHA) at equal or better quality.
2 minutes reading time