
Algorithm Practitioner

formath · 2025-02-24

DeepSeek · LLM · MLA · Attention · Multi-Head Attention

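The DeepSeek-V2 paper introduces a new attention module, Multi-head Latent Attention (MLA), which shrinks the KV cache substantially through LoRA-style low-rank projection and matrix absorption. The paper mentions the matrix absorption step only in passing, so this post walks through it in detail.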
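To make the idea concrete before going through the details, here is a minimal NumPy sketch of what "absorbing" a matrix means. It is an illustration under simplifying assumptions, not the DeepSeek-V2 implementation: it uses a single head, ignores RoPE and the softmax scaling, and every name and size in it (`W_q`, `W_dkv`, `W_uk`, `d_latent`, and so on) is hypothetical. It only shows that the key up-projection can be folded into the query projection, so attention scores can be computed directly from the cached low-rank latent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
d_model, d_latent, d_head, seq_len = 64, 8, 16, 5

W_q = rng.standard_normal((d_head, d_model))      # query projection
W_dkv = rng.standard_normal((d_latent, d_model))  # down-projection to the latent space
W_uk = rng.standard_normal((d_head, d_latent))    # up-projection from latent to key

h_past = rng.standard_normal((seq_len, d_model))  # hidden states of cached tokens
h_t = rng.standard_normal(d_model)                # hidden state of the current token

# Only the low-rank latent vectors are cached: one d_latent vector per token.
c_kv = h_past @ W_dkv.T                           # shape (seq_len, d_latent)

# Naive path: rebuild full keys from the latent cache, then score them.
k = c_kv @ W_uk.T                                 # shape (seq_len, d_head)
q = W_q @ h_t                                     # shape (d_head,)
scores_naive = k @ q

# Absorbed path: fold W_uk into the query projection once, ahead of time,
# then score directly against the latent cache without rebuilding keys.
W_q_absorbed = W_uk.T @ W_q                       # shape (d_latent, d_model)
q_latent = W_q_absorbed @ h_t                     # shape (d_latent,)
scores_absorbed = c_kv @ q_latent

print(np.allclose(scores_naive, scores_absorbed))
```

Both paths compute the same product, `c_kv @ W_uk.T @ W_q @ h_t`, just grouped differently, which is why the scores match. In this toy setup the cache holds `d_latent` numbers per token instead of `d_head` per key.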