MLA (Multi-head Latent Attention) was proposed in [DeepSeek-V2](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf) for efficient inference: it compresses the keys and values into a low-rank latent vector, shrinking the KV cache.
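A minimal sketch of the core idea behind MLA's efficiency, using illustrative (not DeepSeek-V2's actual) dimensions and weight names: each token's hidden state is down-projected to a small latent vector, and only that latent is cached; per-head keys and values are reconstructed by up-projections at attention time.

```python
import numpy as np

# Illustrative dimensions -- NOT the real DeepSeek-V2 hyperparameters.
d_model, n_heads, d_head, d_latent = 512, 8, 64, 64

rng = np.random.default_rng(0)
W_dkv = rng.standard_normal((d_model, d_latent))          # down-projection (compress)
W_uk = rng.standard_normal((d_latent, n_heads * d_head))  # up-projection to keys
W_uv = rng.standard_normal((d_latent, n_heads * d_head))  # up-projection to values

x = rng.standard_normal((16, d_model))  # one sequence of 16 token hidden states

c_kv = x @ W_dkv                        # latent KV: only this is cached, (16, 64)
k = (c_kv @ W_uk).reshape(16, n_heads, d_head)  # reconstructed keys
v = (c_kv @ W_uv).reshape(16, n_heads, d_head)  # reconstructed values

# Cache cost per token: d_latent floats for MLA vs. 2 * n_heads * d_head
# (keys + values across all heads) for standard multi-head attention.
print(d_latent, 2 * n_heads * d_head)   # 64 vs. 1024 in this toy setup
```

In this toy configuration the cached state per token drops from 1024 floats to 64, which is the kind of KV-cache reduction that makes decoding cheaper.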