Kewei Zhang1*, Ye Huang1*, Yufan Deng1, Jincheng Yu2, Junsong Chen2, Huan Ling2, Enze Xie2, Daquan Zhou1

1Peking University  2NVIDIA
MHLA is a universal, high-efficiency linear attention operator. It can be applied to image classification, image generation, language modeling, and video generation, matching the performance of Flash Attention while delivering significant speed advantages over it on long sequences. For more details, please refer to our paper.
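For intuition on where the long-sequence speedup comes from, below is a minimal PyTorch sketch of plain (non-causal, single-head) linear attention. This is a conceptual illustration only, not the MHLA kernel shipped in this repository; the `elu(x) + 1` feature map and the O(N·d²) formulation are just one common instantiation of linear attention.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Plain (non-causal) linear attention in O(N * d^2) time.

    q, k, v: tensors of shape (batch, seq_len, dim). The (N x N)
    attention matrix is never formed: keys are first contracted with
    values, so cost grows linearly in sequence length N.
    """
    q = F.elu(q) + 1  # positive feature map (one common choice)
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                # key-value summary
    z = torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps  # normalizer
    return torch.einsum("bnd,bde->bne", q, kv) / z.unsqueeze(-1)

# Toy usage: doubling seq_len roughly doubles the cost instead of quadrupling it.
q = torch.randn(2, 4096, 64)
k = torch.randn(2, 4096, 64)
v = torch.randn(2, 4096, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([2, 4096, 64])
```

Because the N x N attention matrix is never materialized, compute and memory grow linearly with sequence length, which is why the latency advantage over Flash Attention widens as sequences get longer. MHLA's contribution, as described in the paper, is restoring expressivity on top of such a linear formulation via token-level multi-head computation.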
This repository is organized into four sub-projects: mhla_dit, mhla_image_classification, mhla_nlp, and mhla_videogen. Each contains the experimental code for one of the four tasks presented in our paper, along with its own README.md with detailed instructions.
- [2026.01.12] 🔥 Our paper is available on arXiv.
- [2026.01.12] 🔥 We release the code of MHLA, including training and inference code for image classification, image generation, language modeling, and video generation.
Please note that the following video is a compressed version. You can view the full HD demo by visiting this link.
MHLA_demo.mp4
- Release code of MHLA on Video Generation
- Release code of MHLA on DiT
- Release code of MHLA on NLP
- Release code of MHLA on ImageNet classification
- Release code of MHLA on Sana
- Release pretrained weights of Wan-MHLA
- Release pretrained weights of DiT-MHLA
- Release pretrained weights of Sana-MHLA
- Release pretrained weights of Image Classification models with MHLA
- Release pretrained weights of language models with MHLA
```bash
git clone -b main --single-branch https://github.com/DAGroup-PKU/MHLA
```

Please refer to the README.md files in the following sub-projects for detailed information:

- mhla_dit
- mhla_image_classification
- mhla_nlp
- mhla_videogen
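For example, entering a sub-project might look like the following (mhla_videogen is chosen arbitrarily; the other three work the same way):

```bash
cd MHLA/mhla_videogen
# Each sub-project ships its own README.md with task-specific
# training and inference instructions.
cat README.md
```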
| Method | Quality score | Semantic score | Total | Latency |
|---|---|---|---|---|
| Wan2.1 1.3B | 85.23 | 75.65 | 83.31 | 139s |
| Full MHLA | 83.93 | 78.40 | 82.83 | 62s |
| Full Linear | 69.96 | 11.38 | 58.24 | 62s |
| MHLA Hybrid 2/3 | 84.87 | 79.59 | 83.82 | 84s |
In the table above, Full MHLA (Wan-MHLA) and Full Linear (Wan-LA) replace all attention layers with MHLA and linear attention, respectively, while MHLA Hybrid 2/3 (Wan-MHLA-H) replaces only 2/3 of the layers.
As shown in the figure, linear attention fails to converge during training in ultra-long sequence scenarios for video generation, while MHLA demonstrates excellent convergence.
Our project is built on multiple inspiring projects, including timm, DiT, Sana, and flash-linear-attention.
If you find this work useful, please consider:
- Starring the repository
- Citing our paper
- Contributing to the codebase
@misc{mhla,
title={MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head},
author={Kewei Zhang and Ye Huang and Yufan Deng and Jincheng Yu and Junsong Chen and Huan Ling and Enze Xie and Daquan Zhou},
year={2026},
eprint={2601.07832},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.07832},
}

