MNNVL MoE All-to-All Support by cyx-6 · Pull Request #1134 · flashinfer-ai/flashinfer

cyx-6 · 2025-06-10T06:10:34Z

📌 Description

Introduce the MnnvlMemory and MnnvlMoe from TensorRT-LLM, for large scale expert parallism. The MnnvlMoe features a MnnvlMemory workspace for all-to-all(v) communication operation, aligned to mpi alltoallv interface and functionality.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

cyx-6 · 2025-06-11T04:41:39Z

above issues are all fixed

cyx-6 · 2025-06-11T05:20:56Z

removed and decoupled

yongwww · 2025-06-11T15:51:15Z

The multi-gpu tests are skipped in CI due to the ci resource limit. They pass on my multi-B200 node.

Update (Jun 15, 2025): It turns out the MNNVL fabric wasn’t actually being used for data transfers in the multi-gpu tests, so I’ll remove those tests. The MNNVL setup along with the updated multi-GPU and multi-node tests will be added shortly

## 📌 Description Install the python packages for CI docker: mpi4py, pynvml. They will be used for the comm ops. ## 🔍 Related Issues #1145, #1134 ## 🚀 Pull Request Checklist Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete. ### ✅ Pre-commit Checks - [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method). - [x] I have installed the hooks with `pre-commit install`. - [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues. > If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/). ## 🧪 Tests - [ ] Tests have been added or updated as needed. - [ ] All tests are passing (`unittest`, etc.). ## Reviewer Notes

Use trtllm_alltoall.cuh instead of trtllm_alltoall.cu Upd Use pytorch_extension_utils upd upd upd compiled Fix build Register python ops fix Upd Add unittest fix fix fix Add multi-gpu test cases Add cross-gpu test Remove the invalid cross-gpu test add mnnvl (wip) comm module

upd . . . . .

yzh119 · 2025-06-23T06:49:44Z

Thanks @yongwww and @cyx-6 for the great work.

As our communication kernel dependencies become complicated, we should update the documentation on how to install MPI/gdrcopy/etc.

yzh119 · 2025-06-24T06:45:40Z

cc @yyihuang for a another look on 26cdc5e

…comm-all2all

yzh119 reviewed Jun 11, 2025

View reviewed changes

Comment thread csrc/trtllm_alltoall.cu Outdated

Comment thread csrc/trtllm_alltoall.cu Outdated

Comment thread csrc/trtllm_alltoall.cu Outdated

Comment thread csrc/trtllm_alltoall.cu Outdated

yzh119 reviewed Jun 11, 2025

View reviewed changes

Comment thread csrc/pytorch_extension_utils.h Outdated

Comment thread csrc/pytorch_extension_utils.h Outdated

Comment thread flashinfer/comm.py Outdated

yongwww marked this pull request as ready for review June 11, 2025 15:47

yongwww changed the title ~~All-to-all communication operator support~~ alltoallv communication operator support Jun 11, 2025

yzh119 mentioned this pull request Jun 12, 2025

[wip] refactor: add more features to vec_t #1142

Draft

5 tasks

yongwww changed the title ~~alltoallv communication operator support~~ MNNVL AllToAllV communication operator support Jun 16, 2025

yongwww mentioned this pull request Jun 16, 2025

ci: Install mpi4py #1149

Merged

5 tasks

yongwww and others added 2 commits June 22, 2025 04:59

add pynvml dependency

19c849c

cyx-6 force-pushed the comm-all2all branch from cd8e81b to 19c849c Compare June 22, 2025 05:06

cyx-6 added 2 commits June 23, 2025 00:48

wip

1861093

upd . . . . .

format

b0563ba

yzh119 reviewed Jun 23, 2025

View reviewed changes

Comment thread flashinfer/comm/mnnvl.py

cyx-6 and others added 3 commits June 23, 2025 06:28

add dependency

6cf8513

format

1d79b59

Merge branch 'main' into comm-all2all

1aa35d5

yzh119 approved these changes Jun 23, 2025

View reviewed changes

refactor

26cdc5e

yyihuang and others added 5 commits June 24, 2025 03:43

upd refactor

2e2a483

upd

f6414b6

upd

e5735bc

Merge branch 'comm-all2all' of github.com:yzh119/flashinfer-dev into …

7d05e68

…comm-all2all

fix mpicomm

6986ee8

yzh119 merged commit 3dd4f03 into flashinfer-ai:main Jun 24, 2025
2 checks passed

cyx-6 changed the title ~~MNNVL AllToAllV communication operator support~~ MNNVL MoE All-to-All Support Jun 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MNNVL MoE All-to-All Support#1134

MNNVL MoE All-to-All Support#1134
yzh119 merged 13 commits intoflashinfer-ai:mainfrom
yzh119:comm-all2all

cyx-6 commented Jun 10, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cyx-6 commented Jun 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cyx-6 commented Jun 11, 2025

Uh oh!

yongwww commented Jun 11, 2025 •

edited

Loading

Uh oh!

Uh oh!

yzh119 commented Jun 23, 2025

Uh oh!

yzh119 commented Jun 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

cyx-6 commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

🧪 Tests

Reviewer Notes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cyx-6 commented Jun 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cyx-6 commented Jun 11, 2025

Uh oh!

yongwww commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

yzh119 commented Jun 23, 2025

Uh oh!

yzh119 commented Jun 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cyx-6 commented Jun 10, 2025 •

edited

Loading

yongwww commented Jun 11, 2025 •

edited

Loading