
feat: move AR fusion kernels from trtllm#1061

Closed
yyihuang wants to merge 36 commits into flashinfer-ai:main from yyihuang:trt_ar_fusion

Conversation

@yyihuang
Collaborator

@yyihuang yyihuang commented May 15, 2025

Move All-reduce fusion kernels from trtllm to flashinfer

Requirements:

Changes:

  • Add trtllm AR-fusion kernels
  • Add python interface
  • Add quantization utils
  • (optional) Add trtllm checker/assertion helpers; currently unused and replaced by torch checks. Please evaluate this choice in the code review.

To be discussed:

  • Should we maintain trtllm-style assertions/checks, or keep the current torch checks?
  • Should the interface sit at the allreduce_fusion_kernel_XXXX level or at the allreduce_fusion_op level? (currently at allreduce_fusion_op)
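To make the second question concrete, here is a minimal pure-Python sketch of what an op-level entry point could look like: a single dispatching function (the name `allreduce_fusion_op`, the `pattern` strings, and the `_check` helper are hypothetical illustrations, not the actual flashinfer API), with TORCH_CHECK-style input validation done via plain exceptions.

```python
import math

def _check(cond, msg):
    # Stand-in for TORCH_CHECK-style validation (the "torch check" discussed above).
    if not cond:
        raise ValueError(msg)

def allreduce_fusion_op(shards, pattern="allreduce", residual=None, eps=1e-6):
    """Hypothetical op-level entry point: one Python-visible op that
    dispatches to per-pattern kernels internally, instead of exposing
    each allreduce_fusion_kernel_XXXX variant to users directly."""
    _check(len(shards) > 0, "need at least one rank's shard")
    n = len(shards[0])
    _check(all(len(s) == n for s in shards), "all shards must have the same shape")
    # "Allreduce": elementwise sum over ranks (what a sum-allreduce computes).
    out = [sum(vals) for vals in zip(*shards)]
    if pattern == "allreduce":
        return out
    if pattern == "allreduce_residual_rmsnorm":
        _check(residual is not None and len(residual) == n, "residual shape mismatch")
        # Fused epilogue: add residual, then apply RMSNorm to the reduced result.
        out = [o + r for o, r in zip(out, residual)]
        rms = math.sqrt(sum(x * x for x in out) / n + eps)
        return [x / rms for x in out]
    _check(False, f"unsupported fusion pattern: {pattern}")
```

The design trade-off is the usual one: an op-level interface keeps the Python surface small and lets the backend pick kernels, while kernel-level exposure gives callers finer control at the cost of a wider, harder-to-maintain API.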

Next TODOs:

  • compile
  • minimize dependency
  • add flashinfer logger, exception, check (maybe torch_check)
  • design communication module unified interface
  • unit test on python interface
  • benchmark (optional?)
  • unit test on C++ interface (not in plan)

@yzh119
Collaborator

yzh119 commented May 28, 2025

Closed for now because it introduces a deep trtllm dependency and is hard to maintain.
We will split the trtllm comm kernels into three pieces:

  1. one- and two-shot allreduce kernels (with RMSNorm fusion): feat: add trtllm all-reduce (non-MoE) #1096
  2. low-precision allreduce kernels
  3. moe allreduce kernels
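For readers unfamiliar with the one-shot/two-shot distinction in item 1, here is a pure-Python numeric sketch of the two strategies (function names are illustrative, not trtllm's). One-shot does all reduction locally in one communication phase and tends to win for small messages; two-shot splits the work into a reduce-scatter phase followed by an allgather, which removes redundant per-rank reduction work for larger messages. Both must produce the same elementwise sum on every rank.

```python
def one_shot_allreduce(shards):
    # One-shot: every rank reads all peers' full buffers and reduces the
    # whole vector locally. Single communication phase; redundant compute.
    return [[sum(vals) for vals in zip(*shards)] for _ in shards]

def two_shot_allreduce(shards):
    # Two-shot: phase 1 is a reduce-scatter (each rank reduces only its own
    # chunk); phase 2 is an allgather (ranks exchange the reduced chunks).
    world = len(shards)
    n = len(shards[0])
    chunk = n // world
    reduced_chunks = []
    for r in range(world):
        lo = r * chunk
        hi = (r + 1) * chunk if r < world - 1 else n  # last rank takes the tail
        reduced_chunks.append([sum(s[i] for s in shards) for i in range(lo, hi)])
    full = [x for part in reduced_chunks for x in part]
    return [list(full) for _ in shards]
```

In the real kernels the fused RMSNorm epilogue is applied to the reduced result before it is written out, saving an extra pass over the data.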

@yzh119 yzh119 closed this May 28, 2025
yzh119 added a commit that referenced this pull request Jun 2, 2025
<!-- .github/pull_request_template.md -->

## 📌 Description

We add trt-llm custom all-reduce to flashinfer comm module.

## 🔍 Related Issues

We split this PR into multiple ones: #1061
MoE kernels are also in progress.

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->

---------

Co-authored-by: Zihao Ye <expye@outlook.com>
yzh119 added a commit that referenced this pull request Jun 17, 2025
<!-- .github/pull_request_template.md -->

## 📌 Description

We add the moe_all_reduce_fusion kernels from trt-llm.

## 🔍 Related Issues

We split the original PR (#1061) into multiple ones; all_reduce_fusion will be next.

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->

---------

Co-authored-by: Zihao Ye <expye@outlook.com>
