PR #2772 might have introduced a device side compilation regression #3056

Merged
aleozlx merged 1 commit into flashinfer-ai:main from aleozlx:claude/eloquent-wright
Apr 14, 2026
Conversation

@aleozlx (Collaborator) commented Apr 14, 2026

📌 Description

🔍 Related Issues

#2772

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Refactor
    • Updated internal CUDA device code for improved compatibility and consistency in quantization and memory computation kernels used in AllReduce fusion operations.

@coderabbitai (Bot, Contributor) commented Apr 14, 2026

📝 Walkthrough


This pull request updates two CUDA header files (trtllm_allreduce_fusion.cuh and trtllm_moe_allreduce_fusion.cuh) to replace std::optional with cuda::std::optional in device code. The host-side <optional> include is removed, and function signatures and call sites are adjusted accordingly to maintain consistency.

Changes

Cohort: CUDA Device Optional Replacement
Files: include/flashinfer/comm/trtllm_allreduce_fusion.cuh, include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh
Summary: Removed #include <optional>, replaced std::optional<int> with cuda::std::optional<int> in device function parameters (get_sf_out_offset_128x4, cvt_quant_to_fp4_get_sf_out_offset), and updated call sites to use cuda::std::nullopt instead of std::nullopt.
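As a rough illustration of the pattern described above — this is a hypothetical sketch modeled on the function names in the summary, not the actual FlashInfer implementation:

```cuda
// Sketch only: a device helper shaped like get_sf_out_offset_128x4, showing
// the cuda::std::optional pattern. The body and signature are illustrative.
#include <cuda/std/optional>  // replaces the host-only <optional>

__device__ int get_sf_out_offset_128x4(cuda::std::optional<int> batch_idx,
                                       int row, int col) {
  // cuda::std::optional (libcu++) is usable in __device__ code, whereas
  // std::optional from the host standard library is not guaranteed to be.
  int b = batch_idx.value_or(0);
  return ((b * 128 + row) * 4) + col;
}

__global__ void example_kernel(int* out) {
  // Call sites pass cuda::std::nullopt instead of std::nullopt.
  out[threadIdx.x] =
      get_sf_out_offset_128x4(cuda::std::nullopt, threadIdx.x, 0);
}
```

Because libcu++ types like cuda::std::optional are designed for heterogeneous compilation, the same header then works in both host and device contexts without relying on nvcc accepting host standard-library types in device code.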

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

run-ci, op: comm

Suggested reviewers

  • yzh119
  • bkryu
  • jimmyzho
  • nv-yunzheq

Poem

🐰 A bunny hops through CUDA's domain,
Where device code opts for a better name—
From std::optional, now cuda::std reign,
Type-safe nullopt flows through each vein,
Device-side sanity, crystal and clear!

🚥 Pre-merge checks (✅ 1 passed | ❌ 2 failed)

❌ Failed checks (1 warning, 1 inconclusive)

  • Description check — ⚠️ Warning: The PR description is incomplete; only the template structure is provided, without the required 'Description' section explaining what changes are made and why. Resolution: fill in the 'Description' section with details about the changes made to address the device-side compilation regression from PR #2772, explaining the fix and its rationale.
  • Title check — ❓ Inconclusive: The title references a regression from PR #2772, but the changes are a fix, replacing std::optional with cuda::std::optional in two CUDA device code files. Resolution: clarify whether this PR introduces the regression or fixes it. A more specific title like 'Fix device-side compilation regression from PR #2772 with cuda::std::optional' would be clearer.

✅ Passed checks (1 passed)

  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@gemini-code-assist (Bot, Contributor) left a comment
Code Review

This pull request replaces std::optional and std::nullopt with cuda::std::optional and cuda::std::nullopt in the trtllm_allreduce_fusion.cuh and trtllm_moe_allreduce_fusion.cuh headers to ensure better compatibility with CUDA device code. I have no feedback to provide.

@aleozlx added the v0.6.8 label (release blocker label for 0.6.8) Apr 14, 2026
@aleozlx (Collaborator, Author) commented Apr 14, 2026

/bot run

@flashinfer-bot (Collaborator) commented:

Mirroring Failed

Failed to mirror PR to GitLab. Check logs for details.

@aleozlx (Collaborator, Author) commented Apr 14, 2026

Possible chain of events:

  1. feat: enable and update all-reduce fused quantization (#1164) introduced std::optional in device code — a latent bug.
  2. AOT only compiled these modules for SM100, so the arm64 cu126 CI job never built them.
  3. Fix compilation error: add missing <optional> header (#2772) added the #include as a fix for a different build config — CI passed because arm64 cu126 still wasn't compiling these modules.
  4. feat: Add DCP All-to-All kernel for context-parallel attention reduction (#2951) changed the guard to has_sm90 or has_sm100, so arm64 cu126 now compiles the trtllm comm modules → nvcc rejects std::optional in device code.
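The timeline above hinges on a widened architecture guard surfacing a latent bug. A minimal Python sketch of that mechanism (the function name and arch-string format are hypothetical, not the actual FlashInfer AOT code):

```python
# Illustrative sketch: how widening an arch guard changes which modules a
# CI job compiles, surfacing code paths that were previously never built.
def should_build_trtllm_comm_modules(target_archs):
    """Return True if the trtllm comm modules would be compiled for this
    set of target architectures (e.g. ["90a"], ["100a"])."""
    has_sm90 = any(a.startswith("90") for a in target_archs)
    has_sm100 = any(a.startswith("100") for a in target_archs)
    # Per the timeline: before PR #2951 the guard was effectively
    # SM100-only, so a CI job targeting SM90 skipped these headers and the
    # std::optional-in-device-code bug stayed hidden.
    return has_sm90 or has_sm100

print(should_build_trtllm_comm_modules(["90a"]))   # SM90-only job now builds them
print(should_build_trtllm_comm_modules(["80"]))    # SM80-only job still skips them
```

Under the old SM100-only guard the first call would have returned False, which is why the bug only appeared once the guard was relaxed.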

@aleozlx aleozlx merged commit dfcafb1 into flashinfer-ai:main Apr 14, 2026
33 of 72 checks passed
aleozlx added a commit that referenced this pull request Apr 15, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Apr 20, 2026

Labels

op: comm, v0.6.8 (release blocker label for 0.6.8)

3 participants