PR #2772 might have introduced a device side compilation regression #3056

Merged
aleozlx merged 1 commit into flashinfer-ai:main from aleozlx:claude/eloquent-wright
Apr 14, 2026
Conversation

@aleozlx (Collaborator) commented Apr 14, 2026

📌 Description

🔍 Related Issues

#2772

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Refactor
    • Updated internal CUDA device code for improved compatibility and consistency in quantization and memory computation kernels used in AllReduce fusion operations.

@coderabbitai (Bot, Contributor) commented Apr 14, 2026

📝 Walkthrough


This pull request updates two CUDA header files (trtllm_allreduce_fusion.cuh and trtllm_moe_allreduce_fusion.cuh) to replace std::optional with cuda::std::optional in device code. The host-side <optional> include is removed, and function signatures and call sites are adjusted accordingly to maintain consistency.

Changes

Cohort: CUDA Device Optional Replacement
Files: include/flashinfer/comm/trtllm_allreduce_fusion.cuh, include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh
Summary: Removed #include <optional>, replaced std::optional<int> with cuda::std::optional<int> in device function parameters (get_sf_out_offset_128x4, cvt_quant_to_fp4_get_sf_out_offset), and updated call sites to use cuda::std::nullopt instead of std::nullopt.
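As a rough illustration of the pattern described above — this is a hypothetical sketch modeled on the function names in the summary, not the actual FlashInfer implementation:

```cuda
// Sketch only: a device helper shaped like get_sf_out_offset_128x4, showing
// the cuda::std::optional pattern. The body and signature are illustrative.
#include <cuda/std/optional>  // replaces the host-only <optional>

__device__ int get_sf_out_offset_128x4(cuda::std::optional<int> batch_idx,
                                       int row, int col) {
  // cuda::std::optional (libcu++) is usable in __device__ code, whereas
  // std::optional from the host standard library is not guaranteed to be.
  int b = batch_idx.value_or(0);
  return ((b * 128 + row) * 4) + col;
}

__global__ void example_kernel(int* out) {
  // Call sites pass cuda::std::nullopt instead of std::nullopt.
  out[threadIdx.x] =
      get_sf_out_offset_128x4(cuda::std::nullopt, threadIdx.x, 0);
}
```

Because libcu++ types like cuda::std::optional are designed for heterogeneous compilation, the same header then works in both host and device contexts without relying on nvcc accepting host standard-library types in device code.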

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

run-ci, op: comm

Suggested reviewers

  • yzh119
  • bkryu
  • jimmyzho
  • nv-yunzheq

Poem

🐰 A bunny hops through CUDA's domain,
Where device code opts for a better name—
From std::optional, now cuda::std reign,
Type-safe nullopt flows through each vein,
Device-side sanity, crystal and clear!

🚥 Pre-merge checks (✅ 1 passed | ❌ 2 failed)

❌ Failed checks (1 warning, 1 inconclusive)

  • Description check — ⚠️ Warning: The PR description is incomplete; only the template structure is provided, without the required 'Description' section explaining what changes are made and why. Resolution: fill in the 'Description' section with details about the changes made to address the device-side compilation regression from PR #2772, explaining the fix and its rationale.
  • Title check — ❓ Inconclusive: The title references a regression from PR #2772, but the changes are a fix, replacing std::optional with cuda::std::optional in two CUDA device code files. Resolution: clarify whether this PR introduces the regression or fixes it. A more specific title like 'Fix device-side compilation regression from PR #2772 with cuda::std::optional' would be clearer.

✅ Passed checks (1 passed)

  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@gemini-code-assist (Bot, Contributor) left a comment
Code Review

This pull request replaces std::optional and std::nullopt with cuda::std::optional and cuda::std::nullopt in the trtllm_allreduce_fusion.cuh and trtllm_moe_allreduce_fusion.cuh headers to ensure better compatibility with CUDA device code. I have no feedback to provide.

@aleozlx added the v0.6.8 label (release blocker label for 0.6.8) Apr 14, 2026
@aleozlx (Collaborator, Author) commented Apr 14, 2026

/bot run

@flashinfer-bot (Collaborator) commented:

Mirroring Failed

Failed to mirror PR to GitLab. Check logs for details.

@aleozlx (Collaborator, Author) commented Apr 14, 2026

Possible chain of events:

  1. feat: enable and update all-reduce fused quantization (#1164) introduced std::optional in device code — a latent bug.
  2. AOT only compiled these modules for SM100, so the arm64 cu126 CI job never built them.
  3. Fix compilation error: add missing <optional> header (#2772) added the #include as a fix for a different build config — CI passed because arm64 cu126 still wasn't compiling these modules.
  4. feat: Add DCP All-to-All kernel for context-parallel attention reduction (#2951) changed the guard to has_sm90 or has_sm100, so arm64 cu126 now compiles the trtllm comm modules → nvcc rejects std::optional in device code.
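The timeline above hinges on a widened architecture guard surfacing a latent bug. A minimal Python sketch of that mechanism (the function name and arch-string format are hypothetical, not the actual FlashInfer AOT code):

```python
# Illustrative sketch: how widening an arch guard changes which modules a
# CI job compiles, surfacing code paths that were previously never built.
def should_build_trtllm_comm_modules(target_archs):
    """Return True if the trtllm comm modules would be compiled for this
    set of target architectures (e.g. ["90a"], ["100a"])."""
    has_sm90 = any(a.startswith("90") for a in target_archs)
    has_sm100 = any(a.startswith("100") for a in target_archs)
    # Per the timeline: before PR #2951 the guard was effectively
    # SM100-only, so a CI job targeting SM90 skipped these headers and the
    # std::optional-in-device-code bug stayed hidden.
    return has_sm90 or has_sm100

print(should_build_trtllm_comm_modules(["90a"]))   # SM90-only job now builds them
print(should_build_trtllm_comm_modules(["80"]))    # SM80-only job still skips them
```

Under the old SM100-only guard the first call would have returned False, which is why the bug only appeared once the guard was relaxed.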

@aleozlx aleozlx merged commit dfcafb1 into flashinfer-ai:main Apr 14, 2026
33 of 72 checks passed
aleozlx added a commit that referenced this pull request Apr 15, 2026
@coderabbitai coderabbitai Bot mentioned this pull request Apr 20, 2026

Labels

op: comm, v0.6.8 (release blocker label for 0.6.8)

3 participants