Conversation
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough: The project version in … was bumped to 0.6.9.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 5 passed
Code Review: Bump version to 0.6.9

Overview
Single-line change to …

Versioning Convention
Per … The …

Observations
…

Summary
Change is correct, minimal, and follows project conventions. Main pre-merge checklist: all v0.6.9 gated PRs merged and tag …
to bot-run after #3158 then merge
CI is grinding on many irrelevant problems, little gain in waiting... (release-v0.6.9 branch)
Description
Bump version to 0.6.9 for release.
Related Issues (Gated-by PRs)
https://github.com/flashinfer-ai/flashinfer/issues?q=is%3Aopen+label%3Av0.6.9
Reviewer Notes
API changes review
API changes since v0.6.8.post1
```diff
$ git diff v0.6.8.post1..main -- "*.py" | grep -B5 -A20 "@flashinfer_api"
...
         output = moe.run(x, x_sf, topk_ids, topk_weights, w1, w1_sf, ...)
     """

-    @supported_compute_capability([100, 103])
+    @supported_compute_capability([100, 103, 120, 121])
     @flashinfer_api
     def __init__(
         self,
@@ -388,7 +436,19 @@ class CuteDslMoEWrapper:
         self.device = device
         self.enable_pdl = enable_pdl

-        # Pre-allocated buffers
+        # Detect SM120 for architecture-specific dispatch
+        major, minor = torch.cuda.get_device_capability(device)
+        self._is_sm120 = major == 12
+        if self._is_sm120:
+            from ...jit.cpp_ext import get_cuda_version
+
+            if get_cuda_version().major < 13:
+                raise ValueError(
+                    "SM120 CuTe DSL fused MoE requires CUDA 13 or later. "
+                    f"Current CUDA version: {get_cuda_version()}."
+                )
+
+        # Pre-allocated buffers (SM100 path)
--
 )

-@supported_compute_capability([100, 103])
+@supported_compute_capability([100, 103, 120, 121])
 @flashinfer_api
 def cute_dsl_fused_moe_nvfp4(
     x: torch.Tensor,
@@ -712,7 +869,7 @@ def cute_dsl_fused_moe_nvfp4(
 ) -> torch.Tensor:
     """Run fused MoE computation using CuteDSL NVFP4 kernels.

-    Supported architectures: SM100, SM103.
+    Supported architectures: SM100, SM103, SM120, SM121.

     This is the simple functional API. For CUDA graph support, use
     `CuteDslMoEWrapper` instead.
@@ -723,8 +880,12 @@ def cute_dsl_fused_moe_nvfp4(
     ...     output = cute_dsl_fused_moe_nvfp4(...)

     Args:
-        x: Input tensor, NVFP4 quantized [num_tokens, hidden_size // 2].
-        x_sf: Scale factors for x.
+        x: Input tensor. On SM100/SM103: NVFP4 quantized
+            [num_tokens, hidden_size // 2]. On SM120/SM121: bf16
+            activations [num_tokens, hidden_size] (kernel fuses
```
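The diff shows `@supported_compute_capability` only at its call sites; FlashInfer's actual implementation is not part of this diff. As a point of reference for what the gate enforces, here is a minimal illustrative sketch, assuming capabilities are encoded as `major * 10 + minor` (so `103` = SM10.3, `121` = SM12.1), which is how the lists above read:

```python
import functools

import torch


def supported_compute_capability(archs):
    """Illustrative gate: reject devices whose SM version is not in `archs`.

    `archs` holds compute capabilities encoded as major * 10 + minor,
    matching the [100, 103, 120, 121] lists in the diff above. This is a
    sketch of the idea, not FlashInfer's implementation.
    """

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            major, minor = torch.cuda.get_device_capability()
            sm = major * 10 + minor
            if sm not in archs:
                raise ValueError(
                    f"{fn.__name__} supports SM {sorted(archs)}, got SM{sm}"
                )
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```

Checking at call time rather than import time is what lets the same wheel be installed on mixed-architecture fleets and fail only on the unsupported device.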
**Summary of API changes:**

- `CuteDslMoEWrapper.__init__` / `cute_dsl_fused_moe_nvfp4`: `@supported_compute_capability` widened from `[100, 103]` to `[100, 103, 120, 121]` (SM120 Blackwell support). **No signature change** — backward-compatible (see the dispatch sketch below).
- `gated_delta_rule_decode_pretranspose`: New optional parameter `output_state_indices: Optional[torch.Tensor] = None`. **Backward-compatible** (new param with default).
- Internal: tactic pre-filtering in `core.py` for SM89→SM120 occupancy. No API surface change.
- **No breaking changes detected.**

Summary by CodeRabbit

- **Chores**
  - Version update to 0.6.9 (patch release)
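Because the input contract for `x` now differs by architecture, a caller supporting both generations needs a branch like the one below. This is a hedged sketch, not the definitive calling convention: the docstring diff above is cut off mid-sentence, so passing `None` for the scale factors on SM120/SM121 is an assumption, the import path is assumed, and `quantize_nvfp4` is a hypothetical placeholder for whatever quantization path the caller already uses, not a FlashInfer API.

```python
import torch

from flashinfer import cute_dsl_fused_moe_nvfp4  # import path assumed


def moe_forward(x_bf16, topk_ids, topk_weights, w1, w1_sf, **kwargs):
    """Dispatch on SM version per the v0.6.9 docstring: NVFP4-quantized
    input on SM100/SM103, raw bf16 input on SM120/SM121 (the SM120 kernel
    fuses quantization, per the truncated docstring above)."""
    major, minor = torch.cuda.get_device_capability(x_bf16.device)
    sm = major * 10 + minor
    if sm in (120, 121):
        # [num_tokens, hidden_size] bf16 activations go in directly;
        # whether the scale-factor argument may be None is an assumption,
        # since the diff is cut off before that part of the docstring.
        return cute_dsl_fused_moe_nvfp4(
            x_bf16, None, topk_ids, topk_weights, w1, w1_sf, **kwargs
        )
    # SM100/SM103: pre-quantize to NVFP4, [num_tokens, hidden_size // 2].
    x_q, x_sf = quantize_nvfp4(x_bf16)  # hypothetical helper, not FlashInfer API
    return cute_dsl_fused_moe_nvfp4(
        x_q, x_sf, topk_ids, topk_weights, w1, w1_sf, **kwargs
    )
```

The positional order of `topk_ids` / `topk_weights` mirrors the `moe.run(...)` example visible in the diff; treat it as a sketch of the dispatch pattern rather than the exact signature.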