bump version to 0.6.9 #3123

Merged
aleozlx merged 1 commit into main from bump-version-0.6.9 on Apr 24, 2026

Conversation

aleozlx (Collaborator) commented Apr 20, 2026

Description

Bump version to 0.6.9 for release.

Related Issues (Gated-by PRs)

https://github.com/flashinfer-ai/flashinfer/issues?q=is%3Aopen+label%3Av0.6.9

Reviewer Notes

API changes review

API changes since v0.6.8.post1

```diff
$ git diff v0.6.8.post1..main -- "*.py" | grep -B5 -A20 "@flashinfer_api"
         ...     output = moe.run(x, x_sf, topk_ids, topk_weights, w1, w1_sf, ...)
     """
 
-    @supported_compute_capability([100, 103])
+    @supported_compute_capability([100, 103, 120, 121])
     @flashinfer_api
     def __init__(
         self,
@@ -388,7 +436,19 @@ class CuteDslMoEWrapper:
         self.device = device
         self.enable_pdl = enable_pdl
 
-        # Pre-allocated buffers
+        # Detect SM120 for architecture-specific dispatch
+        major, minor = torch.cuda.get_device_capability(device)
+        self._is_sm120 = major == 12
+        if self._is_sm120:
+            from ...jit.cpp_ext import get_cuda_version
+
+            if get_cuda_version().major < 13:
+                raise ValueError(
+                    "SM120 CuTe DSL fused MoE requires CUDA 13 or later. "
+                    f"Current CUDA version: {get_cuda_version()}."
+                )
+
+        # Pre-allocated buffers (SM100 path)
--
     )
 
 
-@supported_compute_capability([100, 103])
+@supported_compute_capability([100, 103, 120, 121])
 @flashinfer_api
 def cute_dsl_fused_moe_nvfp4(
     x: torch.Tensor,
@@ -712,7 +869,7 @@ def cute_dsl_fused_moe_nvfp4(
 ) -> torch.Tensor:
     """Run fused MoE computation using CuteDSL NVFP4 kernels.
 
-    Supported architectures: SM100, SM103.
+    Supported architectures: SM100, SM103, SM120, SM121.
 
     This is the simple functional API. For CUDA graph support, use
     `CuteDslMoEWrapper` instead.
@@ -723,8 +880,12 @@ def cute_dsl_fused_moe_nvfp4(
         ...     output = cute_dsl_fused_moe_nvfp4(...)
 
     Args:
-        x: Input tensor, NVFP4 quantized [num_tokens, hidden_size // 2].
-        x_sf: Scale factors for x.
+        x: Input tensor. On SM100/SM103: NVFP4 quantized
+            [num_tokens, hidden_size // 2]. On SM120/SM121: bf16
+            activations [num_tokens, hidden_size] (kernel fuses
```
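For readers skimming the diff, the new SM120 guard boils down to one predicate. A minimal sketch, with a hypothetical helper name (`check_sm120_gate`); the real logic lives in `CuteDslMoEWrapper.__init__` and queries `torch.cuda.get_device_capability` and flashinfer's `jit.cpp_ext.get_cuda_version`:

```python
# Illustrative sketch of the SM120 gating logic from the diff above.
# check_sm120_gate is a hypothetical name; the real check reads the
# device's compute capability and the CUDA toolkit version at runtime.

def check_sm120_gate(cc_major: int, cuda_major: int) -> bool:
    """Return True when the SM120 dispatch path should be used.

    Raises ValueError for an SM12x device paired with a pre-13 CUDA
    toolkit, mirroring the guard added in this release.
    """
    is_sm120 = cc_major == 12
    if is_sm120 and cuda_major < 13:
        raise ValueError(
            "SM120 CuTe DSL fused MoE requires CUDA 13 or later. "
            f"Current CUDA major version: {cuda_major}."
        )
    return is_sm120
```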

Summary of API changes:

  • CuteDslMoEWrapper.__init__ / cute_dsl_fused_moe_nvfp4: @supported_compute_capability widened from [100, 103] to [100, 103, 120, 121] (SM120 Blackwell support). No signature change — backward-compatible.
  • gated_delta_rule_decode_pretranspose: New optional parameter output_state_indices: Optional[torch.Tensor] = None. Backward-compatible (new param with default).
  • Internal: tactic pre-filtering in core.py for SM89→SM120 occupancy. No API surface change.
  • No breaking changes detected.
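Because the expected layout of `x` now varies by architecture, a caller-side shape check makes the contract concrete. A hedged sketch (the helper is invented for illustration; the shapes follow the docstring change above):

```python
# Hypothetical helper illustrating the per-architecture input layout
# documented above: NVFP4-packed on SM100/SM103 (two FP4 values per
# byte, hence hidden_size // 2), plain bf16 on SM120/SM121.

def expected_x_shape(cc: int, num_tokens: int, hidden_size: int) -> tuple:
    """Return the expected shape of `x` for a given compute capability."""
    if cc in (100, 103):
        return (num_tokens, hidden_size // 2)  # NVFP4 quantized input
    if cc in (120, 121):
        return (num_tokens, hidden_size)       # bf16 activations
    raise ValueError(f"Unsupported compute capability: {cc}")
```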

Summary by CodeRabbit

  • Chores
    • Version update to 0.6.9 (patch release)

coderabbitai (Bot, Contributor) commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fa46b46-b770-45a7-a142-8da400668657

📥 Commits

Reviewing files that changed from the base of the PR and between 8559397 and 9f89b0e.

📒 Files selected for processing (1)
  • version.txt

📝 Walkthrough

The project version in version.txt is updated from 0.6.8 to 0.6.9. This is a patch-level increment with no changes to any source code files.

Changes

Cohort / File(s) | Summary
Version Bump (version.txt) | Version number incremented from 0.6.8 to 0.6.9.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • sricketts
  • bkryu
  • jimmyzho
  • nv-yunzheq

Poem

🐰 A tiny hop from eight to nine,
The version climbs, all looks fine,
No code was changed, just digits dance,
A patch release gets its chance! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change (bumping the version to 0.6.9), which matches the primary objective of this release PR.
  • Description check: ✅ Passed. The description includes the required sections (Description, Related Issues, and Reviewer Notes with API changes analysis), though the pull request checklist items are not marked as complete.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; the check was skipped.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



gemini-code-assist (Bot, Contributor) left a comment

Code Review

The pull request updates the version in version.txt from 0.6.8 to 0.6.9. I have no feedback to provide.

claude Bot commented Apr 20, 2026

Code Review: Bump version to 0.6.9

Overview

Single-line change to version.txt: 0.6.8 → 0.6.9. The PR description includes a solid pre-flight API-diff summary against v0.6.8.post1.

Versioning Convention

Per CLAUDE.md, patch covers small backwards-compatible features (new kernels, new SM support) and backwards-compatible bug fixes. The API changes summarized in the PR description are all backward-compatible:

  • SM120/SM121 support in CuteDslMoEWrapper / cute_dsl_fused_moe_nvfp4 — new compute capability, consistent with a patch bump.
  • New optional parameter output_state_indices on gated_delta_rule_decode_pretranspose — default None, backward-compatible.
  • Internal tactic pre-filtering — no public API surface change.

The 0.6.8 → 0.6.9 patch increment is correct.

Observations

  1. Baseline mismatch: Reviewer notes diff against v0.6.8.post1, but version.txt currently reads 0.6.8 (no .post1). Verify that the tag v0.6.8.post1 exists in the repo before this ships, so release tooling has a clean history baseline.

  2. PR is still DRAFT: Gated PRs in the v0.6.9 milestone should be merged and CI-verified before marking ready.

  3. No test changes needed — version bumps don't require test updates; CI validates the full suite on the tagged commit.
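The baseline check in observation 1 is a one-liner in the real checkout (`git tag --list "v0.6.8*"`, expecting v0.6.8.post1 to appear). A hedged sketch, replayed in a throwaway repo so the command can be exercised end to end:

```shell
# Sketch of the baseline-tag verification from observation 1, in a
# temporary repo; the tag is created locally purely for illustration.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/repo"
git -C "$tmp/repo" config user.email "ci@example.com"
git -C "$tmp/repo" config user.name "ci"
git -C "$tmp/repo" commit -q --allow-empty -m "base"
git -C "$tmp/repo" tag v0.6.8.post1
# The check itself: a non-empty listing means the baseline tag exists.
git -C "$tmp/repo" tag --list "v0.6.8*" | grep -q "v0.6.8.post1" \
    && echo "baseline tag present"
```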

Summary

Change is correct, minimal, and follows project conventions. Main pre-merge checklist: all v0.6.9 gated PRs merged and tag v0.6.8.post1 confirmed.

aleozlx (Collaborator, Author) commented Apr 23, 2026

to bot-run after #3158 then merge

aleozlx merged commit ebd7fda into main on Apr 24, 2026
33 checks passed
aleozlx deleted the bump-version-0.6.9 branch on April 24, 2026 at 06:03
aleozlx (Collaborator, Author) commented Apr 24, 2026

CI is grinding on many irrelevant problems, little gain waiting...

release-v0.6.9 branch:

```shell
git cherry-pick --no-commit ecf99101 03a87b09
git commit -m "Cherry-pick PR #3151: tinygemm bf16 no bias"
git cherry-pick ebd7fda
```
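The flow above uses `--no-commit` to stage two picks and then squash them under a single message. A hedged sketch replaying the same pattern in a throwaway repo, with locally created commits standing in for the real hashes (ecf99101, 03a87b09, ebd7fda):

```shell
# Replay of the release-branch cherry-pick flow in a temporary repo;
# commit hashes are generated locally, not the real flashinfer ones.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
git config user.email "ci@example.com"
git config user.name "ci"
echo base > base.txt && git add base.txt && git commit -qm "base"

# Two fix commits on a feature branch (stand-ins for the PR #3151 picks)
git checkout -qb feature
echo fix1 > fix1.txt && git add fix1.txt && git commit -qm "fix1"
c1="$(git rev-parse HEAD)"
echo fix2 > fix2.txt && git add fix2.txt && git commit -qm "fix2"
c2="$(git rev-parse HEAD)"

# Release branch: --no-commit stages both picks without committing,
# then one commit records them under a single message.
git checkout -qb release-v0.6.9 main
git cherry-pick --no-commit "$c1" "$c2"
git commit -qm "Cherry-pick PR #3151: tinygemm bf16 no bias"
test -f fix1.txt && test -f fix2.txt
```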

aleozlx added a commit that referenced this pull request Apr 24, 2026
Bump version to 0.6.9 for release.

(Commit message duplicates the PR description above.)
coderabbitai Bot mentioned this pull request May 5, 2026