
bump version to 0.6.8 #3042

Merged
aleozlx merged 2 commits into main from bump-version-0.6.8 on Apr 14, 2026
Conversation

@aleozlx
Collaborator

aleozlx commented Apr 13, 2026

Description

Bump version to 0.6.8 for release.

Related Issues (Gated-by PRs)

https://github.com/flashinfer-ai/flashinfer/issues?q=is%3Aopen+label%3Av0.6.8

Reviewer Notes

API changes review

API changes since v0.6.7.post3 (a migration sketch for the kv_block_scales → kv_cache_sf rename follows the diff):

$ git diff v0.6.7.post3..main -- "*.py" | grep -B5 -A20 "@flashinfer_api"
-            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-        ] = None,
+        kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
     ) -> Tuple[torch.Tensor, torch.Tensor]: ...
 
     @flashinfer_api
@@ -1227,9 +1232,7 @@ class BatchDecodeWithPagedKVCacheWrapper:
         sinks: Optional[torch.Tensor] = None,
         q_len_per_req: Optional[int] = 1,
         skip_softmax_threshold_scale_factor: Optional[float] = None,
-        kv_block_scales: Optional[
-            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-        ] = None,
+        kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
     ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
         r"""Compute batch decode attention between query and paged kv cache.
 
@@ -1288,14 +1291,22 @@ class BatchDecodeWithPagedKVCacheWrapper:
             enable_pdl = device_support_pdl(q.device)
         k_cache, v_cache = _unpack_paged_kv_cache(paged_kv_cache, self._kv_layout)
 
-        # Unpack kv_block_scales
+        if (
+            k_cache.dtype == torch.uint8 or v_cache.dtype == torch.uint8
+        ) and kv_cache_sf is None:
+            raise ValueError("kv_cache_sf must be provided for NVFP4 KV cache.")
--
 
     return SimpleNamespace(gdn_prefill=gdn_prefill)
 
 
-def chunk_gated_delta_rule_hopper(
+@flashinfer_api
+def chunk_gated_delta_rule(
     q: torch.Tensor,
     k: torch.Tensor,
     v: torch.Tensor,
@@ -104,6 +106,9 @@ def chunk_gated_delta_rule_hopper(
     use_qk_l2norm_in_kernel: bool = False,
     output: Optional[torch.Tensor] = None,
     output_state: Optional[torch.Tensor] = None,
+    state_checkpoints: Optional[torch.Tensor] = None,
+    checkpoint_cu_starts: Optional[torch.Tensor] = None,
+    checkpoint_every_n_tokens: int = 0,
 ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
     r"""Chunked Gated Delta Rule (GDN) attention for prefill.
 
@@ -111,12 +116,82 @@ def chunk_gated_delta_rule_hopper(
     training and inference. Supports both GQA (grouped query attention) and GVA
     (grouped value attention) configurations.
 
+    Args:
+        q (torch.Tensor):
--
-
-@backend_requirement(
-    {},
-    common_check=_check_gdn_prefill,
-)
-@flashinfer_api
-def chunk_gated_delta_rule(
-    q: torch.Tensor,
-    k: torch.Tensor,
-    v: torch.Tensor,
-    g: Optional[torch.Tensor] = None,
-    beta: Optional[torch.Tensor] = None,
-    scale: Optional[float] = None,
-    initial_state: Optional[torch.Tensor] = None,
-    output_final_state: bool = False,
-    cu_seqlens: Optional[torch.Tensor] = None,
-    use_qk_l2norm_in_kernel: bool = False,
-    output: Optional[torch.Tensor] = None,
-    output_state: Optional[torch.Tensor] = None,
-) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
-    r"""Chunked Gated Delta Rule (GDN) attention for prefill.
-
-    Args:
-        q (torch.Tensor):
-            Queries of shape ``[total_seq_len, num_q_heads, head_size]``.
-            Must be contiguous and on CUDA.
--
+
+@backend_requirement(
+    {},
+    common_check=_check_group_gemm_nvfp4_nt_groupwise_problem_size,
+)
+@flashinfer_api
+def group_gemm_nvfp4_nt_groupwise(
+    a: torch.Tensor,  # (cum_m, k)
+    b: torch.Tensor,  # (batch_size, n, k // 2)
+    a_scale: torch.Tensor,  # (cum_m_padded, k // 16)
+    b_scale: torch.Tensor,  # (batch_size, n_padded, k // 16)
+    m_indptr: torch.Tensor,  # (batch_size + 1, )
+    alpha: Optional[torch.Tensor] = None,  # (batch_size, )
+    tile_m: int = 128,
+    tile_n: int = 128,
+    tile_k: int = 128,
+    out: Optional[torch.Tensor] = None,  # (cum_m, n)
+    out_dtype: Optional[torch.dtype] = None,
+) -> torch.Tensor:
+    r"""Perform group GEMM with NVFP4 data types using groupwise scaling. Currently only implemented on NVIDIA
+    Blackwell Geforce, and DGX Spark architectures.
+
+    Parameters
+    ----------
+    a: torch.Tensor
+        Row-major input tensor, shape ``(cum_m, k // 2)``, data type is ``torch.uint8`` (packed NVFP4).
--
+        "Unsupported output dtype for fused_rmsnorm_silu: "
+        f"{dtype}. Supported dtypes: bfloat16, float8_e4m3fn, float4_e2m1fn_x2"
+    )
+
+
+@flashinfer_api
+def fused_rmsnorm_silu(
+    input: torch.Tensor,
+    weight: torch.Tensor,
+    eps: float = 1e-6,
+    out: Optional[torch.Tensor] = None,
+    block_scale: Optional[torch.Tensor] = None,
+) -> Union[torch.Tensor, tuple]:
+    r"""Fused RMSNorm + SiLU activation.
+
+    ``out[i] = SiLU(RMSNorm(input[i], weight, eps))``
+
+    where ``SiLU(x) = x / (1 + exp(-x))``
+
+    Optimized for SM100 (B200) for WAN VAE decoder problem sizes.
+    Other shapes and architectures (SM80+) use conservative fallback heuristics.
+
+    Parameters
+    ----------
+    input: torch.Tensor
+        Input tensor, shape ``(num_tokens, hidden_size)``, dtype ``bfloat16``.
--
+            f"(sf_vec_size=16, sf_use_ue8m0=False) for NVFP4, "
+            f"(sf_vec_size=32, sf_use_ue8m0=True) for MXFP4."
+        )
+
+
 @flashinfer_api
 def block_scale_interleave(unswizzled_sf: torch.Tensor) -> torch.Tensor:
     """Swizzle block scale tensor for FP4 format.
@@ -833,55 +931,95 @@ def nvfp4_quantize(
     do_shuffle=False,
     sf_vec_size=16,
     enable_pdl=None,
+    backend: str = "cuda",
 ):
     """
     Quantize input tensor to NVFP4 format.
 
     Parameters:
-        a (torch.Tensor): Input tensor of shape [M, K] with dtype fp16/bf16.
+        a (torch.Tensor): Input tensor of shape [M, K] with dtype fp16/bf16/float8_e4m3fn.
         a_global_sf (torch.Tensor): Global scale factor of shape [1] with dtype float32.
         sfLayout (SfLayout, optional): Scale factor layout. Defaults to SfLayout.layout_128x4.
         do_shuffle (bool, optional): Whether to shuffle the scale factors. Defaults to False. Only TRTLLM backend needs to shuffle the tensor B scale factors.
         sf_vec_size (int, optional): Scale factor vector size. Defaults to 16.
         enable_pdl (Optional[bool], optional): Whether to enable PDL (Programmatic Dependent Launch).
             If None, automatically detects based on device capability. Defaults to None.
--
 
-    return kv_cache_fp4, kv_block_scales, k_gs_ret, v_gs_ret
+    return kv_cache_fp4, kv_cache_sf, k_gs_ret, v_gs_ret
 
 
 @flashinfer_api
diff --git a/flashinfer/quantization/kernels/__init__.py b/flashinfer/quantization/kernels/__init__.py
index 7e99b74a..0f30455b 100644
--- a/flashinfer/quantization/kernels/__init__.py
+++ b/flashinfer/quantization/kernels/__init__.py
@@ -27,6 +27,7 @@ SM100+ (Blackwell) GPUs and the nvidia-cutlass-dsl package.
 """
 
 from .mxfp4_quantize import (
+    MXFP4QuantizeLinearKernel,
     MXFP4QuantizeSwizzledKernel,
     mxfp4_quantize_cute_dsl,
 )
@@ -35,11 +36,18 @@ from .mxfp8_quantize import (
     MXFP8QuantizeSwizzledKernel,
     mxfp8_quantize_cute_dsl,
 )
+from .nvfp4_quantize import (
+    NVFP4QuantizeSwizzledKernel,
+    nvfp4_quantize_cute_dsl,
+)
--
+        )
+
+        return compiled_kernel, swizzled_obj.rows_per_block
 
 
 @flashinfer_api
 def mxfp4_quantize_cute_dsl(
     input: torch.Tensor,
+    sf_layout: int = SF_LAYOUT_128x4,
     enable_pdl: bool | None = None,
 ) -> Tuple[torch.Tensor, torch.Tensor]:
     """
     Quantize input tensor to MXFP4 format using CuTe-DSL kernel.
 
-    This is a GPU implementation matching FlashInfer's mxfp4_quantize() behavior:
-    - Global scale computed as (448 * 6) / max(|input|)
-    - UE8M0 scale factors
-    - E2M1 output format (4-bit, 2 values per byte)
-    - Swizzled (128x4) scale factor layout
+    This is a GPU implementation with dual-path optimization:
+    - LINEAR layout: flat SF-block based iteration with adaptive 1T/4T per SF
+      block dispatch — uses 4T/SF on low-SM GPUs (<=80 SMs) for coalesced
+      memory access, and 1T/SF on high-SM GPUs where enough SMs generate
+      sufficient outstanding memory requests
+    - SWIZZLED layout: row-based iteration with padding fast path (optimized)
 
--
+    )
+
+    return compiled_kernel, kernel_obj.rows_per_block
+
+
+@flashinfer_api
+def nvfp4_quantize_cute_dsl(
+    input: torch.Tensor,
+    global_scale: torch.Tensor,
+    sf_layout: int = SF_LAYOUT_128x4,
+    enable_pdl: bool | None = None,
+) -> Tuple[torch.Tensor, torch.Tensor]:
+    """
+    Quantize input tensor to NVFP4 format using CuTe-DSL kernel.
+
+    This is a GPU implementation matching FlashInfer's nvfp4_quantize() behavior:
+    - E4M3 scale factors (FP8)
+    - E2M1 output format (4-bit, 2 values per byte)
+    - Supports 128x4, 8x4, and linear scale factor layouts
+    - sf_vec_size=16
+
+    The kernel is compiled once per (K, dtype, sf_layout, pdl) combination and
+    handles varying M (batch size) at runtime without recompilation.
+
+    Args:
+        input: Input tensor of shape [M, K] with dtype fp16/bf16/float8_e4m3fn

The BatchPrefillWithPagedKVCacheWrapper.run() and trtllm_batch_context_with_kv_cache() overload stubs in prefill.py fall outside the grep window above; a shape sketch for the new kv_cache_sf argument follows this second diff:

$ git diff v0.6.7.post3..main -- 'flashinfer/prefill.py' | grep -E -B5 -A10 'kv_block_scales|kv_cache_sf'
         if backend == "cudnn":
@@ -2098,9 +2104,7 @@ class BatchPrefillWithPagedKVCacheWrapper:
         enable_pdl: Optional[bool] = None,
         window_left: Optional[int] = None,
         sinks: Optional[torch.Tensor] = None,
-        kv_block_scales: Optional[
-            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-        ] = None,
+        kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
         skip_softmax_threshold_scale_factor: Optional[float] = None,
     ) -> torch.Tensor: ...
 
@@ -2118,9 +2122,7 @@ class BatchPrefillWithPagedKVCacheWrapper:
         enable_pdl: Optional[bool] = None,
         window_left: Optional[int] = None,
         sinks: Optional[torch.Tensor] = None,
-        kv_block_scales: Optional[
-            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-        ] = None,
+        kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
         skip_softmax_threshold_scale_factor: Optional[float] = None,
     ) -> Tuple[torch.Tensor, torch.Tensor]: ...
 
@@ -2139,9 +2141,7 @@ class BatchPrefillWithPagedKVCacheWrapper:
         enable_pdl: Optional[bool] = None,
         window_left: Optional[int] = None,
         sinks: Optional[torch.Tensor] = None,
-        kv_block_scales: Optional[
-            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-        ] = None,
+        kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
         skip_softmax_threshold_scale_factor: Optional[float] = None,
     ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
         r"""Compute batch prefill/append attention between query and paged kv-cache.
@@ -2181,6 +2181,21 @@ class BatchPrefillWithPagedKVCacheWrapper:
         enable_pdl : bool
             Whether to enable Programmatic Dependent Launch (PDL). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/#programmatic-dependent-launch-and-synchronization
             Only supported for >= sm90, and currently only for FA2 and CUDA core decode.
+        kv_cache_sf : Optional[Tuple[torch.Tensor, torch.Tensor]]
+            Per-block scale factors for NVFP4 KV cache, as a tuple of ``(k_scales, v_scales)``.
+            Scale tensors must follow the same :attr:`kv_layout` as the KV cache:
+
+            * **HND**: ``[num_pages, num_kv_heads, page_size, head_dim // 16]``
+            * **NHD**: ``[num_pages, page_size, num_kv_heads, head_dim // 16]``
+
+            Both tensors have dtype ``torch.float8_e4m3fn``. ``k_scales`` uses a linear
+            (row-major) layout, while ``v_scales`` must use TRT-LLM's 4-token interleaved
+            layout within each ``[page_size, head_dim // 16]`` tile. Use
+            :func:`flashinfer.fp4_quantization.nvfp4_quantize_paged_kv_cache` to produce
--
         Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
@@ -2212,14 +2227,22 @@ class BatchPrefillWithPagedKVCacheWrapper:
                     f"where total_tokens = qo_indptr[-1]."
                 )
 
-        # Unpack kv_block_scales
+        if (
+            k_cache.dtype == torch.uint8 or v_cache.dtype == torch.uint8
+        ) and kv_cache_sf is None:
+            raise ValueError("kv_cache_sf must be provided for NVFP4 KV cache.")
         key_block_scales = None
         value_block_scales = None
-        if kv_block_scales is not None:
-            if isinstance(kv_block_scales, tuple):
-                key_block_scales, value_block_scales = kv_block_scales
-            else:
-                key_block_scales, value_block_scales = kv_block_scales.unbind(dim=1)
+        if kv_cache_sf is not None:
+            if (
+                not isinstance(kv_cache_sf, (tuple, list))
+                or len(kv_cache_sf) != 2
+                or not all(torch.is_tensor(x) for x in kv_cache_sf)
+            ):
+                raise TypeError(
+                    "kv_cache_sf must be a tuple/list of two tensors: (k_scales, v_scales)."
+                )
+            key_block_scales, value_block_scales = kv_cache_sf
 
         o_dtype = self._cached_o_data_type
         if out is not None and out.dtype != o_dtype:
@@ -2265,7 +2288,7 @@ class BatchPrefillWithPagedKVCacheWrapper:
 
         # For NVFP4 KV (uint8 packed), v_cache last dim is head_dim//2;
         # use q's head_dim for output instead
-        out_head_dim = q.shape[-1] if kv_block_scales is not None else v_cache.shape[-1]
+        out_head_dim = q.shape[-1] if kv_cache_sf is not None else v_cache.shape[-1]
         if out is None:
             # Use cached output data type if available (for FP8 attention with FP16 output)
             out_dtype = getattr(self, "_cached_o_data_type", None) or q.dtype
@@ -2355,7 +2378,19 @@ class BatchPrefillWithPagedKVCacheWrapper:
                 enable_pdl,
             ]
             if self._jit_module is not None:
-                run_args.extend(list(args))
+                run_args.extend(
+                    prepare_jit_additional_args(
--
     attention_sinks: Optional[torch.Tensor] = None,
@@ -3731,9 +3781,7 @@ def trtllm_batch_context_with_kv_cache(
     kv_layout: str = "HND",
     enable_pdl: Optional[bool] = None,
     sinks: Optional[List[torch.Tensor]] = None,
-    kv_block_scales: Optional[
-        Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]
-    ] = None,
+    kv_cache_sf: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
     skip_softmax_threshold_scale_factor: Optional[float] = None,
     uses_shared_paged_kv_idx: bool = True,
 ) -> Union[torch.Tensor, FP4Tensor]:
@@ -3800,11 +3848,21 @@ def trtllm_batch_context_with_kv_cache(
         data copy overhead. Use ``HND`` for better performance.
     sinks : Optional[List[torch.Tensor]] = None
         additional value per head in the denominator of the softmax.
-    kv_block_scales : Optional[Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]] = None
-        Per-block scale factors for NVFP4 KV cache. Either a tuple of (k_scales, v_scales) or
-        a single tensor with shape ``[num_pages, 2, ...]`` that will be unbound along dim=1.
-        Each scale tensor has shape ``[num_pages, num_kv_heads, page_size, head_dim // 16]``
-        in HND layout, with dtype ``torch.float8_e4m3fn``.
+    kv_cache_sf : Optional[Tuple[torch.Tensor, torch.Tensor]] = None
+        Per-block scale factors for NVFP4 KV cache, as a tuple of ``(k_scales, v_scales)``.
+        Scale tensors must follow the same :attr:`kv_layout` as the KV cache:
+
+        * **HND**: ``[num_pages, num_kv_heads, page_size, head_dim // 16]``
+        * **NHD**: ``[num_pages, page_size, num_kv_heads, head_dim // 16]``
+
+        Both tensors have dtype ``torch.float8_e4m3fn``. ``k_scales`` uses a linear
+        (row-major) layout, while ``v_scales`` must use TRT-LLM's 4-token interleaved
+        layout within each ``[page_size, head_dim // 16]`` tile. Use
+        :func:`flashinfer.fp4_quantization.nvfp4_quantize_paged_kv_cache` to produce
--
 
@@ -3845,20 +3903,22 @@ def trtllm_batch_context_with_kv_cache(
             # it doesn't change underlying storage
             k_cache, v_cache = kv_cache.unbind(dim=1)
 
-    # Unpack kv_block_scales
+    if (
+        k_cache.dtype == torch.uint8 or v_cache.dtype == torch.uint8
+    ) and kv_cache_sf is None:
+        raise ValueError("kv_cache_sf must be provided for NVFP4 KV cache.")
     key_block_scales = None
     value_block_scales = None
-    if kv_block_scales is not None:
-        if isinstance(kv_block_scales, tuple):
-            key_block_scales, value_block_scales = kv_block_scales
-        else:
-            if kv_block_scales.shape[1] == 1:
-                key_block_scales, value_block_scales = kv_block_scales, kv_block_scales
-            else:
-                assert kv_block_scales.shape[1] == 2, (
-                    "When kv_block_scales is a single tensor, the second dimension must be 1 or 2"
-                )
-                key_block_scales, value_block_scales = kv_block_scales.unbind(dim=1)
+    if kv_cache_sf is not None:
+        if (
+            not isinstance(kv_cache_sf, (tuple, list))
+            or len(kv_cache_sf) != 2
+            or not all(torch.is_tensor(x) for x in kv_cache_sf)
+        ):
+            raise TypeError(
+                "kv_cache_sf must be a tuple/list of two tensors: (k_scales, v_scales)."
+            )
+        key_block_scales, value_block_scales = kv_cache_sf
 
     # Convert NHD layout to HND if necessary
     if kv_layout == "NHD":
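
For reference, a minimal sketch of the kv_cache_sf shapes documented above, in the HND layout. All sizes are illustrative assumptions, and the 4-token interleaving required for v_scales is only noted, not implemented:

import torch

num_pages, num_kv_heads, page_size, head_dim = 8, 4, 16, 128

# Packed NVFP4 KV cache: two 4-bit values per byte, so the last dim is head_dim // 2.
k_cache = torch.empty(num_pages, num_kv_heads, page_size, head_dim // 2, dtype=torch.uint8)
v_cache = torch.empty_like(k_cache)

# One e4m3 scale per 16-element group along head_dim (HND layout).
k_scales = torch.empty(
    num_pages, num_kv_heads, page_size, head_dim // 16, dtype=torch.float8_e4m3fn
)
# v_scales has the same shape and dtype but must use TRT-LLM's 4-token interleaved
# layout within each [page_size, head_dim // 16] tile; per the docstring above,
# flashinfer.fp4_quantization.nvfp4_quantize_paged_kv_cache produces both tensors.
v_scales = torch.empty_like(k_scales)

kv_cache_sf = (k_scales, v_scales)  # the only form the new type check accepts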

@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

The project version was incremented from 0.6.7 to 0.6.8 in the version.txt file. This is a standard version bump with no functional code changes.

Changes

Cohort / File(s): Version Increment (version.txt)
Summary: Updated version from 0.6.7 to 0.6.8

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~1 minute

🚥 Pre-merge checks: ✅ 3 passed

Title check: ✅ Passed. The title 'bump version to 0.6.8' is concise, clear, and directly reflects the main change of updating the version file from 0.6.7 to 0.6.8.
Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check: ✅ Passed. The pull request description includes all required sections from the template: a clear description of the changes (version bump), related issues with a proper link, and reviewer notes with detailed API changes analysis.

@gemini-code-assist
Contributor

gemini-code-assist bot left a comment

Code Review

This pull request updates the version number in version.txt from 0.6.7 to 0.6.8. I have no feedback to provide.

@claude

claude bot commented Apr 13, 2026

PR Review: Version Bump to 0.6.8

The change itself is correct — a single-line edit to version.txt (0.6.7 to 0.6.8). This is the right place for the version string (read by build_backend.py), and the patch increment is consistent with the project versioning scheme.

Open issues labeled v0.6.8:

Two bugs remain open at time of review and both are labeled for this milestone:

Issue 3029 — test_trtllm_gen_attention.py fails with AssertionError on GB200 (head_dim=256 precision regression, suspected introduced by PR 2988). Risk: Medium — correctness regression in a shipped attention path. PR 2988 already skipped xqa+head_dim=256 for a precision issue; this suggests the underlying problem is broader than that skip. Recommendation: fix the precision or explicitly skip trtllm-gen + head_dim=256 with a tracking comment before releasing.

Issue 3030 — test_prefill_delta_rule OOM-kills nvcc on H100 during JIT compilation (suspected introduced by PR 2908, which doubled kernel variants from 32 to 64, creating excessive parallel nvcc memory pressure; exit code 137). Risk: Low-medium — affects CI reliability and user JIT builds on memory-constrained machines. Recommendation: the suggested fix (cap MAX_JOBS in CI) is low-risk and easy to land before the release tag.
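
A hedged sketch of that mitigation, assuming the JIT build honors the standard MAX_JOBS environment variable used by ninja-based PyTorch extension builds; the value 4 is an illustrative assumption:

import os

# Cap parallel nvcc jobs before any FlashInfer JIT compilation is triggered, so
# memory-constrained CI runners are not OOM-killed (exit code 137, issue #3030).
os.environ.setdefault("MAX_JOBS", "4")  # illustrative cap; tune per runner RAM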

Minor notes: no changelog entry is updated, but past version-bump PRs suggest that is intentional for this repo. Auto-merge is disabled, so it is worth verifying CI is fully green before enabling it.

Overall the mechanical change is clean. The main question is whether issue 3029 (correctness) and issue 3030 (OOM/CI reliability) are considered blocking for the 0.6.8 release.

@aleozlx
Collaborator Author

aleozlx commented Apr 14, 2026

After pull #2882 gets merged, I will bot-run this PR.

cc @cindyzxq

@aleozlx
Collaborator Author

aleozlx commented Apr 14, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !545 has been created, and the CI pipeline #48467669 is currently running. I'll report back once the pipeline job completes.

@aleozlx
Collaborator Author

aleozlx commented Apr 14, 2026

H100:

=========== 1 failed, 1462 passed, 580 skipped in 363.49s (0:06:03) ============
❌ FAILED: tests/gdn/test_prefill_delta_rule.py

This is already tracked in issue #3030.

aleozlx merged commit 8063bc5 into main Apr 14, 2026
20 checks passed
aleozlx deleted the bump-version-0.6.8 branch April 14, 2026 16:25