[feat] Trtllm-gen Per-token Nvfp4 MoE #3027
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
📝 Walkthrough

Adds NVFP4 per-token quantization producing per-token scales, extends FP4 quantization with row-wise and inverse-scale options, refactors quantization kernels/dispatch and host launchers, threads per-token scaling through fused MoE runners/launchers/GEMM, and exposes new Python APIs and tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Host as Host
    participant Launcher as MoE Launcher
    participant NVKernel as NVFP4 Kernel
    participant FP4Kernel as FP4 Quant Kernel
    participant Gemm2 as Gemm2 / FC2
    Host->>Launcher: call fused MoE with input + per_token_scales?
    Launcher->>Launcher: allocate workspace (token_scales_fc2) if needed
    Launcher->>NVKernel: invokeNvfp4QuantAndPerTokenScale(input, globalScaleInv, sfLayout, ...)
    activate NVKernel
    NVKernel->>NVKernel: per-row amax reduce → compute per-token scales
    NVKernel-->>Launcher: write perTokenScaleOutput, weightOutput, scaleOutput
    deactivate NVKernel
    Launcher->>FP4Kernel: invokeFP4Quantization(FC1_output, perTokenScales?, use_row_wise/inverse flags)
    activate FP4Kernel
    FP4Kernel->>FP4Kernel: apply per-token or row-wise/inverse scale, quantize to FP4, emit block scales
    FP4Kernel-->>Launcher: packed FP4 activations + block scales
    deactivate FP4Kernel
    Launcher->>Gemm2: run Gemm2(perTokenScales_fc2, quantized_activation, weights, scales)
    activate Gemm2
    Gemm2-->>Host: final MoE output
    deactivate Gemm2
```
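The flow in the diagram can be sketched as a small NumPy reference. This is an illustrative toy under assumed constants (`FP4_E2M1_MAX = 6`, `FP8_E4M3_MAX = 448`, 16-element scale blocks), not the actual kernel code:

```python
# Illustrative NumPy reference for the diagram's flow: per-row amax reduce,
# per-token scale, then blockwise e2m1 quantization. Constant names and the
# 16-element block size are assumptions for this sketch, not the kernel's API.
import numpy as np

FP4_E2M1_MAX = 6.0    # largest e2m1 magnitude
FP8_E4M3_MAX = 448.0  # largest e4m3 magnitude (block-scale dtype)
BLOCK = 16            # assumed NVFP4 scale-factor vector size

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_e2m1(x):
    """Round to the nearest representable e2m1 magnitude, keeping the sign."""
    idx = np.argmin(np.abs(np.abs(x)[..., None] - E2M1_GRID), axis=-1)
    return np.sign(x) * E2M1_GRID[idx]

def nvfp4_per_token_quant(x):
    m, n = x.shape
    # "per-row amax reduce -> compute per-token scales" from the diagram.
    amax = np.abs(x).max(axis=1)                        # (m,)
    token_scale = amax / (FP8_E4M3_MAX * FP4_E2M1_MAX)  # (m,)
    # Per-16-element block scales, stored relative to the token scale.
    blocks = x.reshape(m, n // BLOCK, BLOCK)
    block_amax = np.abs(blocks).max(axis=2)             # (m, n // BLOCK)
    denom = np.maximum(token_scale[:, None], 1e-12)
    block_scale = block_amax / FP4_E2M1_MAX / denom
    # Divide by the combined scale so each block's amax maps to 6.0, then round.
    combined = np.maximum((block_scale * token_scale[:, None])[..., None], 1e-12)
    q = quantize_e2m1(blocks / combined)
    return q.reshape(m, n), block_scale, token_scale

x = np.random.randn(4, 64).astype(np.float32)
q, block_scale, token_scale = nvfp4_per_token_quant(x)
```

Dequantization multiplies each 4-bit value by its block scale times the token scale, mirroring how Gemm2 consumes `perTokenScales_fc2` in the diagram.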
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Code Review
This pull request introduces support for per-token scaling in FP4 quantization for MoE models, including updates to the quantization kernels, the MoE runner, and the Python interface. The changes enable row-wise Amax calculation and quantization for FP4, and integrate these into the fused MoE pipeline. My review identified several issues: missing header includes for cuda::std::maximum, potential integer overflow in the Amax kernel, efficiency improvements for the Amax kernel, and the need to expand type support for NvFP4. Additionally, I pointed out unused variables and the use of magic numbers that should be replaced with named constants.
```cpp
#include <cudaTypedefs.h>
#include <float.h>

#include <cub/cub.cuh>
```
```cpp
FLASHINFER_CHECK(mGemm2.mDtypeAct == btg::Dtype::E2m1,
                 "Currently only support NvFP4 when using explicit quantization.");
```
The check mGemm2.mDtypeAct == btg::Dtype::E2m1 is too restrictive. NvFP4 also supports MxE2m1 (vector size 32). This check should be expanded to allow MxE2m1, and the subsequent call to invokeFP4Quantization should dispatch to the correct template instantiation (16 or 32) based on the actual type of mGemm2.mDtypeAct.
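To make the suggestion concrete, here is a tiny NumPy toy showing the dispatch. The `SF_VEC_SIZE` mapping and the dtype strings are made up for illustration, not the real enums; the point is that the quantization logic is identical for E2m1 and MxE2m1, only the scale-factor vector size (16 vs 32) changes:

```python
# Toy dispatch on the activation dtype instead of rejecting MxE2m1 outright.
# SF_VEC_SIZE and the dtype strings are illustrative, not the real enums.
import numpy as np

SF_VEC_SIZE = {"E2m1": 16, "MxE2m1": 32}

def block_amax(x, dtype_act):
    """Per-block amax with the vector size picked from the activation dtype."""
    vec = SF_VEC_SIZE[dtype_act]  # 16 for NvFP4, 32 for MxFP4
    return np.abs(x.reshape(-1, vec)).max(axis=1)

x = np.arange(64, dtype=np.float32) - 32.0
amax16 = block_amax(x, "E2m1")    # 4 blocks of 16 elements
amax32 = block_amax(x, "MxE2m1")  # 2 blocks of 32 elements
```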
```cpp
template <typename T, uint32_t BLOCK_SIZE>
__global__ void rowWiseAmaxKernel(uint32_t m, uint32_t n, T const* input, float* amaxOutput, float scale) {
  uint32_t rowIdx = blockIdx.x;
  if (rowIdx >= m) return;

  float localMax = 0.f;
  for (uint32_t colIdx = threadIdx.x; colIdx < n; colIdx += blockDim.x) {
    T element = input[rowIdx * n + colIdx];
    localMax = fmaxf(localMax, fabsf(static_cast<float>(element) * scale));
  }

  using BlockReduce = cub::BlockReduce<float, BLOCK_SIZE>;
  __shared__ typename BlockReduce::TempStorage tempStorage;
  float blockMax = BlockReduce(tempStorage)
                       .Reduce(localMax,
#if CUDART_VERSION >= 12090
                               cuda::maximum<> {}
#else
                               cub::Max(),
#endif
                       );

  if (threadIdx.x == 0) {
    amaxOutput[rowIdx] = blockMax;
  }
}
```
There are two issues in `rowWiseAmaxKernel`:

1. **Efficiency:** Multiplying by `scale` inside the loop performs `n` multiplications per thread. It is more efficient to compute the maximum of absolute values first and multiply by `scale` once at the end when writing to global memory.
2. **Correctness (overflow):** The indexing `rowIdx * n + colIdx` uses `uint32_t`. If the product exceeds $2^{32}$ (possible with large token counts and intermediate sizes), it will overflow. Using `static_cast<size_t>(rowIdx) * n` prevents this.
```cpp
template <typename T, uint32_t BLOCK_SIZE>
__global__ void rowWiseAmaxKernel(uint32_t m, uint32_t n, T const* input, float* amaxOutput, float scale) {
  uint32_t rowIdx = blockIdx.x;
  if (rowIdx >= m) return;

  float localMax = 0.f;
  for (uint32_t colIdx = threadIdx.x; colIdx < n; colIdx += blockDim.x) {
    T element = input[static_cast<size_t>(rowIdx) * n + colIdx];
    localMax = fmaxf(localMax, fabsf(static_cast<float>(element)));
  }

  using BlockReduce = cub::BlockReduce<float, BLOCK_SIZE>;
  __shared__ typename BlockReduce::TempStorage tempStorage;
  float blockMax = BlockReduce(tempStorage)
                       .Reduce(localMax,
#if CUDART_VERSION >= 12090
                               cuda::std::maximum<> {}
#else
                               cub::Max(),
#endif
                       );

  if (threadIdx.x == 0) {
    amaxOutput[rowIdx] = blockMax * scale;
  }
}
```
```cpp
FLASHINFER_CHECK(workspace.token_scales_fc2 != nullptr,
                 "workspace.token_scales_fc2 must be provided when using explicit quantization.");
const int mMultiProcessorCount = tensorrt_llm::common::getMultiProcessorCount();
int intermediate_size_factor = isGatedActivation(args.activation_type) ? 2 : 1;

auto sfLayout = mGemm2.mTileTokensDim >= 128 ? QuantizationSFLayout::SWIZZLED_128x4
                                             : QuantizationSFLayout::SWIZZLED_8x4;

invokeRowWiseAmax<__nv_bfloat16>(workspace.total_max_padded_tokens, args.intermediate_size,
                                 reinterpret_cast<__nv_bfloat16*>(workspace.gemm1_output),
                                 reinterpret_cast<float*>(workspace.token_scales_fc2),
                                 1.f / 448.f / 6.f, stream);
```
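The `1.f / 448.f / 6.f` factor above reads as one over the product of the e4m3 and e2m1 maxima. A short sketch of that interpretation (an assumption from the constants themselves, not taken from kernel comments):

```python
# 448 is the e4m3 max and 6 the e2m1 max, so scaling a row's amax by
# 1/(448*6) yields a per-token scale under which the row's largest value,
# once divided by that scale, lands exactly at 448 * 6.
FP8_E4M3_MAX = 448.0
FP4_E2M1_MAX = 6.0

def per_token_scale(row_amax: float) -> float:
    return row_amax * (1.0 / FP8_E4M3_MAX / FP4_E2M1_MAX)  # mirrors 1.f/448.f/6.f

s = per_token_scale(13.5)
assert abs(13.5 / s - FP8_E4M3_MAX * FP4_E2M1_MAX) < 1e-9
```

Naming these values (e.g. `FP8_E4M3_MAX`, `FP4_E2M1_MAX`, names assumed here) would also address the magic-number point raised in the review.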
There seem to be some relevant bot run errors.
/bot run |
Force-pushed from ee9e4fb to 8bf05fc.
/bot run |
Further added TE style reference implementation and
/bot run |
/bot run |
Tests look good so far; restarting CI for merging.
## 📌 Description

Optimize the performance of the per-token nvfp4 quantization kernel introduced by #3027.

1. Default block size to 128.
2. Default to the fast-math path. Rename `TE_EXACT_FP4` to `TRTLLM_DISABLE_FP4_QUANT_FAST_MATH`, controlled by an environment variable.
3. Change the argument lists of `get_sf_out_offset_128x4` and `get_sf_out_offset_8x4`.

TODOs:

1. Optimize low-latency cases.

## 🚀 Pull Request Checklist

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Summary by CodeRabbit

- **New Features**
  - Added environment variable configuration to disable fast-math optimization in FP4 quantization, enabling behavior alignment with alternative implementations.
- **Tests**
  - Added test fixture to validate FP4 quantization functionality with fast-math mode disabled.

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Ziang Li <ziangli@umich.edu>
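The `TRTLLM_DISABLE_FP4_QUANT_FAST_MATH` variable described above reads as an opt-out gate with fast math as the default. A minimal sketch of that behavior (the accepted values are an assumption; the real parsing may differ):

```python
import os

def use_fast_math() -> bool:
    """Fast math is the default; set TRTLLM_DISABLE_FP4_QUANT_FAST_MATH to opt out."""
    return os.environ.get("TRTLLM_DISABLE_FP4_QUANT_FAST_MATH", "0") not in ("1", "true", "True")

os.environ["TRTLLM_DISABLE_FP4_QUANT_FAST_MATH"] = "1"
assert not use_fast_math()  # exact (TE-style) path requested
```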
📌 Description
Co-authors: @mxz297, @zianglih

This PR enables per-token quantization for Trtllm-gen MoE.
- `trtllm_fp4_block_scale_moe` and `trtllm_fp4_block_scale_routed_moe`: added a new optional argument `per_token_scale`.
- `cvt_warp_fp16_to_fp4_with_vec_max`: uses the cached local amax.

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

Reviewer Notes