Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical issues within the cuDNN GEMM backend, primarily focusing on improving the autotuning process and enhancing the robustness of the graph caching mechanism. By refining how cuDNN plans are built and ensuring that cache keys properly differentiate between varying data types, the changes prevent crashes and allow for more comprehensive performance optimization across diverse operational contexts.
Note: Reviews paused — this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits.
📝 Walkthrough

Added a TunableRunner hook, `get_cache_key_extras`, so per-runner extras (such as dtypes and flags) are folded into the autotuner's cache key.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant AutoTuner
    participant TunableRunner
    participant Cache
    participant CUDNNGraph
    Client->>AutoTuner: choose_one(inputs)
    AutoTuner->>AutoTuner: search_cache(inputs)
    loop per runner
        AutoTuner->>TunableRunner: get_cache_key_extras(inputs)
        TunableRunner-->>AutoTuner: extras
        AutoTuner->>AutoTuner: _get_cache_key(..., extras)
        AutoTuner->>Cache: lookup(cache_key)
    end
    alt Cache Hit
        Cache-->>AutoTuner: OptimizationProfile
    else Cache Miss
        AutoTuner->>CUDNNGraph: build_cudnn_gemm_*_graph(policy)
        CUDNNGraph-->>AutoTuner: graph with built plans
        AutoTuner->>Cache: store(cache_key, profile)
    end
    AutoTuner-->>Client: selected profile
```
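A minimal sketch of the lookup flow above. The names `get_cache_key_extras` and `_get_cache_key` come from the review comments below, but this `Runner` protocol, the 5-tuple key layout, and the plain-dict cache are illustrative assumptions, not FlashInfer's actual API:

```python
# Illustrative sketch only -- the real AutoTuner in flashinfer/autotuner.py
# is more involved. Assumed: the Runner protocol, the 5-tuple key layout,
# and a plain dict cache.
from typing import Protocol


class Runner(Protocol):
    def get_cache_key_extras(self, inputs: list) -> tuple:
        """Per-runner extras (e.g. dtypes, flags) folded into the cache key."""
        ...


def _get_cache_key(custom_op, runner_id, shapes, config, extras):
    # The fifth element is what this PR adds: entries that differ only in
    # extras (e.g. out_dtype) no longer collide in the cache.
    return (custom_op, runner_id, shapes, config, extras)


def search_cache(cache, custom_op, runners, shapes, config, inputs=None):
    for runner_id, runner in enumerate(runners):
        # Without inputs, extras falls back to () -- the mismatch a later
        # review comment flags.
        extras = runner.get_cache_key_extras(inputs) if inputs is not None else ()
        key = _get_cache_key(custom_op, runner_id, shapes, config, extras)
        if key in cache:
            return cache[key]  # hit: reuse the tuned profile
    return None  # miss: caller builds the cuDNN graph, tunes, and stores
```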
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Warning: Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
flashinfer/gemm/gemm_base.py (2)
1793-1812: ⚠️ Potential issue | 🔴 Critical — Fix import-time resolution of `cudnn.build_plan_policy` in function signatures.

The default arguments in `build_cudnn_gemm_fp4_graph` (line 1809) and `build_cudnn_gemm_bf16_graph` (line 2773) eagerly evaluate `cudnn.build_plan_policy.HEURISTICS_CHOICE` during module import. Since `cudnn` is conditionally imported with a try-except (lines 65-76) and there is no `from __future__ import annotations`, any system without cuDNN will fail with `NameError: name 'cudnn' is not defined` before `_check_cudnn_availability()` can execute. Use `policy=None` as the default and assign the enum inside the function body.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 1793 - 1812, The function signatures build_cudnn_gemm_fp4_graph and build_cudnn_gemm_bf16_graph currently use cudnn.build_plan_policy.HEURISTICS_CHOICE as a default which is evaluated at import time and breaks if cudnn isn't imported; change the parameter to policy=None and inside each function (after calling _check_cudnn_availability()) assign policy = cudnn.build_plan_policy.HEURISTICS_CHOICE if policy is None so the enum is resolved at runtime; update any docstrings or callers if needed to reflect the new default behavior.
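A minimal sketch of that fix, assuming the conditional-import pattern the comment describes; the graph-building body is elided and `_check_cudnn_availability` is stubbed here:

```python
# Sketch of the suggested fix; not the actual gemm_base.py code.
try:
    import cudnn  # optional dependency, conditionally imported
except ImportError:
    cudnn = None


def _check_cudnn_availability():
    if cudnn is None:
        raise RuntimeError("cuDNN frontend is required for this backend")


def build_cudnn_gemm_bf16_graph(policy=None):
    # Resolve the default at call time, after the availability check. A
    # signature default of cudnn.build_plan_policy.HEURISTICS_CHOICE would
    # be evaluated at import time and fail on systems without cuDNN.
    _check_cudnn_availability()
    if policy is None:
        policy = cudnn.build_plan_policy.HEURISTICS_CHOICE
    # ... build the cuDNN graph and plans with `policy` (elided) ...
```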
4043-4065: ⚠️ Potential issue | 🟠 Major — FP4 enumeration uses `HEURISTICS_CHOICE` policy, limiting plan coverage compared to BF16's `ALL` policy.

`CudnnFp4GemmRunner.get_valid_tactics()` passes `tactic=-1` to `_get_cudnn_fp4_gemm_graph()`, which maps to `policy=cudnn.build_plan_policy.HEURISTICS_CHOICE` (line 4044). This policy is then passed to `graph.build_plans(policy)` during graph construction, restricting available execution plans to cuDNN's heuristic selection. By contrast, `CudnnBf16GemmRunner.get_valid_tactics()` directly uses `policy=cudnn.build_plan_policy.ALL` (line 3006), enabling full plan enumeration. The FP4 autotuner will only see plans selected by heuristics, not the complete plan set that BF16 now covers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4043 - 4065, The FP4 path incorrectly maps tactic == -1 to cudnn.build_plan_policy.HEURISTICS_CHOICE, which restricts plan enumeration; update _get_cudnn_fp4_gemm_graph (invoked by CudnnFp4GemmRunner.get_valid_tactics) to set policy = cudnn.build_plan_policy.ALL when tactic == -1 so the subsequent graph.build_plans(policy) call enumerates all plans (matching the BF16 behavior) instead of only heuristically chosen plans.
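A hedged sketch of the alignment the comment suggests (and that a later review refines into an explicit `policy` parameter); graph construction is elided and these signatures are assumptions:

```python
import cudnn  # assumes the cuDNN frontend is available


def _get_cudnn_fp4_gemm_graph(tactic, policy=None):
    # New explicit `policy` parameter; the default preserves the old
    # runtime behavior for concrete tactics.
    if policy is None:
        policy = cudnn.build_plan_policy.HEURISTICS_CHOICE
    graph = cudnn.pygraph()  # placeholder for the real FP4 graph setup
    # ... FP4 GEMM tensors/ops would be added here (elided) ...
    graph.build_plans(policy)
    return graph


class CudnnFp4GemmRunner:
    def get_valid_tactics(self, inputs):
        # Enumerate the full plan set for autotuning, matching the BF16
        # runner, instead of only cuDNN's heuristic picks.
        graph = _get_cudnn_fp4_gemm_graph(
            tactic=-1, policy=cudnn.build_plan_policy.ALL
        )
        ...  # derive the tactic list from the built plans (elided)
```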
🧹 Nitpick comments (1)
3rdparty/cutlass (1)
1-1: Add optional CI validation for CUTLASS submodule commit consistency.

The current setup pins the CUTLASS submodule to commit `f3fde58372d33e9a5650ba7b80fc48b3b49d40c8` via git, and build workflows correctly fetch submodules recursively. However, build_backend.py does not assert or validate the CUTLASS commit/version before symlinking or copying. While accidental submodule drift is unlikely given git's pinning, adding a lightweight CI check to explicitly validate the submodule commit (e.g., comparing the actual commit against the expected `f3fde58372d33e9a5650ba7b80fc48b3b49d40c8`) would serve as an additional guardrail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@3rdparty/cutlass` at line 1, Add an optional CI-time validation in build_backend.py to verify the CUTLASS submodule commit matches the pinned commit "f3fde58372d33e9a5650ba7b80fc48b3b49d40c8" before performing the symlink/copy; implement this check in the setup path (e.g., inside or just before the function that creates the CUTLASS symlink/copy such as setup_cutlass() or the main build routine) by reading the submodule commit (via git rev-parse HEAD in the cutlass directory or by parsing .git/refs) and comparing it to the expected hash, and if mismatched, either fail with a clear error or emit a CI-only warning controlled by an env flag (e.g., CI_VALIDATE_CUTLASS=true) so the guardrail can be enabled without changing default behavior.
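A lightweight sketch of such a check; the env-flag name comes from the prompt above, while the submodule path and where it hooks into build_backend.py are assumptions:

```python
import os
import subprocess
from pathlib import Path

EXPECTED_CUTLASS_COMMIT = "f3fde58372d33e9a5650ba7b80fc48b3b49d40c8"


def validate_cutlass_submodule(repo_root: Path) -> None:
    # Opt-in guardrail: only runs when CI sets CI_VALIDATE_CUTLASS=true,
    # so default build behavior is unchanged.
    if os.environ.get("CI_VALIDATE_CUTLASS") != "true":
        return
    actual = subprocess.check_output(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_root / "3rdparty" / "cutlass",
        text=True,
    ).strip()
    if actual != EXPECTED_CUTLASS_COMMIT:
        raise RuntimeError(
            f"CUTLASS submodule drift: expected {EXPECTED_CUTLASS_COMMIT}, "
            f"got {actual}"
        )
```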
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flashinfer/autotuner.py`:
- Around line 623-625: The new fifth cache-key element `extras` is computed by
AutoTuner._get_cache_key but never persisted: update search_cache() to include
`extras` when constructing `file_key` (append or include the fifth element from
`cache_key` so file-backed keys match the in-memory key), and update
save_configs() to accept/unpack the 5-tuple `cache_key` (instead of a 4-tuple)
so it writes the persisted entry using the full key; apply the same fix to the
other occurrence around the 1077-1084 block so both file_key construction and
saving consistently propagate `extras`.
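A hedged illustration of the inline comment above — the 5-tuple layout and the `save_configs` shape are assumptions; the point is only that `extras` must reach both the file-backed key and the persisted entry:

```python
# Illustrative only; flashinfer/autotuner.py's real persistence differs.
def make_file_key(cache_key):
    # cache_key is now a 5-tuple; dropping extras here would make
    # file-backed lookups miss entries written by the in-memory path.
    custom_op, runner_id, shapes, config, extras = cache_key
    return f"{custom_op}|{runner_id}|{shapes}|{config}|{extras}"


def save_configs(store, cache_key, profile):
    # Unpack/forward the full 5-tuple when writing the persisted entry.
    store[make_file_key(cache_key)] = profile
```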
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: de7bf225-10c7-4372-844c-7314085c7d4a
📥 Commits
Reviewing files that changed from the base of the PR and between 27cae50 and 04922184ec7a0962b5c736f0c84683d32a3de10d.
📒 Files selected for processing (3)
- 3rdparty/cutlass
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
/bot run

/bot run
Hi @yanqinz2, I am seeing that
98afcc9 to 9f294a9 (compare)
/bot stop

The GitLab CI pipeline #46899789 has been cancelled.

/bot run

@bkryu Yeah, there was a missing unpacking for the newly added key. I just pushed the fix.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flashinfer/gemm/gemm_base.py`:
- Around line 4113-4118: The cache key in get_cache_key_extras is incorrectly
using the preallocated output tensor's dtype (out.dtype) instead of the provided
out_dtype argument used to build the cuDNN FP4 graph; update the return tuple in
get_cache_key_extras to use out_dtype (the unpacked variable) instead of
out.dtype so the autotuner key matches the graph key (function:
get_cache_key_extras).
- Around line 4052-4056: CudnnFp4GemmRunner.get_valid_tactics() currently calls
_get_cudnn_fp4_gemm_graph(..., tactic=-1) which forces
cudnn.build_plan_policy.HEURISTICS_CHOICE; change this so FP4 can enumerate all
plans by adding a policy parameter to _get_cudnn_fp4_gemm_graph (e.g., policy
with default cudnn.build_plan_policy.HEURISTICS_CHOICE) and update
CudnnFp4GemmRunner.get_valid_tactics() to call _get_cudnn_fp4_gemm_graph(...,
policy=cudnn.build_plan_policy.ALL); ensure the new policy argument is used
where the code currently maps tactic == -1 / else to HEURISTICS_CHOICE/ALL so
BF16 and FP4 use the same ALL enumeration when requested.
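A sketch of the first fix; the tuple layout matches the diffs later in this thread, while the surrounding class is simplified:

```python
class CudnnFp4GemmRunner:
    def get_cache_key_extras(self, inputs):
        # Key off the out_dtype *argument* (which is what builds the cuDNN
        # graph), not the preallocated tensor's out.dtype, so the autotuner
        # key and the graph key agree.
        _, _, _, _, alpha, out_dtype, _out, block_size, use_nvfp4, _ = inputs
        return (out_dtype, block_size, use_nvfp4, alpha is not None)
```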
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73e7bd1a-731d-4b12-9511-59873cbe694a
📥 Commits
Reviewing files that changed from the base of the PR and between 04922184ec7a0962b5c736f0c84683d32a3de10d and 9f294a9801764be99b035b15b536227ceb3f20cb.
📒 Files selected for processing (2)
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
🚧 Files skipped from review as they are similar to previous changes (1)
- flashinfer/autotuner.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/autotuner.py (1)
837-841: ⚠️ Potential issue | 🟠 Major — Missing `inputs` parameter causes cache lookup mismatch.

After the tuning loop completes, `search_cache` is called without the `inputs` parameter, so `extras` will default to `()`. However, the cache entries were stored with extras computed from the actual tensors. If any runner returns non-empty extras from `get_cache_key_extras()`, this lookup will fail to find the tuned entry and incorrectly fall back to the default tactic.

🐛 Proposed fix
```diff
     # Get the best runner and tactic from cache
     # If no valid tactic is found, the fallback runner and tactic will be used
     _, runner_id, tactic, _ = self.search_cache(
-        custom_op, runners, input_shapes, tuning_config
+        custom_op, runners, input_shapes, tuning_config, inputs=inputs
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/autotuner.py` around lines 837 - 841, The cache lookup after the tuning loop calls search_cache without the inputs/extras, causing mismatch when runners provide non-empty get_cache_key_extras(); update the call site where _, runner_id, tactic, _ = self.search_cache(custom_op, runners, input_shapes, tuning_config) to pass the original inputs (the same tensors used during tuning) so extras is computed consistently (i.e., include the inputs parameter when invoking search_cache) to ensure the tuned entry is found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39ce1ee9-dbd7-4be8-9f8d-a71f9ea36538
📥 Commits
Reviewing files that changed from the base of the PR and between 9f294a9801764be99b035b15b536227ceb3f20cb and f7af2d73215039aa6a87354ac88c700a77a6f0a0.
📒 Files selected for processing (1)
flashinfer/autotuner.py
🧹 Nitpick comments (1)
flashinfer/gemm/gemm_base.py (1)
4113-4118: Rename the unused `out` placeholder.

The extras tuple looks right, but line 4117 still trips Ruff's RUF059. Renaming the binding to `_out` keeps the tuple layout documented without leaving an unused local.

♻️ Suggested cleanup
```diff
-        _, _, _, _, alpha, out_dtype, out, block_size, use_nvfp4, _ = inputs
+        _, _, _, _, alpha, out_dtype, _out, block_size, use_nvfp4, _ = inputs
         return (out_dtype, block_size, use_nvfp4, alpha is not None)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4113 - 4118, In get_cache_key_extras, the tuple unpack binds an unused local named out which triggers Ruff RUF059; change that binding to _out (or prefix with underscore) so the input layout remains documented but the linter knows the variable is intentionally unused — update the tuple unpack line in get_cache_key_extras to use _out instead of out and leave the returned tuple (out_dtype, block_size, use_nvfp4, alpha is not None) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 17ddf2c2-74ba-439c-b436-ff55734e6be1
📥 Commits
Reviewing files that changed from the base of the PR and between f7af2d73215039aa6a87354ac88c700a77a6f0a0 and 2c4f70ee57546807ace66b8d5aaebf8bf40151f9.
📒 Files selected for processing (2)
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
♻️ Duplicate comments (1)
flashinfer/gemm/gemm_base.py (1)
4095-4100: ⚠️ Potential issue | 🟡 Minor — Remove unused unpacked variable in cache-key extras.

Line 4099 unpacks `out` but never uses it (RUF059). This should be simplified to avoid lint noise.

♻️ Proposed fix
```diff
-        _, _, _, _, alpha, out_dtype, out, block_size, use_nvfp4, _ = inputs
+        _, _, _, _, alpha, out_dtype, _, block_size, use_nvfp4, _ = inputs
         return (out_dtype, block_size, use_nvfp4, alpha is not None)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4095 - 4100, The unpack in get_cache_key_extras includes an unused variable `out` which triggers lint noise; update the tuple unpacking in get_cache_key_extras (inside gemm_base.py) to omit `out` (or replace it with `_`) so you only bind the used names (e.g., capture alpha, out_dtype, block_size, use_nvfp4) and return the same cache key tuple (out_dtype, block_size, use_nvfp4, alpha is not None).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d597ade-2a39-4abc-99ae-1751f18b9e58
📥 Commits
Reviewing files that changed from the base of the PR and between 2c4f70ee57546807ace66b8d5aaebf8bf40151f9 and 4f591be7877710a990b0329d3b24b15add250b0e.
📒 Files selected for processing (1)
flashinfer/gemm/gemm_base.py
[FAILED] Pipeline #46922582: 6/20 passed

/bot run
4f591be to 7fbcc01 (compare)
[FAILED] Pipeline #46987701: 12/20 passed

/bot run

@YangXu1990uiuc is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
📌 Description
Fix a series of cuDNN issues in the FlashInfer GEMM backend.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests are passing (`unittest`, etc.).

Reviewer Notes