Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical issues within the cuDNN GEMM backend, primarily focusing on improving the autotuning process and enhancing the robustness of the graph caching mechanism. By refining how cuDNN plans are built and ensuring that cache keys properly differentiate between varying data types, the changes prevent crashes and allow for more comprehensive performance optimization across diverse operational contexts.
Note: Reviews paused — this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits.
📝 Walkthrough

Added a TunableRunner hook, `get_cache_key_extras`, so per-runner extras (such as dtypes and flags) are folded into the autotuner's cache key.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant AutoTuner
    participant TunableRunner
    participant Cache
    participant CUDNNGraph
    Client->>AutoTuner: choose_one(inputs)
    AutoTuner->>AutoTuner: search_cache(inputs)
    loop per runner
        AutoTuner->>TunableRunner: get_cache_key_extras(inputs)
        TunableRunner-->>AutoTuner: extras
        AutoTuner->>AutoTuner: _get_cache_key(..., extras)
        AutoTuner->>Cache: lookup(cache_key)
    end
    alt Cache Hit
        Cache-->>AutoTuner: OptimizationProfile
    else Cache Miss
        AutoTuner->>CUDNNGraph: build_cudnn_gemm_*_graph(policy)
        CUDNNGraph-->>AutoTuner: graph with built plans
        AutoTuner->>Cache: store(cache_key, profile)
    end
    AutoTuner-->>Client: selected profile
```
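A minimal sketch of the lookup flow above. The names `get_cache_key_extras` and `_get_cache_key` come from the review comments below, but this `Runner` protocol, the 5-tuple key layout, and the plain-dict cache are illustrative assumptions, not FlashInfer's actual API:

```python
# Illustrative sketch only -- the real AutoTuner in flashinfer/autotuner.py
# is more involved. Assumed: the Runner protocol, the 5-tuple key layout,
# and a plain dict cache.
from typing import Protocol


class Runner(Protocol):
    def get_cache_key_extras(self, inputs: list) -> tuple:
        """Per-runner extras (e.g. dtypes, flags) folded into the cache key."""
        ...


def _get_cache_key(custom_op, runner_id, shapes, config, extras):
    # The fifth element is what this PR adds: entries that differ only in
    # extras (e.g. out_dtype) no longer collide in the cache.
    return (custom_op, runner_id, shapes, config, extras)


def search_cache(cache, custom_op, runners, shapes, config, inputs=None):
    for runner_id, runner in enumerate(runners):
        # Without inputs, extras falls back to () -- the mismatch a later
        # review comment flags.
        extras = runner.get_cache_key_extras(inputs) if inputs is not None else ()
        key = _get_cache_key(custom_op, runner_id, shapes, config, extras)
        if key in cache:
            return cache[key]  # hit: reuse the tuned profile
    return None  # miss: caller builds the cuDNN graph, tunes, and stores
```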
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Warning: Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
flashinfer/gemm/gemm_base.py (2)
1793-1812: ⚠️ Potential issue | 🔴 Critical — Fix import-time resolution of `cudnn.build_plan_policy` in function signatures.

The default arguments in `build_cudnn_gemm_fp4_graph` (line 1809) and `build_cudnn_gemm_bf16_graph` (line 2773) eagerly evaluate `cudnn.build_plan_policy.HEURISTICS_CHOICE` during module import. Since `cudnn` is conditionally imported with a try-except (lines 65-76) and there is no `from __future__ import annotations`, any system without cuDNN will fail with `NameError: name 'cudnn' is not defined` before `_check_cudnn_availability()` can execute. Use `policy=None` as the default and assign the enum inside the function body.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 1793 - 1812, The function signatures build_cudnn_gemm_fp4_graph and build_cudnn_gemm_bf16_graph currently use cudnn.build_plan_policy.HEURISTICS_CHOICE as a default which is evaluated at import time and breaks if cudnn isn't imported; change the parameter to policy=None and inside each function (after calling _check_cudnn_availability()) assign policy = cudnn.build_plan_policy.HEURISTICS_CHOICE if policy is None so the enum is resolved at runtime; update any docstrings or callers if needed to reflect the new default behavior.
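A minimal sketch of that fix, assuming the conditional-import pattern the comment describes; the graph-building body is elided and `_check_cudnn_availability` is stubbed here:

```python
# Sketch of the suggested fix; not the actual gemm_base.py code.
try:
    import cudnn  # optional dependency, conditionally imported
except ImportError:
    cudnn = None


def _check_cudnn_availability():
    if cudnn is None:
        raise RuntimeError("cuDNN frontend is required for this backend")


def build_cudnn_gemm_bf16_graph(policy=None):
    # Resolve the default at call time, after the availability check. A
    # signature default of cudnn.build_plan_policy.HEURISTICS_CHOICE would
    # be evaluated at import time and fail on systems without cuDNN.
    _check_cudnn_availability()
    if policy is None:
        policy = cudnn.build_plan_policy.HEURISTICS_CHOICE
    # ... build the cuDNN graph and plans with `policy` (elided) ...
```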
4043-4065: ⚠️ Potential issue | 🟠 Major — FP4 enumeration uses `HEURISTICS_CHOICE` policy, limiting plan coverage compared to BF16's `ALL` policy.

`CudnnFp4GemmRunner.get_valid_tactics()` passes `tactic=-1` to `_get_cudnn_fp4_gemm_graph()`, which maps to `policy=cudnn.build_plan_policy.HEURISTICS_CHOICE` (line 4044). This policy is then passed to `graph.build_plans(policy)` during graph construction, restricting available execution plans to cuDNN's heuristic selection. By contrast, `CudnnBf16GemmRunner.get_valid_tactics()` directly uses `policy=cudnn.build_plan_policy.ALL` (line 3006), enabling full plan enumeration. The FP4 autotuner will only see plans selected by heuristics, not the complete plan set that BF16 now covers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4043 - 4065, The FP4 path incorrectly maps tactic == -1 to cudnn.build_plan_policy.HEURISTICS_CHOICE, which restricts plan enumeration; update _get_cudnn_fp4_gemm_graph (invoked by CudnnFp4GemmRunner.get_valid_tactics) to set policy = cudnn.build_plan_policy.ALL when tactic == -1 so the subsequent graph.build_plans(policy) call enumerates all plans (matching the BF16 behavior) instead of only heuristically chosen plans.
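A hedged sketch of the alignment the comment suggests (and that a later review refines into an explicit `policy` parameter); graph construction is elided and these signatures are assumptions:

```python
import cudnn  # assumes the cuDNN frontend is available


def _get_cudnn_fp4_gemm_graph(tactic, policy=None):
    # New explicit `policy` parameter; the default preserves the old
    # runtime behavior for concrete tactics.
    if policy is None:
        policy = cudnn.build_plan_policy.HEURISTICS_CHOICE
    graph = cudnn.pygraph()  # placeholder for the real FP4 graph setup
    # ... FP4 GEMM tensors/ops would be added here (elided) ...
    graph.build_plans(policy)
    return graph


class CudnnFp4GemmRunner:
    def get_valid_tactics(self, inputs):
        # Enumerate the full plan set for autotuning, matching the BF16
        # runner, instead of only cuDNN's heuristic picks.
        graph = _get_cudnn_fp4_gemm_graph(
            tactic=-1, policy=cudnn.build_plan_policy.ALL
        )
        ...  # derive the tactic list from the built plans (elided)
```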
🧹 Nitpick comments (1)
3rdparty/cutlass (1)
1-1: Add optional CI validation for CUTLASS submodule commit consistency.

The current setup pins the CUTLASS submodule to commit `f3fde58372d33e9a5650ba7b80fc48b3b49d40c8` via git, and build workflows correctly fetch submodules recursively. However, build_backend.py does not assert or validate the CUTLASS commit/version before symlinking or copying. While accidental submodule drift is unlikely given git's pinning, adding a lightweight CI check to explicitly validate the submodule commit (e.g., comparing the actual commit against the expected `f3fde58372d33e9a5650ba7b80fc48b3b49d40c8`) would serve as an additional guardrail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@3rdparty/cutlass` at line 1, Add an optional CI-time validation in build_backend.py to verify the CUTLASS submodule commit matches the pinned commit "f3fde58372d33e9a5650ba7b80fc48b3b49d40c8" before performing the symlink/copy; implement this check in the setup path (e.g., inside or just before the function that creates the CUTLASS symlink/copy such as setup_cutlass() or the main build routine) by reading the submodule commit (via git rev-parse HEAD in the cutlass directory or by parsing .git/refs) and comparing it to the expected hash, and if mismatched, either fail with a clear error or emit a CI-only warning controlled by an env flag (e.g., CI_VALIDATE_CUTLASS=true) so the guardrail can be enabled without changing default behavior.
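A lightweight sketch of such a check; the env-flag name comes from the prompt above, while the submodule path and where it hooks into build_backend.py are assumptions:

```python
import os
import subprocess
from pathlib import Path

EXPECTED_CUTLASS_COMMIT = "f3fde58372d33e9a5650ba7b80fc48b3b49d40c8"


def validate_cutlass_submodule(repo_root: Path) -> None:
    # Opt-in guardrail: only runs when CI sets CI_VALIDATE_CUTLASS=true,
    # so default build behavior is unchanged.
    if os.environ.get("CI_VALIDATE_CUTLASS") != "true":
        return
    actual = subprocess.check_output(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_root / "3rdparty" / "cutlass",
        text=True,
    ).strip()
    if actual != EXPECTED_CUTLASS_COMMIT:
        raise RuntimeError(
            f"CUTLASS submodule drift: expected {EXPECTED_CUTLASS_COMMIT}, "
            f"got {actual}"
        )
```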
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flashinfer/autotuner.py`:
- Around line 623-625: The new fifth cache-key element `extras` is computed by
AutoTuner._get_cache_key but never persisted: update search_cache() to include
`extras` when constructing `file_key` (append or include the fifth element from
`cache_key` so file-backed keys match the in-memory key), and update
save_configs() to accept/unpack the 5-tuple `cache_key` (instead of a 4-tuple)
so it writes the persisted entry using the full key; apply the same fix to the
other occurrence around the 1077-1084 block so both file_key construction and
saving consistently propagate `extras`.
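A hedged illustration of the inline comment above — the 5-tuple layout and the `save_configs` shape are assumptions; the point is only that `extras` must reach both the file-backed key and the persisted entry:

```python
# Illustrative only; flashinfer/autotuner.py's real persistence differs.
def make_file_key(cache_key):
    # cache_key is now a 5-tuple; dropping extras here would make
    # file-backed lookups miss entries written by the in-memory path.
    custom_op, runner_id, shapes, config, extras = cache_key
    return f"{custom_op}|{runner_id}|{shapes}|{config}|{extras}"


def save_configs(store, cache_key, profile):
    # Unpack/forward the full 5-tuple when writing the persisted entry.
    store[make_file_key(cache_key)] = profile
```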
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: de7bf225-10c7-4372-844c-7314085c7d4a
📥 Commits
Reviewing files that changed from the base of the PR and between 27cae50 and 04922184ec7a0962b5c736f0c84683d32a3de10d.
📒 Files selected for processing (3)
- 3rdparty/cutlass
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
/bot run

/bot run
Hi @yanqinz2, I am seeing that
98afcc9 to 9f294a9 (compare)
/bot stop

The GitLab CI pipeline #46899789 has been cancelled.

/bot run

@bkryu Yeah, there was a missing unpacking for the newly added key. I just pushed the fix.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flashinfer/gemm/gemm_base.py`:
- Around line 4113-4118: The cache key in get_cache_key_extras is incorrectly
using the preallocated output tensor's dtype (out.dtype) instead of the provided
out_dtype argument used to build the cuDNN FP4 graph; update the return tuple in
get_cache_key_extras to use out_dtype (the unpacked variable) instead of
out.dtype so the autotuner key matches the graph key (function:
get_cache_key_extras).
- Around line 4052-4056: CudnnFp4GemmRunner.get_valid_tactics() currently calls
_get_cudnn_fp4_gemm_graph(..., tactic=-1) which forces
cudnn.build_plan_policy.HEURISTICS_CHOICE; change this so FP4 can enumerate all
plans by adding a policy parameter to _get_cudnn_fp4_gemm_graph (e.g., policy
with default cudnn.build_plan_policy.HEURISTICS_CHOICE) and update
CudnnFp4GemmRunner.get_valid_tactics() to call _get_cudnn_fp4_gemm_graph(...,
policy=cudnn.build_plan_policy.ALL); ensure the new policy argument is used
where the code currently maps tactic == -1 / else to HEURISTICS_CHOICE/ALL so
BF16 and FP4 use the same ALL enumeration when requested.
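A sketch of the first fix; the tuple layout matches the diffs later in this thread, while the surrounding class is simplified:

```python
class CudnnFp4GemmRunner:
    def get_cache_key_extras(self, inputs):
        # Key off the out_dtype *argument* (which is what builds the cuDNN
        # graph), not the preallocated tensor's out.dtype, so the autotuner
        # key and the graph key agree.
        _, _, _, _, alpha, out_dtype, _out, block_size, use_nvfp4, _ = inputs
        return (out_dtype, block_size, use_nvfp4, alpha is not None)
```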
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73e7bd1a-731d-4b12-9511-59873cbe694a
📥 Commits
Reviewing files that changed from the base of the PR and between 04922184ec7a0962b5c736f0c84683d32a3de10d and 9f294a9801764be99b035b15b536227ceb3f20cb.
📒 Files selected for processing (2)
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
🚧 Files skipped from review as they are similar to previous changes (1)
- flashinfer/autotuner.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/autotuner.py (1)
837-841: ⚠️ Potential issue | 🟠 Major — Missing `inputs` parameter causes cache lookup mismatch.

After the tuning loop completes, `search_cache` is called without the `inputs` parameter, so `extras` will default to `()`. However, the cache entries were stored with extras computed from the actual tensors. If any runner returns non-empty extras from `get_cache_key_extras()`, this lookup will fail to find the tuned entry and incorrectly fall back to the default tactic.

🐛 Proposed fix
```diff
     # Get the best runner and tactic from cache
     # If no valid tactic is found, the fallback runner and tactic will be used
     _, runner_id, tactic, _ = self.search_cache(
-        custom_op, runners, input_shapes, tuning_config
+        custom_op, runners, input_shapes, tuning_config, inputs=inputs
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/autotuner.py` around lines 837 - 841, The cache lookup after the tuning loop calls search_cache without the inputs/extras, causing mismatch when runners provide non-empty get_cache_key_extras(); update the call site where _, runner_id, tactic, _ = self.search_cache(custom_op, runners, input_shapes, tuning_config) to pass the original inputs (the same tensors used during tuning) so extras is computed consistently (i.e., include the inputs parameter when invoking search_cache) to ensure the tuned entry is found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39ce1ee9-dbd7-4be8-9f8d-a71f9ea36538
📥 Commits
Reviewing files that changed from the base of the PR and between 9f294a9801764be99b035b15b536227ceb3f20cb and f7af2d73215039aa6a87354ac88c700a77a6f0a0.
📒 Files selected for processing (1)
flashinfer/autotuner.py
🧹 Nitpick comments (1)
flashinfer/gemm/gemm_base.py (1)
4113-4118: Rename the unused `out` placeholder.

The extras tuple looks right, but line 4117 still trips Ruff's RUF059. Renaming the binding to `_out` keeps the tuple layout documented without leaving an unused local.

♻️ Suggested cleanup
```diff
-        _, _, _, _, alpha, out_dtype, out, block_size, use_nvfp4, _ = inputs
+        _, _, _, _, alpha, out_dtype, _out, block_size, use_nvfp4, _ = inputs
         return (out_dtype, block_size, use_nvfp4, alpha is not None)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4113 - 4118, In get_cache_key_extras, the tuple unpack binds an unused local named out which triggers Ruff RUF059; change that binding to _out (or prefix with underscore) so the input layout remains documented but the linter knows the variable is intentionally unused — update the tuple unpack line in get_cache_key_extras to use _out instead of out and leave the returned tuple (out_dtype, block_size, use_nvfp4, alpha is not None) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 17ddf2c2-74ba-439c-b436-ff55734e6be1
📥 Commits
Reviewing files that changed from the base of the PR and between f7af2d73215039aa6a87354ac88c700a77a6f0a0 and 2c4f70ee57546807ace66b8d5aaebf8bf40151f9.
📒 Files selected for processing (2)
- flashinfer/autotuner.py
- flashinfer/gemm/gemm_base.py
♻️ Duplicate comments (1)
flashinfer/gemm/gemm_base.py (1)
4095-4100: ⚠️ Potential issue | 🟡 Minor — Remove unused unpacked variable in cache-key extras.

Line 4099 unpacks `out` but never uses it (RUF059). This should be simplified to avoid lint noise.

♻️ Proposed fix
```diff
-        _, _, _, _, alpha, out_dtype, out, block_size, use_nvfp4, _ = inputs
+        _, _, _, _, alpha, out_dtype, _, block_size, use_nvfp4, _ = inputs
         return (out_dtype, block_size, use_nvfp4, alpha is not None)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flashinfer/gemm/gemm_base.py` around lines 4095 - 4100, The unpack in get_cache_key_extras includes an unused variable `out` which triggers lint noise; update the tuple unpacking in get_cache_key_extras (inside gemm_base.py) to omit `out` (or replace it with `_`) so you only bind the used names (e.g., capture alpha, out_dtype, block_size, use_nvfp4) and return the same cache key tuple (out_dtype, block_size, use_nvfp4, alpha is not None).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d597ade-2a39-4abc-99ae-1751f18b9e58
📥 Commits
Reviewing files that changed from the base of the PR and between 2c4f70ee57546807ace66b8d5aaebf8bf40151f9 and 4f591be7877710a990b0329d3b24b15add250b0e.
📒 Files selected for processing (1)
flashinfer/gemm/gemm_base.py
[FAILED] Pipeline #46922582: 6/20 passed

/bot run
4f591be to 7fbcc01 (compare)
[FAILED] Pipeline #46987701: 12/20 passed

/bot run

@YangXu1990uiuc is not authorized to trigger this CI job. cc: @yzh119, @sricketts, @yongwww
📌 Description
Fix a series of cuDNN issues in the FlashInfer GEMM backend.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- Tests are passing (`unittest`, etc.).

Reviewer Notes