fix(autotuner): differentiate file cache entries by runner specific kernel parameters by qiching · Pull Request #3367 · flashinfer-ai/flashinfer

qiching · 2026-05-19T21:00:47Z

📌 Description

The persistent autotune file cache key was constructed as a 3-tuple (custom_op, runner_class, profile). This intentionally dropped hash(runner) for cross-process stability, but it also unintentionally dropped extras, which carries runner-specific parameters such as use_8x4_sf_layout. As a result, TrtllmGemmRunner instances with use_8x4_sf_layout=True and use_8x4_sf_layout=False collided in the file cache. When vLLM or another framework persists and reloads autotune results, the wrong tactic gets applied, producing:

RuntimeError: Check failed: (config.mOptions.mSfLayoutB == mOptions.sfLayoutB)
             is false: Invalid sf layout in run

Updates:

flashinfer/autotuner.py: extend the file key from a 3-tuple to a 4-tuple in search_cache, save_configs, and load_from_file.
flashinfer/gemm/gemm_base.py: implement get_cache_key_extras() to return (self._use_8x4_sf_layout,).
tests/autotuner/test_autotuner_configs.py: add TestFileCacheKeyCollision with two tests reproducing the collision, and update existing tests for the new 4-tuple key format.
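The collision described above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the FlashInfer code: the helper names `old_file_key` and `new_file_key`, the op name `"fp4_gemm"`, and the profile tuple are all hypothetical; only the tuple layouts come from the description.

```python
# Illustrative only: the tuple layouts follow the PR description, but the
# helper names and the concrete values below are invented for this sketch.

def old_file_key(custom_op, runner_class, profile, extras):
    # Pre-fix behaviour: extras is dropped from the persisted key.
    return str((custom_op, runner_class, profile))


def new_file_key(custom_op, runner_class, profile, extras):
    # Post-fix behaviour: extras becomes the fourth component of the key.
    return str((custom_op, runner_class, profile, extras))


profile = ((128, 256),)  # hypothetical shape profile
a = ("fp4_gemm", "TrtllmGemmRunner", profile, (True,))   # use_8x4_sf_layout=True
b = ("fp4_gemm", "TrtllmGemmRunner", profile, (False,))  # use_8x4_sf_layout=False

# With the 3-tuple key both runners map to the same string; with the
# 4-tuple key they stay distinct.
old_collides = old_file_key(*a) == old_file_key(*b)
new_collides = new_file_key(*a) == new_file_key(*b)
```

Here `old_collides` is true and `new_collides` is false, which is the difference the new TestFileCacheKeyCollision tests are meant to pin down.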

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).
- test_autotuner_configs.py — 37 passed
- test_autotuner_core.py — 78 passed

Reviewer Notes

The bug is latent until autotune results are persisted to disk and reloaded. It was exposed by vllm-project/vllm#42537 which added persistent caching for FlashInfer autotuning in vLLM.
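To illustrate why the bug stays hidden within a single process, the sketch below contrasts an in-memory cache with a persisted one. It assumes a 5-element in-memory key of the form (custom_op, runner_class, hash(runner), profile, extras), inferred from the indices quoted elsewhere in this PR; the hash values, tactic ids, and the `to_file` helper are invented for illustration.

```python
import json

# Assumed in-memory key layout: (custom_op, runner_class, hash(runner), profile, extras).
# The runner hashes (1111, 2222) and tactic ids (7, 3) are placeholders.
key_true = ("gemm_op", "TrtllmGemmRunner", 1111, ((64, 64),), (True,))
key_false = ("gemm_op", "TrtllmGemmRunner", 2222, ((64, 64),), (False,))

# In memory the two runners never collide because hash(runner) differs.
memory_cache = {key_true: 7, key_false: 3}


def to_file(cache, with_extras):
    """Serialize the cache the way a file cache might, then reload it."""
    out = {}
    for k, tactic in cache.items():
        parts = (k[0], k[1], k[3]) + ((k[4],) if with_extras else ())
        out[str(parts)] = [k[1], tactic]
    # json round-trip stands in for writing to disk and reading back.
    return json.loads(json.dumps(out))


reloaded_old = to_file(memory_cache, with_extras=False)  # 3-tuple file key
reloaded_new = to_file(memory_cache, with_extras=True)   # 4-tuple file key
```

With the 3-tuple key the second entry overwrites the first on save, so only one tactic survives the round trip; with the 4-tuple key both entries are preserved.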

Summary by CodeRabbit

Bug Fixes
- Fixed cache key mismatch between memory and disk so runner-specific parameters (extras/layout) are preserved when saving/loading autotuner results.
- Resolved collisions where configurations that differed only by layout-related parameters were treated as the same.
Tests
- Added and updated tests to verify distinct file-cache entries and loading for entries differing by extras/layout.

coderabbitai · 2026-05-19T21:00:56Z

📝 Walkthrough

Autotuner persistence and lookup now include runner-specific extras in file-cache keys. TrtllmGemmRunner exposes get_cache_key_extras(), autotuner load/search/save use the extras component, and tests were updated plus a new collision test ensuring distinct persisted entries for differing extras.
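The extras hook mentioned in the walkthrough can be pictured roughly as follows. The classes here are stand-ins written for this sketch, not the real FlashInfer types; the empty-tuple default on the base class is an assumption, and `file_key_for` is a hypothetical helper. Only the `(self._use_8x4_sf_layout,)` return value is taken from the PR description.

```python
class TunableRunnerSketch:
    """Stand-in for a tunable runner base class (not the real FlashInfer type)."""

    def get_cache_key_extras(self):
        # Assumed default: a runner with no extra parameters contributes ().
        return ()


class GemmRunnerSketch(TunableRunnerSketch):
    """Stand-in for a runner whose constructor flag changes kernel behavior."""

    def __init__(self, use_8x4_sf_layout):
        self._use_8x4_sf_layout = use_8x4_sf_layout

    def get_cache_key_extras(self):
        # Same return shape as quoted in the PR description.
        return (self._use_8x4_sf_layout,)


def file_key_for(custom_op, runner, profile):
    # Hypothetical helper: builds a 4-component persisted key from the runner.
    return str((custom_op, type(runner).__name__, profile, runner.get_cache_key_extras()))


key_a = file_key_for("gemm_op", GemmRunnerSketch(True), ((32, 32),))
key_b = file_key_for("gemm_op", GemmRunnerSketch(False), ((32, 32),))
```

Any runner subclass that adds a behavior-changing constructor parameter would need a similar override for its persisted entries to stay distinct.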

Changes

Autotuner cache key extras for file persistence

Layer / File(s)	Summary
TrtllmGemmRunner cache key extras `flashinfer/gemm/gemm_base.py`	`TrtllmGemmRunner.get_cache_key_extras()` added to include `self._use_8x4_sf_layout` in runner cache extras.
Autotuner file-cache load/search/save with extras `flashinfer/autotuner.py`	`load_from_file()`, `search_cache()`, and `save_configs()` now build/use file-cache keys that include the extras component (`cache_key[4]` / `_extras`).
Test suite: existing and collision detection `tests/autotuner/test_autotuner_configs.py`	Fallback-chain and config-merge tests updated to seed/lookup extras-inclusive `file_key`; new `TestFileCacheKeyCollision` verifies that differing `extras` produce distinct saved entries preserved across save/load.
MOE test helper key update `tests/moe/test_trtllm_gen_moe_autotune_tactics.py`	`_force_tactic_in_autotuner_cache` now computes `file_key` with a trailing empty `extras` tuple so forced tactics target the correct cache slot.

Sequence Diagram

sequenceDiagram
  participant Runner as TrtllmGemmRunner
  participant SearchCache as search_cache()
  participant FileCache as _file_configs
  participant SaveConfigs as save_configs()
  participant Disk as on-disk JSON

  Runner->>SearchCache: compute cache_key (includes extras)
  SearchCache->>FileCache: lookup using cache_key[4] (extras)
  FileCache-->>SearchCache: return config or miss
  SaveConfigs->>Disk: persist entry under tuple including _extras
  Disk-->>SaveConfigs: ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

#3363: Addresses the reported cache-key collision where differing use_8x4_sf_layout led to invalid cached tactics being used.
Feature Request: Allow Autotuner to store to and load from Cache #2620: Related to on-disk autotuner cache format and key construction changes involving runner-specific extras.

Possibly related PRs

flashinfer-ai/flashinfer#2863: Similar updates to include runner-specific extras in autotuner cache keys and GEMM runners.
flashinfer-ai/flashinfer#2554: Related changes to autotuner file-cache loading/saving and key derivation.
flashinfer-ai/flashinfer#3126: Overlapping modifications to cache-key extras handling in autotuner logic.

Suggested labels

run-ci

Suggested reviewers

yzh119
bkryu
cyx-6
yongwww
nv-yunzheq
aleozlx
sricketts
saltyminty

Poem

🐰 In the meadow of keys where configs roam free,

I tuck extras in tuples so each runner can be—
Distinct and well-ordered, no collisions to chase,
Saved safe on the disk in its proper small space.
Hop, hop—cache harmony, neat as a lace!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: fixing the autotuner to differentiate file cache entries by runner-specific kernel parameters (extras like use_8x4_sf_layout).
Linked Issues check	✅ Passed	The PR directly addresses issue `#3363` by implementing the exact solution needed: extending the file cache key from 3-tuple to 4-tuple to include runner-specific extras, preventing cache collisions and runtime errors.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the cache key collision issue: autotuner.py (key construction), gemm_base.py (extras implementation), and test updates. No unrelated modifications.
Description check	✅ Passed	The PR description comprehensively covers the bug, root cause, solution, and testing status with detailed explanations.

gemini-code-assist

Code Review

This pull request updates the AutoTuner to include "extras" in the persistent file cache key, resolving potential collisions for runners with different parameters. The changes span autotuner.py, gemm_base.py, and associated tests. Review feedback recommends refactoring the duplicated key construction logic into a helper method and suggests providing a fallback mechanism to ensure backward compatibility with existing 3-tuple cache files.
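One way to read the two recommendations in this review is sketched below: a single helper builds the persisted key, and lookup falls back to the legacy 3-tuple key only when the runner has no extras. The function names are hypothetical, and the 5-element in-memory key layout (custom_op, runner_class, runner hash, profile, extras) is an assumption inferred from the indices used in this PR, not a confirmed signature.

```python
def make_file_key(cache_key):
    """Build the 4-component persisted key from an in-memory cache key."""
    custom_op, runner_class, _runner_hash, profile, extras = cache_key
    return str((custom_op, runner_class, profile, extras))


def make_legacy_file_key(cache_key):
    """Build the pre-fix 3-component key, for reading older cache files."""
    custom_op, runner_class, _runner_hash, profile, _extras = cache_key
    return str((custom_op, runner_class, profile))


def lookup(file_configs, cache_key):
    """Return the (runner_name, tactic) entry for cache_key, or None on a miss."""
    key = make_file_key(cache_key)
    if key in file_configs:
        return file_configs[key]
    # Fall back only for runners without extras, so a legacy entry cannot be
    # applied to a runner whose layout parameters were never recorded.
    if cache_key[4] == ():
        return file_configs.get(make_legacy_file_key(cache_key))
    return None
```

Centralizing the key construction means search, save, and load cannot drift apart, which is the kind of mismatch that caused the original bug.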

wzhao18 · 2026-05-19T21:11:02Z

@qiching Did you check if there are other classes of Runner that contain parameters like use_8x4_sf_layout which are not included in the file cache key?

qiching · 2026-05-19T21:17:54Z

> @qiching Did you check if there are other classes of Runner that contain parameters like use_8x4_sf_layout which are not included in the file cache key?

Yes, I audited the TunableRunner subclasses. TrtllmGemmRunner is the only one that has a constructor parameter affecting kernel behavior but was missing a get_cache_key_extras override.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

flashinfer/autotuner.py (1)

940-949: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Consider a legacy file-key fallback for backward compatibility.

search_cache() now only probes the new 4-tuple file key. Existing JSON caches written with the old 3-tuple format will silently miss and fall back to -1, which can cause avoidable perf regressions after upgrade.

Suggested compatibility patch

-                file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))
+                file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))
+                legacy_file_key = str((cache_key[0], cache_key[1], cache_key[3]))

                 # 2. User-loaded configs (from load_configs or autotune(cache=...))
                 #    Always consulted, even during tuning mode — loaded configs take priority
                 #    so that already-tuned shapes are never re-profiled.
-                if file_key in self._file_configs:
-                    runner_name, tactic = self._file_configs[file_key]
+                hit_key = None
+                if file_key in self._file_configs:
+                    hit_key = file_key
+                elif cache_key[4] == () and legacy_file_key in self._file_configs:
+                    # Backward-compat for caches created before extras were persisted.
+                    hit_key = legacy_file_key
+
+                if hit_key is not None:
+                    runner_name, tactic = self._file_configs[hit_key]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flashinfer/autotuner.py` around lines 940 - 949, search_cache currently only
checks the new 4-tuple file_key (str((cache_key[0], cache_key[1], cache_key[3],
cache_key[4]))) so older JSON caches keyed by the legacy 3-tuple are ignored;
add a fallback lookup: after checking file_key in self._file_configs, if not
found construct legacy_key = str((cache_key[0], cache_key[1], cache_key[3])) and
check self._file_configs[legacy_key] (optionally log a debug/warn about using
legacy key) and then assign runner_name, tactic from that entry so legacy
configs are honored. Ensure you update only the lookup logic around file_key and
preserve existing behavior when the 4-tuple exists.

🧹 Nitpick comments (1)

tests/autotuner/test_autotuner_configs.py (1)

459-539: ⚡ Quick win

Strengthen collision test with a runner that overrides get_cache_key_extras().

Current new tests prove two serialized entries exist, but they don’t validate search_cache() resolution through the runner extras contract. Adding a tiny runner stub with get_cache_key_extras() would make this fully end-to-end.
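A test in the spirit of this suggestion might look like the sketch below. Because the real AutoTuner API is not shown in this thread, the example uses a toy tuner (`ToyTuner`) that models only the file-cache path; its method names, the JSON layout, and the tactic ids are assumptions. `FakeRunnerWithExtras` follows the stub name proposed in the review prompt.

```python
import json


class FakeRunnerWithExtras:
    """Hypothetical stub; only the extras hook is implemented."""

    def __init__(self, use_8x4_sf_layout):
        self._use_8x4_sf_layout = use_8x4_sf_layout

    def get_cache_key_extras(self):
        return (self._use_8x4_sf_layout,)


class ToyTuner:
    """Toy stand-in for the autotuner; models the file cache only."""

    def __init__(self):
        self._file_configs = {}

    def _file_key(self, custom_op, runner, profile):
        extras = runner.get_cache_key_extras()
        return str((custom_op, type(runner).__name__, profile, extras))

    def record(self, custom_op, runner, profile, tactic):
        key = self._file_key(custom_op, runner, profile)
        self._file_configs[key] = (type(runner).__name__, tactic)

    def dumps(self):
        return json.dumps({k: list(v) for k, v in self._file_configs.items()})

    def loads(self, text):
        self._file_configs = {k: tuple(v) for k, v in json.loads(text).items()}

    def search_cache(self, custom_op, runner, profile):
        # -1 mirrors the fallback tactic mentioned in the review comment.
        entry = self._file_configs.get(self._file_key(custom_op, runner, profile))
        return -1 if entry is None else entry[1]


def test_lookup_resolves_by_extras():
    profile = ((16, 32),)
    writer = ToyTuner()
    writer.record("gemm_op", FakeRunnerWithExtras(True), profile, tactic=7)
    writer.record("gemm_op", FakeRunnerWithExtras(False), profile, tactic=3)

    # Simulate a fresh process reloading the persisted cache.
    reader = ToyTuner()
    reader.loads(writer.dumps())

    assert reader.search_cache("gemm_op", FakeRunnerWithExtras(True), profile) == 7
    assert reader.search_cache("gemm_op", FakeRunnerWithExtras(False), profile) == 3
    assert reader.search_cache("other_op", FakeRunnerWithExtras(True), profile) == -1
```

The point of routing the lookup through the runner's `get_cache_key_extras()` is that the test then fails if save and search ever disagree on how extras enter the key, which counting serialized entries alone would not catch.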

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/autotuner/test_autotuner_configs.py` around lines 459 - 539, The tests
currently only assert two serialized entries but don't exercise runner-driven
lookup; add a tiny runner stub class (e.g., FakeRunnerWithExtras) that overrides
get_cache_key_extras(self) to return the same extras tuples used when creating
cache_key_a/cache_key_b, then replace or add calls in TestFileCacheKeyCollision
tests to use this stub when calling AutoTuner._get_cache_key and when calling
AutoTuner.search_cache / tuner.search_cache to verify that lookup returns the
correct entry for each extras variant; ensure you reference
AutoTuner._get_cache_key to build the keys, use tuner.search_cache (or
AutoTuner.search_cache) to perform resolution, and assert the returned profiling
entry matches the expected tuple for both extras values after save/load
roundtrip and when _file_configs is populated.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4286143f-4f7e-4908-b6b5-ab3edfee540d

📥 Commits

Reviewing files that changed from the base of the PR and between 194930c and 074df06.

📒 Files selected for processing (3)

flashinfer/autotuner.py
flashinfer/gemm/gemm_base.py
tests/autotuner/test_autotuner_configs.py

bkryu · 2026-05-20T17:26:23Z

/bot run

flashinfer-bot · 2026-05-20T17:28:05Z

GitLab MR !694 has been created, and the CI pipeline #51979900 is currently running. I'll report back once the pipeline job completes.

bkryu · 2026-05-21T04:08:30Z

@qiching , the PR might be causing failures on test_trtllm_gen_moe_autotune_tactics.py on SM100/103 devices. Do you mind checking?

The file cache key dropped the extras tuple, causing runners that differ only in parameters like use_8x4_sf_layout to collide. This led to invalid tactics being loaded from persistent cache.
- Include extras (index 4) in file_key construction (search_cache, save_configs, load_from_file)
- Implement get_cache_key_extras() in TrtllmGemmRunner to expose use_8x4_sf_layout
- Add unit tests for file cache key collision

qiching · 2026-05-21T04:29:03Z

> @qiching , the PR might be causing failures on test_trtllm_gen_moe_autotune_tactics.py on SM100/103 devices. Do you mind checking?

updated

bkryu · 2026-05-21T04:29:24Z

/bot run

flashinfer-bot · 2026-05-21T04:29:45Z

GitLab MR !694 has been updated with latest changes, and the CI pipeline #52039797 is currently running. I'll report back once the pipeline job completes.

qiching · 2026-05-21T17:51:36Z

@nv-yunzheq could we merge it?

flashinfer-bot added the op: gemm label May 19, 2026

gemini-code-assist Bot reviewed May 19, 2026

qiching marked this pull request as ready for review May 20, 2026 17:12

qiching requested review from aleozlx, bkryu, cyx-6, dhiraj113, jimmyzho, kahyunnam, nv-yunzheq, saltyminty, samuellees, sricketts, yongwww, yyihuang and yzh119 as code owners May 20, 2026 17:12

coderabbitai Bot reviewed May 20, 2026

qiching force-pushed the fix/autotune-file-cache-key-extras branch from 074df06 to 6d05ec8 Compare May 21, 2026 04:27

qiching requested review from IwakuraRein and jiahanc as code owners May 21, 2026 04:27

flashinfer-bot added the op: moe label May 21, 2026

bkryu approved these changes May 21, 2026

nv-yunzheq approved these changes May 22, 2026

nv-yunzheq merged commit 2f372e2 into flashinfer-ai:main May 22, 2026
32 of 45 checks passed

mmangkad mentioned this pull request May 30, 2026

Add env var for FlashInfer autotune cache vllm-project/vllm#44071

Open
