
fix(autotuner): differentiate file cache entries by runner specific kernel parameters#3367

Merged
nv-yunzheq merged 1 commit into
flashinfer-ai:mainfrom
qiching:fix/autotune-file-cache-key-extras
May 22, 2026

Conversation

@qiching

@qiching qiching commented May 19, 2026

Collaborator

📌 Description

The persistent autotune file cache key was constructed as a 3-tuple (custom_op, runner_class, profile). This intentionally dropped hash(runner) for cross-process stability, but it also unintentionally dropped extras, which carries runner-specific parameters such as use_8x4_sf_layout. As a result, TrtllmGemmRunner instances with use_8x4_sf_layout=True and use_8x4_sf_layout=False collided in the file cache. When vLLM or another framework persists and reloads autotune results, the wrong tactic gets applied, producing:

RuntimeError: Check failed: (config.mOptions.mSfLayoutB == mOptions.sfLayoutB)
             is false: Invalid sf layout in run
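
The collision can be reproduced without FlashInfer. The following is a minimal sketch, assuming the in-memory cache key is a 5-tuple with hash(runner) at index 2 and extras at index 4 (consistent with the cache_key[0], cache_key[1], cache_key[3], cache_key[4] indexing quoted later in this thread). The op name, hash values, and profile are made-up placeholders, not real FlashInfer keys.

```python
# Illustrative only: the tuple layout follows the indices discussed in this PR;
# all concrete values are hypothetical placeholders.

def old_file_key(cache_key):
    # Pre-fix behaviour: drops hash(runner) at index 2 AND extras at index 4.
    return str((cache_key[0], cache_key[1], cache_key[3]))


def new_file_key(cache_key):
    # Post-fix behaviour: still drops hash(runner), but keeps extras.
    return str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))


profile = ((128, 256),)  # hypothetical shape profile
key_layout_on = ("example::gemm_op", "TrtllmGemmRunner", 1111, profile, (True,))
key_layout_off = ("example::gemm_op", "TrtllmGemmRunner", 2222, profile, (False,))

collides_before = old_file_key(key_layout_on) == old_file_key(key_layout_off)
collides_after = new_file_key(key_layout_on) == new_file_key(key_layout_off)
print(collides_before, collides_after)
```

With the 3-tuple key both runners map to the same string, so whichever tactic is stored last wins. With the 4-tuple key the two entries stay separate.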

Updates:

  1. flashinfer/autotuner.py: extend the file key from a 3-tuple to a 4-tuple in search_cache, save_configs, and load_from_file.
  2. flashinfer/gemm/gemm_base.py: implement get_cache_key_extras() to return (self._use_8x4_sf_layout,).
  3. tests/autotuner/test_autotuner_configs.py: add TestFileCacheKeyCollision with two tests reproducing the collision, and update existing tests for the new 4-tuple key format.
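
The second update can be pictured with a small stand-in. This is a sketch under assumptions, not the FlashInfer source: the class names are hypothetical, and the base-class default of an empty tuple is inferred from the empty extras tuple mentioned elsewhere in this thread.

```python
class TunableRunnerSketch:
    """Hypothetical stand-in for the real TunableRunner base class."""

    def get_cache_key_extras(self):
        # Assumed default: runners without kernel-affecting parameters
        # contribute nothing to the cache key.
        return ()


class GemmRunnerSketch(TunableRunnerSketch):
    """Hypothetical stand-in for TrtllmGemmRunner."""

    def __init__(self, use_8x4_sf_layout: bool):
        self._use_8x4_sf_layout = use_8x4_sf_layout

    def get_cache_key_extras(self):
        # Expose the constructor parameter that changes kernel behaviour,
        # so two differently configured runners never share a cache slot.
        return (self._use_8x4_sf_layout,)


extras_on = GemmRunnerSketch(True).get_cache_key_extras()
extras_off = GemmRunnerSketch(False).get_cache_key_extras()
```

Returning a tuple rather than a bare bool leaves room for further parameters later without changing the shape of the key.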

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).
    • test_autotuner_configs.py: 37 passed
    • test_autotuner_core.py: 78 passed

Reviewer Notes

The bug is latent until autotune results are persisted to disk and reloaded. It was exposed by vllm-project/vllm#42537 which added persistent caching for FlashInfer autotuning in vLLM.
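
Why the bug stays hidden until persistence is involved can be sketched as follows. The in-memory keys below remain distinct, and only the save step merges them. The sketch assumes a JSON file keyed by the stringified tuple and stores a bare tactic number per entry for brevity; the helper names and values are hypothetical.

```python
import json


def drop_extras(cache_key):
    # Old file key: (custom_op, runner_class, profile).
    return str((cache_key[0], cache_key[1], cache_key[3]))


def keep_extras(cache_key):
    # New file key: (custom_op, runner_class, profile, extras).
    return str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))


def persist(entries, key_fn):
    """Serialize {in-memory key: tactic} to JSON text under file keys."""
    on_disk = {}
    for cache_key, tactic in entries.items():
        on_disk[key_fn(cache_key)] = tactic  # a colliding key overwrites
    return json.dumps(on_disk)


profile = ((128, 256),)  # hypothetical shape profile
entries = {
    ("example::gemm_op", "TrtllmGemmRunner", 1111, profile, (True,)): 7,
    ("example::gemm_op", "TrtllmGemmRunner", 2222, profile, (False,)): 3,
}

reloaded_old = json.loads(persist(entries, drop_extras))
reloaded_new = json.loads(persist(entries, keep_extras))
print(len(entries), len(reloaded_old), len(reloaded_new))
```

Two in-memory entries become one on disk with the old key and stay two with the new key, which is why a process that never reloads a cache file would not observe the problem.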

Summary by CodeRabbit

  • Bug Fixes

    • Fixed cache key mismatch between memory and disk so runner-specific parameters (extras/layout) are preserved when saving/loading autotuner results.
    • Resolved collisions where configurations that differed only by layout-related parameters were treated as the same.
  • Tests

    • Added and updated tests to verify distinct file-cache entries and loading for entries differing by extras/layout.


@coderabbitai

coderabbitai Bot commented May 19, 2026

Contributor
📝 Walkthrough


Autotuner persistence and lookup now include runner-specific extras in file-cache keys. TrtllmGemmRunner exposes get_cache_key_extras(), autotuner load/search/save use the extras component, and tests were updated plus a new collision test ensuring distinct persisted entries for differing extras.

Changes

Autotuner cache key extras for file persistence

| Layer / File(s) | Summary |
|---|---|
| TrtllmGemmRunner cache key extras (flashinfer/gemm/gemm_base.py) | TrtllmGemmRunner.get_cache_key_extras() added to include self._use_8x4_sf_layout in runner cache extras. |
| Autotuner file-cache load/search/save with extras (flashinfer/autotuner.py) | load_from_file(), search_cache(), and save_configs() now build/use file-cache keys that include the extras component (cache_key[4] / _extras). |
| Test suite: existing and collision detection (tests/autotuner/test_autotuner_configs.py) | Fallback-chain and config-merge tests updated to seed/lookup extras-inclusive file_key; new TestFileCacheKeyCollision verifies that differing extras produce distinct saved entries preserved across save/load. |
| MOE test helper key update (tests/moe/test_trtllm_gen_moe_autotune_tactics.py) | _force_tactic_in_autotuner_cache now computes file_key with a trailing empty extras tuple so forced tactics target the correct cache slot. |

Sequence Diagram

sequenceDiagram
  participant Runner as TrtllmGemmRunner
  participant SearchCache as search_cache()
  participant FileCache as _file_configs
  participant SaveConfigs as save_configs()
  participant Disk as on-disk JSON

  Runner->>SearchCache: compute cache_key (includes extras)
  SearchCache->>FileCache: lookup using cache_key[4] (extras)
  FileCache-->>SearchCache: return config or miss
  SaveConfigs->>Disk: persist entry under tuple including _extras
  Disk-->>SaveConfigs: ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • bkryu
  • cyx-6
  • yongwww
  • nv-yunzheq
  • aleozlx
  • sricketts
  • saltyminty

Poem

🐰 In the meadow of keys where configs roam free,

I tuck extras in tuples so each runner can be—
Distinct and well-ordered, no collisions to chase,
Saved safe on the disk in its proper small space.
Hop, hop—cache harmony, neat as a lace!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing the autotuner to differentiate file cache entries by runner-specific kernel parameters (extras like use_8x4_sf_layout). |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #3363 by implementing the exact solution needed: extending the file cache key from 3-tuple to 4-tuple to include runner-specific extras, preventing cache collisions and runtime errors. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the cache key collision issue: autotuner.py (key construction), gemm_base.py (extras implementation), and test updates. No unrelated modifications. |
| Description check | ✅ Passed | The PR description comprehensively covers the bug, root cause, solution, and testing status with detailed explanations. |



@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the AutoTuner to include "extras" in the persistent file cache key, resolving potential collisions for runners with different parameters. The changes span autotuner.py, gemm_base.py, and associated tests. Review feedback recommends refactoring the duplicated key construction logic into a helper method and suggests providing a fallback mechanism to ensure backward compatibility with existing 3-tuple cache files.
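
The refactoring suggested in this review could look roughly like the sketch below: one helper decides which key components are persisted, and both the save path and the lookup path call it. FileCacheSketch and its method names are hypothetical and are not the AutoTuner API. The 5-tuple key layout is an assumption based on the indices quoted in this thread.

```python
from typing import Any, Dict, Optional, Tuple


class FileCacheSketch:
    """Hypothetical miniature of the autotuner's file-config store."""

    def __init__(self) -> None:
        self._file_configs: Dict[str, Tuple[str, int]] = {}

    @staticmethod
    def _file_key(cache_key: Tuple[Any, ...]) -> str:
        # Single place that decides what is persisted: everything except
        # the process-local hash(runner) at index 2.
        custom_op, runner_class, _runner_hash, profile, extras = cache_key
        return str((custom_op, runner_class, profile, extras))

    def save(self, cache_key: Tuple[Any, ...], runner_name: str, tactic: int) -> None:
        self._file_configs[self._file_key(cache_key)] = (runner_name, tactic)

    def search(self, cache_key: Tuple[Any, ...]) -> Optional[Tuple[str, int]]:
        return self._file_configs.get(self._file_key(cache_key))


cache = FileCacheSketch()
profile = ((128, 256),)  # hypothetical shape profile
cache.save(("example::op", "RunnerA", 1111, profile, (True,)), "RunnerA", 7)

# A different process sees a different hash(runner) but the same file key.
hit = cache.search(("example::op", "RunnerA", 9999, profile, (True,)))
miss = cache.search(("example::op", "RunnerA", 9999, profile, (False,)))
```

Keeping the key format in one function means a future change to the persisted components cannot be applied to save but forgotten in search.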

Comment thread flashinfer/autotuner.py
Comment thread flashinfer/autotuner.py
@wzhao18

wzhao18 commented May 19, 2026

Contributor

@qiching Did you check if there are other classes of Runner that contain parameters like use_8x4_sf_layout which are not included in the file cache key?

@qiching

qiching commented May 19, 2026

Collaborator Author

> @qiching Did you check if there are other classes of Runner that contain parameters like use_8x4_sf_layout which are not included in the file cache key?

Yes, I audited the TunableRunner subclasses. TrtllmGemmRunner is the only one that has a constructor parameter affecting kernel behavior but was missing a get_cache_key_extras override.
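
An audit of this kind could be partly mechanized. The sketch below is one possible approach and not necessarily how the audit above was done. It lists every subclass that still inherits the base get_cache_key_extras. All class names are hypothetical stand-ins. A flagged class is only a candidate, since a runner without kernel-affecting constructor parameters does not need an override.

```python
def all_subclasses(cls):
    """Return every direct and indirect subclass of cls."""
    found, stack = [], list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub not in found:
            found.append(sub)
            stack.extend(sub.__subclasses__())
    return found


def runners_without_extras_override(base):
    """Names of subclasses whose get_cache_key_extras is the base's own."""
    return sorted(
        sub.__name__
        for sub in all_subclasses(base)
        if sub.get_cache_key_extras is base.get_cache_key_extras
    )


class BaseRunnerSketch:
    def get_cache_key_extras(self):
        return ()


class RunnerWithOverride(BaseRunnerSketch):
    def __init__(self, flag):
        self._flag = flag

    def get_cache_key_extras(self):
        return (self._flag,)


class RunnerWithoutOverride(BaseRunnerSketch):
    def __init__(self, flag):
        self._flag = flag  # affects behaviour but is not part of the key


candidates = runners_without_extras_override(BaseRunnerSketch)
print(candidates)
```

Each candidate still needs a manual look at its constructor to decide whether any parameter changes which tactics are valid.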

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer/autotuner.py (1)

940-949: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Consider a legacy file-key fallback for backward compatibility.

search_cache() now only probes the new 4-tuple file key. Existing JSON caches written with the old 3-tuple format will silently miss and fall back to -1, which can cause avoidable perf regressions after upgrade.

Suggested compatibility patch
-                file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))
+                file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))
+                legacy_file_key = str((cache_key[0], cache_key[1], cache_key[3]))

                 # 2. User-loaded configs (from load_configs or autotune(cache=...))
                 #    Always consulted, even during tuning mode — loaded configs take priority
                 #    so that already-tuned shapes are never re-profiled.
-                if file_key in self._file_configs:
-                    runner_name, tactic = self._file_configs[file_key]
+                hit_key = None
+                if file_key in self._file_configs:
+                    hit_key = file_key
+                elif cache_key[4] == () and legacy_file_key in self._file_configs:
+                    # Backward-compat for caches created before extras were persisted.
+                    hit_key = legacy_file_key
+
+                if hit_key is not None:
+                    runner_name, tactic = self._file_configs[hit_key]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flashinfer/autotuner.py` around lines 940 - 949, search_cache currently only
checks the new 4-tuple file_key (str((cache_key[0], cache_key[1], cache_key[3],
cache_key[4]))) so older JSON caches keyed by the legacy 3-tuple are ignored;
add a fallback lookup: after checking file_key in self._file_configs, if not
found construct legacy_key = str((cache_key[0], cache_key[1], cache_key[3])) and
check self._file_configs[legacy_key] (optionally log a debug/warn about using
legacy key) and then assign runner_name, tactic from that entry so legacy
configs are honored. Ensure you update only the lookup logic around file_key and
preserve existing behavior when the 4-tuple exists.
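
As a standalone illustration of the fallback proposed above, here is a sketch that follows the suggested patch, including its guard on empty extras. The function name and sample values are hypothetical, and the thread does not state whether this fallback was adopted in the merged change.

```python
def lookup_with_legacy_fallback(file_configs, cache_key):
    """Return (runner_name, tactic) or None, honouring old 3-tuple keys."""
    file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))
    if file_key in file_configs:
        return file_configs[file_key]

    # Only trust a legacy entry when the runner has no extras. For a runner
    # with extras, a legacy entry may have been written by either variant,
    # which is exactly the collision this PR fixes.
    legacy_key = str((cache_key[0], cache_key[1], cache_key[3]))
    if cache_key[4] == () and legacy_key in file_configs:
        return file_configs[legacy_key]
    return None


profile = ((64, 64),)  # hypothetical shape profile
legacy_only = {str(("example::op", "PlainRunner", profile)): ("PlainRunner", 4)}

plain_hit = lookup_with_legacy_fallback(
    legacy_only, ("example::op", "PlainRunner", 1, profile, ())
)
extras_miss = lookup_with_legacy_fallback(
    legacy_only, ("example::op", "PlainRunner", 1, profile, (True,))
)
```

The guard trades a possible re-tune for runners with extras against the risk of reusing an ambiguous legacy tactic.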
🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_configs.py (1)

459-539: ⚡ Quick win

Strengthen collision test with a runner that overrides get_cache_key_extras().

Current new tests prove two serialized entries exist, but they don’t validate search_cache() resolution through the runner extras contract. Adding a tiny runner stub with get_cache_key_extras() would make this fully end-to-end.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/autotuner/test_autotuner_configs.py` around lines 459 - 539, The tests
currently only assert two serialized entries but don't exercise runner-driven
lookup; add a tiny runner stub class (e.g., FakeRunnerWithExtras) that overrides
get_cache_key_extras(self) to return the same extras tuples used when creating
cache_key_a/cache_key_b, then replace or add calls in TestFileCacheKeyCollision
tests to use this stub when calling AutoTuner._get_cache_key and when calling
AutoTuner.search_cache / tuner.search_cache to verify that lookup returns the
correct entry for each extras variant; ensure you reference
AutoTuner._get_cache_key to build the keys, use tuner.search_cache (or
AutoTuner.search_cache) to perform resolution, and assert the returned profiling
entry matches the expected tuple for both extras values after save/load
roundtrip and when _file_configs is populated.
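
The shape of such a test can be sketched without the real AutoTuner. Here build_cache_key and file_key are hypothetical stand-ins for AutoTuner._get_cache_key and the file-key construction, and the 5-tuple layout is an assumption based on the indices quoted in this thread. A real test would call the tuner's own methods instead.

```python
class FakeRunnerWithExtras:
    """Tiny runner stub whose extras depend on a constructor flag."""

    def __init__(self, flag):
        self._flag = flag

    def get_cache_key_extras(self):
        return (self._flag,)


def build_cache_key(custom_op, runner, profile):
    # Hypothetical stand-in for AutoTuner._get_cache_key.
    return (custom_op, type(runner).__name__, hash(runner), profile,
            runner.get_cache_key_extras())


def file_key(cache_key):
    # Persisted key: drop hash(runner) at index 2, keep extras at index 4.
    return str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))


def test_lookup_resolves_per_extras():
    profile = ((64, 64),)
    saved_a, saved_b = FakeRunnerWithExtras(True), FakeRunnerWithExtras(False)
    file_configs = {
        file_key(build_cache_key("example::op", saved_a, profile)): ("FakeRunnerWithExtras", 5),
        file_key(build_cache_key("example::op", saved_b, profile)): ("FakeRunnerWithExtras", 9),
    }
    assert len(file_configs) == 2

    # Fresh instances mimic a new process: hash(runner) changes, extras do not.
    fresh_a, fresh_b = FakeRunnerWithExtras(True), FakeRunnerWithExtras(False)
    key_a = file_key(build_cache_key("example::op", fresh_a, profile))
    key_b = file_key(build_cache_key("example::op", fresh_b, profile))
    assert file_configs[key_a] == ("FakeRunnerWithExtras", 5)
    assert file_configs[key_b] == ("FakeRunnerWithExtras", 9)


test_lookup_resolves_per_extras()
```

Driving the key through get_cache_key_extras() on the stub, rather than hand-writing the extras tuple, is what makes the check end-to-end with respect to the runner contract.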

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4286143f-4f7e-4908-b6b5-ab3edfee540d

📥 Commits

Reviewing files that changed from the base of the PR and between 194930c and 074df06.

📒 Files selected for processing (3)
  • flashinfer/autotuner.py
  • flashinfer/gemm/gemm_base.py
  • tests/autotuner/test_autotuner_configs.py

@bkryu

bkryu commented May 20, 2026

Collaborator

/bot run

@flashinfer-bot

Collaborator

GitLab MR !694 has been created, and the CI pipeline #51979900 is currently running. I'll report back once the pipeline job completes.

@bkryu

bkryu commented May 21, 2026

Collaborator

@qiching, the PR might be causing failures in test_trtllm_gen_moe_autotune_tactics.py on SM100/103 devices. Do you mind checking?

The file cache key dropped the extras tuple, causing runners that differ
only in parameters like use_8x4_sf_layout to collide. This led to
invalid tactics being loaded from persistent cache.
- Include extras (index 4) in file_key construction (search_cache,
  save_configs, load_from_file)
- Implement get_cache_key_extras() in TrtllmGemmRunner to expose
  use_8x4_sf_layout
- Add unit tests for file cache key collision
@qiching qiching force-pushed the fix/autotune-file-cache-key-extras branch from 074df06 to 6d05ec8 Compare May 21, 2026 04:27
@qiching

qiching commented May 21, 2026

Collaborator Author

> @qiching, the PR might be causing failures in test_trtllm_gen_moe_autotune_tactics.py on SM100/103 devices. Do you mind checking?

updated

@bkryu

bkryu commented May 21, 2026

Collaborator

/bot run

@flashinfer-bot

Collaborator

GitLab MR !694 has been updated with latest changes, and the CI pipeline #52039797 is currently running. I'll report back once the pipeline job completes.

@qiching

qiching commented May 21, 2026

Collaborator Author

@nv-yunzheq could we merge it?

@nv-yunzheq nv-yunzheq merged commit 2f372e2 into flashinfer-ai:main May 22, 2026
32 of 45 checks passed

Development

Successfully merging this pull request may close these issues.

Flashinfer autotune file cache not discriminating kernel 'extra' parameters

5 participants