fix(autotuner): differentiate file cache entries by runner specific kernel parameters#3367
Conversation
📝 WalkthroughWalkthroughAutotuner persistence and lookup now include runner-specific extras in file-cache keys. TrtllmGemmRunner exposes get_cache_key_extras(), autotuner load/search/save use the extras component, and tests were updated plus a new collision test ensuring distinct persisted entries for differing extras. ChangesAutotuner cache key extras for file persistence
Sequence DiagramsequenceDiagram
participant Runner as TrtllmGemmRunner
participant SearchCache as search_cache()
participant FileCache as _file_configs
participant SaveConfigs as save_configs()
participant Disk as on-disk JSON
Runner->>SearchCache: compute cache_key (includes extras)
SearchCache->>FileCache: lookup using cache_key[4] (extras)
FileCache-->>SearchCache: return config or miss
SaveConfigs->>Disk: persist entry under tuple including _extras
Disk-->>SaveConfigs: ack
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request updates the AutoTuner to include "extras" in the persistent file cache key, resolving potential collisions for runners with different parameters. The changes span autotuner.py, gemm_base.py, and associated tests. Review feedback recommends refactoring the duplicated key construction logic into a helper method and suggests providing a fallback mechanism to ensure backward compatibility with existing 3-tuple cache files.
|
@qiching Did you check if there are other classes of Runner that contain parameters like |
yes, I audited TunableRunner subclasses, |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/autotuner.py (1)
940-949:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winConsider a legacy file-key fallback for backward compatibility.
search_cache()now only probes the new 4-tuple file key. Existing JSON caches written with the old 3-tuple format will silently miss and fall back to-1, which can cause avoidable perf regressions after upgrade.Suggested compatibility patch
- file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4])) + file_key = str((cache_key[0], cache_key[1], cache_key[3], cache_key[4])) + legacy_file_key = str((cache_key[0], cache_key[1], cache_key[3])) # 2. User-loaded configs (from load_configs or autotune(cache=...)) # Always consulted, even during tuning mode — loaded configs take priority # so that already-tuned shapes are never re-profiled. - if file_key in self._file_configs: - runner_name, tactic = self._file_configs[file_key] + hit_key = None + if file_key in self._file_configs: + hit_key = file_key + elif cache_key[4] == () and legacy_file_key in self._file_configs: + # Backward-compat for caches created before extras were persisted. + hit_key = legacy_file_key + + if hit_key is not None: + runner_name, tactic = self._file_configs[hit_key]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flashinfer/autotuner.py` around lines 940 - 949, search_cache currently only checks the new 4-tuple file_key (str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))) so older JSON caches keyed by the legacy 3-tuple are ignored; add a fallback lookup: after checking file_key in self._file_configs, if not found construct legacy_key = str((cache_key[0], cache_key[1], cache_key[3])) and check self._file_configs[legacy_key] (optionally log a debug/warn about using legacy key) and then assign runner_name, tactic from that entry so legacy configs are honored. Ensure you update only the lookup logic around file_key and preserve existing behavior when the 4-tuple exists.
🧹 Nitpick comments (1)
tests/autotuner/test_autotuner_configs.py (1)
459-539: ⚡ Quick winStrengthen collision test with a runner that overrides
get_cache_key_extras().Current new tests prove two serialized entries exist, but they don’t validate
search_cache()resolution through the runner extras contract. Adding a tiny runner stub withget_cache_key_extras()would make this fully end-to-end.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/autotuner/test_autotuner_configs.py` around lines 459 - 539, The tests currently only assert two serialized entries but don't exercise runner-driven lookup; add a tiny runner stub class (e.g., FakeRunnerWithExtras) that overrides get_cache_key_extras(self) to return the same extras tuples used when creating cache_key_a/cache_key_b, then replace or add calls in TestFileCacheKeyCollision tests to use this stub when calling AutoTuner._get_cache_key and when calling AutoTuner.search_cache / tuner.search_cache to verify that lookup returns the correct entry for each extras variant; ensure you reference AutoTuner._get_cache_key to build the keys, use tuner.search_cache (or AutoTuner.search_cache) to perform resolution, and assert the returned profiling entry matches the expected tuple for both extras values after save/load roundtrip and when _file_configs is populated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@flashinfer/autotuner.py`:
- Around line 940-949: search_cache currently only checks the new 4-tuple
file_key (str((cache_key[0], cache_key[1], cache_key[3], cache_key[4]))) so
older JSON caches keyed by the legacy 3-tuple are ignored; add a fallback
lookup: after checking file_key in self._file_configs, if not found construct
legacy_key = str((cache_key[0], cache_key[1], cache_key[3])) and check
self._file_configs[legacy_key] (optionally log a debug/warn about using legacy
key) and then assign runner_name, tactic from that entry so legacy configs are
honored. Ensure you update only the lookup logic around file_key and preserve
existing behavior when the 4-tuple exists.
---
Nitpick comments:
In `@tests/autotuner/test_autotuner_configs.py`:
- Around line 459-539: The tests currently only assert two serialized entries
but don't exercise runner-driven lookup; add a tiny runner stub class (e.g.,
FakeRunnerWithExtras) that overrides get_cache_key_extras(self) to return the
same extras tuples used when creating cache_key_a/cache_key_b, then replace or
add calls in TestFileCacheKeyCollision tests to use this stub when calling
AutoTuner._get_cache_key and when calling AutoTuner.search_cache /
tuner.search_cache to verify that lookup returns the correct entry for each
extras variant; ensure you reference AutoTuner._get_cache_key to build the keys,
use tuner.search_cache (or AutoTuner.search_cache) to perform resolution, and
assert the returned profiling entry matches the expected tuple for both extras
values after save/load roundtrip and when _file_configs is populated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4286143f-4f7e-4908-b6b5-ab3edfee540d
📒 Files selected for processing (3)
flashinfer/autotuner.pyflashinfer/gemm/gemm_base.pytests/autotuner/test_autotuner_configs.py
|
/bot run |
|
@qiching , the PR might be causing failures on |
The file cache key dropped the extras tuple, causing runners that differ only in parameters like use_8x4_sf_layout to collide. This led to invalid tactics being loaded from persistent cache. - Include extras (index 4) in file_key construction (search_cache, save_configs, load_from_file) - Implement get_cache_key_extras() in TrtllmGemmRunner to expose use_8x4_sf_layout - Add unit tests for file cache key collision
074df06 to
6d05ec8
Compare
updated |
|
/bot run |
|
@nv-yunzheq could we merge it? |
📌 Description
The persistent autotune file cache key was constructed as a 3-tuple
(custom_op, runner_class, profile), intentionally droppinghash(runner)for cross-process stability, but unintentionally also droppingextras, which carries runnerspecific parameters likeuse_8x4_sf_layout, which causedTrtllmGemmRunnerinstances withuse_8x4_sf_layout=Trueanduse_8x4_sf_layout=Falseto collide in the file cache. When vLLM or other frameworks persists and reloads autotune results, the wrong tactic gets applied, producing:Updates:
flashinfer/autotuner.py, extend file key from 3-tuple to 4-tuple insearch_cache,save_configs,load_from_file.flashinfer/gemm/gemm_base.py, implementget_cache_key_extras()to return(self._use_8x4_sf_layout,).tests/autotuner/test_autotuner_configs.py, addTestFileCacheKeyCollisionwith two tests reproducing and update existing tests for the new 4-tuple key format.🔍 Related Issues
🚀 Pull Request Checklist
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).test_autotuner_configs.py— 37 passedtest_autotuner_core.py— 78 passedReviewer Notes
The bug is latent until autotune results are persisted to disk and reloaded. It was exposed by vllm-project/vllm#42537 which added persistent caching for FlashInfer autotuning in vLLM.
Summary by CodeRabbit
Bug Fixes
Tests