[CLI] Add long-doc-permutator CLI bench workload #2937
Code Review
This pull request introduces the long-doc-permutator workload to the engine benchmark, designed to stress-test KV cache reuse through document permutations. It also refactors the lmcache query CLI command to be self-contained, integrating RequestSender and implementing an automatic fallback to the completions endpoint when chat templates are missing. Feedback focuses on optimizing memory usage for large permutation sets, ensuring proper resource lifecycle management by removing a redundant run override, and refining exception handling for optional dependency imports.
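The chat-to-completions fallback described in this summary can be sketched as follows. This is a minimal illustration, not the PR's code: `query_with_fallback`, the `(ok, text)` return shape of the sender callables, and the injected error predicate are hypothetical stand-ins for whatever `RequestSender` actually exposes.

```python
from typing import Callable, Tuple

# Hypothetical sender signature: takes a prompt, returns (ok, body_or_error).
Sender = Callable[[str], Tuple[bool, str]]


def query_with_fallback(
    prompt: str,
    send_chat: Sender,
    send_completion: Sender,
    is_missing_template: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the chat endpoint first; retry on completions only for template errors.

    Returns (mode_used, response_text). Any other failure is raised.
    """
    ok, body = send_chat(prompt)
    if ok:
        return "chat", body
    if not is_missing_template(body):
        # Unrelated failure: do not mask it by silently switching endpoints.
        raise RuntimeError(f"chat request failed: {body}")
    ok, body = send_completion(prompt)
    if ok:
        return "completions", body
    raise RuntimeError(f"completions fallback failed: {body}")
```

Injecting the predicate keeps the retry policy separate from the string matching, so a narrower or broader matcher can be swapped in without touching the dispatch logic.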
sammshen left a comment
LGTM! Small comment on the long-doc-permutator.
```python
def _is_missing_chat_template_error(error: str) -> bool:
    """Return whether an error indicates missing tokenizer chat template."""
    normalized = error.lower()
    return "chat template" in normalized and "tokenizer" in normalized
```
Chat template error detection is too narrow for fallback
Medium Severity
_is_missing_chat_template_error requires both "chat template" and "tokenizer" in the error string, but the old _missing_chat_template matched on "chat template" alone (plus several other patterns). Common vLLM/engine errors like "No chat template found" or "This model does not have a chat template" lack "tokenizer", so the automatic retry from chat to completions mode won't trigger, causing queries to fail unnecessarily.
```python
"long-doc-permutator",
"Permutations of context documents (stress-tests blended KV reuse)",
```
For both the name and the description, the current wording is not very clear.
My proposal for the description: Query the same set of long documents with different system prompts
No good ideas for the name. WDYT?
Oh, I got it wrong! Is it something like querying the same set of long documents in different orders?
Yes, I just updated it to "Query the same set of long documents with different orders" to make it less confusing.
```python
ConfigItem(
    key="ldp_vocab_size",
    display_name="Vocabulary size",
    description=(
        "Pool size for context word generation. "
        "Smaller values increase chunk hash collision risk."
    ),
    input_type="int",
    default=8000,
    condition=_workload_is("long-doc-permutator"),
    phase=PHASE_WORKLOAD,
),
```
I don't think we need to expose this to users. This can be purely internal and hard-coded.
Changed it to a hard-coded value.
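A minimal sketch of what a hard-coded vocabulary might look like, assuming documents are built by sampling words from a fixed, seeded pool. `_VOCAB_SIZE`, `make_vocab`, and `make_document` are hypothetical names; the value 8000 only mirrors the default of the removed `ldp_vocab_size` option.

```python
import random
from typing import List

# Hypothetical internal constant (mirrors the removed option's default).
# Per that option's description, a smaller pool raises the risk that
# different documents produce identical chunks and thus identical chunk hashes.
_VOCAB_SIZE = 8000


def make_vocab(size: int = _VOCAB_SIZE, seed: int = 0) -> List[str]:
    """Build `size` distinct pseudo-words deterministically from `seed`."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    words = set()
    while len(words) < size:
        words.add("".join(rng.choice(letters) for _ in range(rng.randint(3, 8))))
    # Sort so the result does not depend on set iteration order.
    return sorted(words)


def make_document(vocab: List[str], num_words: int, seed: int) -> str:
    """Sample `num_words` words from `vocab`; same seed gives the same text."""
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(num_words))
```

Seeding both steps keeps the vocabulary and every document reproducible across runs, which is what lets cache-hit behavior be compared between benchmark runs.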
Force-pushed from 962b2a0 to 5284f7d.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>

Signed-off-by: deng451e <838677410@qq.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
Force-pushed from 5284f7d to cdcc318.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).


Add long-doc-permutator workload to lmcache bench engine
Stress-tests blended KV cache reuse by sending all N! permutations of N synthetic context documents, exercising five axes: context boundary mixing, eviction, chunk homogeneity, prefix domination, and concurrency.
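The permutation scheme can be sketched with `itertools.permutations`, which yields orderings lazily. With 10 documents there are already 3,628,800 orderings, so materialising them all as a list is costly (the memory concern raised in the code review). The function name, the prompt layout (system prompt, then documents joined by blank lines), and the optional cap are assumptions for illustration, not the PR's implementation.

```python
import itertools
from typing import Iterator, List, Optional


def iter_permutation_prompts(
    system_prompt: str,
    documents: List[str],
    num_permutations: Optional[int] = None,
) -> Iterator[str]:
    """Yield one prompt per ordering of `documents`, generated on demand.

    Hypothetical helper: `num_permutations` caps how many orderings are
    emitted; None means all N! of them.
    """
    orders = itertools.permutations(range(len(documents)))
    if num_permutations is not None:
        # islice stops early without ever building the full N! sequence.
        orders = itertools.islice(orders, num_permutations)
    for order in orders:
        body = "\n\n".join(documents[i] for i in order)
        yield f"{system_prompt}\n\n{body}"
```

Every prompt shares the system-prompt prefix but places the same documents at different offsets, so only a cache that can reuse KV for non-prefix segments benefits beyond the first ordering.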
Benchmark script ported from @sammshen's PR #2885.
Changes
Note
Medium Risk
Adds a new async workload with its own concurrency and request-dispatch loop, plus new CLI/interactive config surface; issues would primarily affect benchmark execution and resource usage rather than core runtime logic.
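A bounded in-flight dispatch loop of this kind can be sketched as a fixed pool of worker coroutines pulling from one shared iterator, so prompts are consumed lazily and no more than `max_in_flight` requests are ever open. `dispatch_bounded`, its parameters, and the async `send` callable are hypothetical; the PR's actual loop is not shown in this thread.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List


async def dispatch_bounded(
    prompts: Iterable[str],
    send: Callable[[str], Awaitable[str]],
    max_in_flight: int,
) -> List[str]:
    """Send every prompt with at most `max_in_flight` concurrent requests.

    Results are returned in the same order as `prompts`.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    pending = enumerate(prompts)  # shared iterator, pulled lazily by workers
    results: Dict[int, str] = {}

    async def worker() -> None:
        # A worker takes the next prompt only after its previous request
        # finished, so the worker count bounds the open requests.
        for idx, prompt in pending:
            results[idx] = await send(prompt)

    await asyncio.gather(*(worker() for _ in range(max_in_flight)))
    return [results[i] for i in range(len(results))]
```

Wrapping one task per prompt in an `asyncio.Semaphore` would also bound concurrency, but it creates every task up front, which scales poorly when the permutation count reaches the millions.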
Overview
Adds a new `long-doc-permutator` benchmark workload to `lmcache bench engine`, which generates synthetic long contexts and sends multiple permutations of them (with configurable context count/length, system prompt length, permutation count, and in-flight concurrency).

Wires the workload into the CLI and interactive config schema via new `--ldp-*` flags and factory dispatch, and updates CSV export to auto-create the output directory before writing results. Includes a comprehensive new test suite covering config validation, prompt/permutation generation, dispatch behavior, and reproducibility.

Written by Cursor Bugbot for commit 1356112.
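The CSV change amounts to creating the parent directory before opening the output file. A minimal sketch, assuming results are a list of flat dicts; `write_results_csv` is a hypothetical name, not the PR's function.

```python
import csv
from pathlib import Path
from typing import Mapping, Sequence, Union


def write_results_csv(
    path: Union[str, Path], rows: Sequence[Mapping[str, object]]
) -> Path:
    """Write `rows` to `path`, creating any missing parent directories."""
    out = Path(path)
    # parents=True builds the whole chain; exist_ok=True makes reruns safe.
    out.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys()) if rows else []
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out
```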