
[CLI] Add long-doc-permutator CLI bench workload #2937

Merged
ApostaC merged 3 commits into LMCache:dev from deng451e:migrate_bench_to_cli
Apr 3, 2026

Conversation

@deng451e (Collaborator) commented Apr 2, 2026

Add long-doc-permutator workload to lmcache bench engine

Stress-tests blended KV cache reuse by sending all N! permutations of N synthetic context documents, exercising five axes: context boundary mixing, eviction, chunk homogeneity, prefix domination, and concurrency.

Benchmark script ported from @sammshen PR #2885.

Changes

  • long_doc_permutator.py — new workload + CLI wiring (--ldp-* flags, workload factory)

Note

Medium Risk
Adds a new async workload with its own concurrency and request-dispatch loop, plus new CLI/interactive config surface; issues would primarily affect benchmark execution and resource usage rather than core runtime logic.

Overview
Adds a new long-doc-permutator benchmark workload to lmcache bench engine, which generates synthetic long contexts and sends multiple permutations of them (with configurable context count/length, system prompt length, permutation count, and in-flight concurrency).

Wires the workload into the CLI and interactive config schema via new --ldp-* flags and factory dispatch, and updates CSV export to auto-create the output directory before writing results. Includes a comprehensive new test suite covering config validation, prompt/permutation generation, dispatch behavior, and reproducibility.
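The permutation-and-dispatch idea described above can be sketched as follows. This is a minimal illustration only; the function and parameter names (`build_permutation_prompts`, `num_permutations`, `seed`) are hypothetical and are not the PR's actual API:

```python
import itertools
import random

def build_permutation_prompts(documents, system_prompt, num_permutations, seed=0):
    """Yield up to num_permutations prompts, each concatenating the same
    set of documents in a different order after a shared system prompt."""
    # A fixed seed keeps the selected permutations reproducible across runs,
    # mirroring the reproducibility goal mentioned in the test suite.
    rng = random.Random(seed)
    # Cap enumeration so N! doesn't blow up memory for larger N.
    perms = list(itertools.islice(itertools.permutations(documents), 10_000))
    rng.shuffle(perms)
    for perm in perms[:num_permutations]:
        yield system_prompt + "\n\n" + "\n\n".join(perm)

# Example: 3 documents have 3! = 6 orderings; request 4 of them.
docs = [f"<doc {i}>" for i in range(3)]
prompts = list(build_permutation_prompts(docs, "SYSTEM", num_permutations=4))
```

Because only the document order changes, every prompt shares the same token content in different chunk arrangements, which is what stresses blended KV cache reuse.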

Written by Cursor Bugbot for commit 1356112.

@deng451e deng451e requested a review from sammshen April 2, 2026 21:41
@deng451e deng451e marked this pull request as ready for review April 2, 2026 21:41
@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces the long-doc-permutator workload to the engine benchmark, designed to stress-test KV cache reuse through document permutations. It also refactors the lmcache query CLI command to be self-contained, integrating RequestSender and implementing an automatic fallback to the completions endpoint when chat templates are missing. Feedback focuses on optimizing memory usage for large permutation sets, ensuring proper resource lifecycle management by removing a redundant run override, and refining exception handling for optional dependency imports.

Comment thread lmcache/cli/commands/query.py Outdated
Comment thread lmcache/cli/commands/bench/engine_bench/workloads/long_doc_permutator.py Outdated
@deng451e deng451e requested a review from ApostaC April 2, 2026 21:48
@sammshen sammshen added the full Run comprehensive tests on this PR label Apr 2, 2026
@sammshen (Contributor) left a comment


LGTM! small comment on the long doc permutator

Comment thread lmcache/cli/commands/query.py Outdated
def _is_missing_chat_template_error(error: str) -> bool:
"""Return whether an error indicates missing tokenizer chat template."""
normalized = error.lower()
return "chat template" in normalized and "tokenizer" in normalized

Chat template error detection is too narrow for fallback

Medium Severity

_is_missing_chat_template_error requires both "chat template" and "tokenizer" in the error string, but the old _missing_chat_template matched on "chat template" alone (plus several other patterns). Common vLLM/engine errors like "No chat template found" or "This model does not have a chat template" lack "tokenizer", so the automatic retry from chat to completions mode won't trigger, causing queries to fail unnecessarily.
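A broader matcher along the lines the review suggests might look like this. This is a hedged sketch, not the PR's final fix; the pattern list is illustrative and assumes the error phrasings quoted in the review:

```python
def is_missing_chat_template_error(error: str) -> bool:
    """Return whether an error indicates a missing tokenizer chat template.

    Unlike a check requiring both "chat template" and "tokenizer" to
    co-occur, this matches any known phrasing, so errors like
    "No chat template found" still trigger the completions fallback.
    """
    normalized = error.lower()
    patterns = (
        "chat template",        # "No chat template found",
                                # "This model does not have a chat template"
        "apply_chat_template",  # tokenizer API name surfacing in tracebacks
    )
    return any(p in normalized for p in patterns)
```

Keeping the patterns in one tuple also makes it easy to extend the check as new engine error strings are encountered.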


@ApostaC (Contributor) left a comment

Update to the lmcache bench looks good to me in general! Please see other comments below.

Comment thread docs/design/cli/query-command.md
Comment thread lmcache/cli/prompt.py
Comment thread tests/cli/commands/test_query.py
Comment on lines +110 to +111
"long-doc-permutator",
"Permutations of context documents (stress-tests blended KV reuse)",
Contributor

For the name and the description, the current one is not super clear.

My proposal for the description: Query the same set of long documents with different system prompts

No good ideas for the name. WDYT?

Contributor

Oh I got it wrong! Is it something like query the same set of long documents with different orders?

Collaborator (Author)

yes, just updated it to Query the same set of long documents with different orders to make it less confusing

Comment on lines +215 to +226
ConfigItem(
key="ldp_vocab_size",
display_name="Vocabulary size",
description=(
"Pool size for context word generation. "
"Smaller values increase chunk hash collision risk."
),
input_type="int",
default=8000,
condition=_workload_is("long-doc-permutator"),
phase=PHASE_WORKLOAD,
),
Contributor

I don't think we need to expose this to users. This can be purely internal and hard-coded.

Collaborator (Author)

Changed it to hardcoded
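The hardcoded replacement discussed above might look like this (a sketch only; the constant name `LDP_VOCAB_SIZE` is hypothetical, not necessarily what the PR uses):

```python
# Vocabulary pool size for context word generation. Kept internal rather
# than exposed as a --ldp-* flag; smaller values would increase chunk
# hash collision risk, so this stays fixed at the former default.
LDP_VOCAB_SIZE = 8000
```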

Comment thread lmcache/cli/commands/bench/engine_bench/workloads/__init__.py
@deng451e force-pushed the migrate_bench_to_cli branch 2 times, most recently from 962b2a0 to 5284f7d, on April 3, 2026 01:49
deng451e and others added 2 commits April 3, 2026 01:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
@deng451e force-pushed the migrate_bench_to_cli branch from 5284f7d to cdcc318 on April 3, 2026 01:53
Signed-off-by: deng451e <838677410@qq.com>
@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


@ApostaC (Contributor) left a comment

LGTM!

@ApostaC ApostaC merged commit 6ceed5e into LMCache:dev Apr 3, 2026
35 checks passed

Labels

full Run comprehensive tests on this PR


3 participants