
[CLI] Implementation of lmcache bench engine #2889

Merged
ApostaC merged 13 commits into LMCache:dev from ApostaC:local-dev/lmcache-bench
Mar 28, 2026

Conversation

Contributor

@ApostaC ApostaC commented Mar 27, 2026

What this PR does / why we need it:

Adds the lmcache bench engine command — a sustained performance benchmarking tool for inference engines. It supports three workload types (long-doc-qa, multi-round-chat, random-prefill), interactive configuration, config file save/load, and real-time progress display.

Key features:

  • Three workloads: long-doc-qa (KV cache reuse testing), multi-round-chat (stateful QPS-controlled sessions), random-prefill (concurrent prefill-only)
  • Interactive mode: Step-by-step guided setup with arrow-key selection when required args are missing
  • Config file: Export/import JSON configs via --config for reproducible benchmarks
  • LMCache auto-detection: --lmcache-url auto-resolves tokens_per_gb_kvcache from a running LMCache server
  • Real-time display: ANSI terminal progress with live TTFT, decode speed, and throughput metrics
  • Output: Per-request CSV + aggregate JSON summary with P50/P90/P99 percentiles
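
For illustration, the P50/P90/P99 aggregation in the JSON summary can be sketched as follows. This is a hypothetical nearest-rank implementation, not the PR's actual stats code, and the sample values are made up:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# Made-up per-request TTFT samples (ms), reduced to the summary fields.
ttft_ms = [102.4, 98.7, 110.2, 530.0, 87.3, 95.1, 101.9, 600.5, 99.0, 104.2]
summary = {f"p{p}_ttft_ms": percentile(ttft_ms, p) for p in (50, 90, 99)}
```

The real implementation may use a different percentile definition (e.g. linear interpolation); nearest-rank is shown only because it is the simplest well-defined choice.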

Special notes for your reviewers:

The implementation follows a bottom-up modular architecture (config → stats → request_sender → progress → workloads → orchestrator → interactive). Each module has its own test file. Adding a new workload requires: (1) a config dataclass with resolve(), (2) a BaseWorkload subclass, (3) ConfigItem entries in schema.py, (4) a dispatch branch in create_workload(). See docs/design/cli/bench-engine.md for the full design doc.
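
The four extension steps can be sketched roughly as below. The names BaseWorkload, resolve(), ConfigItem, and create_workload() come from the PR description; the example workload, its fields, and the class bodies are purely hypothetical:

```python
from dataclasses import dataclass

class BaseWorkload:
    """Stand-in for the PR's workload base class."""
    def run(self) -> None: ...

@dataclass
class MyWorkloadConfig:
    # Step 1: a config dataclass with resolve() for validation/derived defaults.
    num_requests: int = 100

    def resolve(self) -> "MyWorkloadConfig":
        if self.num_requests <= 0:
            raise ValueError("num_requests must be positive")
        return self

class MyWorkload(BaseWorkload):
    # Step 2: subclass BaseWorkload with the new request-generation logic.
    def __init__(self, config: MyWorkloadConfig):
        self.config = config.resolve()

def create_workload(name: str, config):
    # Step 4: the dispatch branch. Step 3 (the ConfigItem entries in
    # schema.py, which drive the CLI/interactive prompts) is omitted here.
    if name == "my-workload":
        return MyWorkload(config)
    raise ValueError(f"unknown workload: {name}")
```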

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

Demo

(demo GIF)

ApostaC added 7 commits March 27, 2026 01:46
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the lmcache bench engine command, a comprehensive benchmarking tool for inference engines. It features a modular architecture supporting multiple workloads, an interactive configuration mode, real-time progress monitoring, and detailed performance statistics. The implementation also extends the LMCache server to report KV cache size metrics. Review feedback identifies a violation of the project's encapsulation policy regarding private member access and suggests implementing a public callback registration method in the RequestSender class to improve modularity.

Comment thread lmcache/cli/commands/bench/__init__.py Outdated
Comment on lines +326 to +334
request_sender._on_finished.extend(
    [
        lambda result, _text: stats_collector.on_request_finished(result),
        lambda result, _text: progress_monitor.on_request_finished(
            result.request_id,
            result.successful,
        ),
        workload.request_finished,
    ]
)
Contributor

high

Accessing the private member _on_finished directly violates the project's style guide, which prohibits accessing _-prefixed attributes across class boundaries. This can lead to maintenance issues as the internal implementation of RequestSender becomes tightly coupled with this orchestrator.

To improve encapsulation and adhere to the style guide, please add a public method to the RequestSender class for adding callbacks.

For example, you could add the following method to lmcache/cli/commands/bench/engine_bench/request_sender.py:

    def add_on_finished_callback(self, callback: OnFinishedCallback) -> None:
        """Register a callback to be invoked when a request finishes."""
        self._on_finished.append(callback)
Suggested change

    -request_sender._on_finished.extend(
    -    [
    -        lambda result, _text: stats_collector.on_request_finished(result),
    -        lambda result, _text: progress_monitor.on_request_finished(
    -            result.request_id,
    -            result.successful,
    -        ),
    -        workload.request_finished,
    -    ]
    -)
    +# 5. Wire callbacks on sender
    +request_sender.add_on_finished_callback(
    +    lambda result, _text: stats_collector.on_request_finished(result)
    +)
    +request_sender.add_on_finished_callback(
    +    lambda result, _text: progress_monitor.on_request_finished(
    +        result.request_id,
    +        result.successful,
    +    )
    +)
    +request_sender.add_on_finished_callback(workload.request_finished)
References
  1. The code directly accesses a private member (_on_finished) of the RequestSender class from another module. The style guide explicitly forbids accessing _-prefixed attributes across class boundaries to maintain encapsulation and modularity. (link)

Contributor Author

good catch, addressed

ApostaC added 2 commits March 27, 2026 06:06
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC requested review from KuntaiDu and sammshen March 27, 2026 06:12
Signed-off-by: ApostaC <yihua98@uchicago.edu>
# Launch the LMCache server (ZMQ + HTTP)
lmcache server --host 0.0.0.0 --port 5555 --l1-size-gb 100 --eviction-policy LRU

# Run a benchmark against the engine
Contributor

High-level comment: can we have a non-interactive mode (like docker with vs. without -it) for scripted bash usage?

Contributor

@KuntaiDu KuntaiDu left a comment

Otherwise LGTM!

ApostaC added 2 commits March 27, 2026 20:09
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Contributor

@KuntaiDu KuntaiDu left a comment

Check comments

Comment thread docs/source/cli/bench.rst
@@ -0,0 +1,400 @@
lmcache bench engine
====================

Contributor

@ApostaC can you also add your GIF at the top of this file?

Comment thread docs/source/cli/bench.rst
lmcache bench engine
====================

The ``lmcache bench engine`` command runs sustained performance benchmarks
Contributor

It would be great if we could also print the full CLI command being executed before the benchmark starts, so that users can copy-paste it and re-run it in the future.

Contributor Author

we can do it in the future, added to my backlog

Comment thread docs/source/cli/bench.rst
P90 TTFT (ms): 587.21
P99 TTFT (ms): 837.32
------------------ Decoding Speed ---------------------
Mean decode (tok/s): 48.23
Contributor

Maybe let's use ITL here? Throughput metrics (e.g. tokens/s) have no standard definition of a P95 percentile.

Contributor Author

Removed P99 and P95.
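
To make the reviewer's point concrete: inter-token latency (ITL) is the gap between consecutive output-token arrivals, so percentiles over those per-gap samples are well defined, unlike percentiles of an aggregate tokens/s rate. A minimal sketch (illustrative, not the PR's code):

```python
def inter_token_latencies(token_times: list[float]) -> list[float]:
    """ITL samples: gaps between consecutive token arrival timestamps."""
    return [b - a for a, b in zip(token_times, token_times[1:])]

# One request's token arrival times (seconds); the long 0.23 s gap
# would show up directly in the high ITL percentiles, whereas it is
# averaged away in a single tokens/s figure.
times = [0.00, 0.02, 0.05, 0.07, 0.30, 0.32]
gaps = inter_token_latencies(times)
```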

@@ -0,0 +1,281 @@
# SPDX-License-Identifier: Apache-2.0
Contributor

Would be great if we can have a quick doc on how to contribute workloads

Contributor Author

It's in docs/design/cli/bench-engine.md

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC enabled auto-merge (squash) March 27, 2026 23:52
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 27, 2026
Contributor

@sammshen sammshen left a comment

LGTM!

@ApostaC ApostaC merged commit 7f2554f into LMCache:dev Mar 28, 2026
34 checks passed
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
* add lmcache bench engine command

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Labels

full Run comprehensive tests on this PR


3 participants