[CLI] Implementation of lmcache bench engine#2889
Conversation
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Code Review
This pull request introduces the lmcache bench engine command, a comprehensive benchmarking tool for inference engines. It features a modular architecture supporting multiple workloads, an interactive configuration mode, real-time progress monitoring, and detailed performance statistics. The implementation also extends the LMCache server to report KV cache size metrics. Review feedback identifies a violation of the project's encapsulation policy regarding private member access and suggests implementing a public callback registration method in the RequestSender class to improve modularity.
```python
request_sender._on_finished.extend(
    [
        lambda result, _text: stats_collector.on_request_finished(result),
        lambda result, _text: progress_monitor.on_request_finished(
            result.request_id,
            result.successful,
        ),
        workload.request_finished,
    ]
)
```
Accessing the private member _on_finished directly violates the project's style guide, which prohibits accessing _-prefixed attributes across class boundaries. This can lead to maintenance issues as the internal implementation of RequestSender becomes tightly coupled with this orchestrator.
To improve encapsulation and adhere to the style guide, please add a public method to the RequestSender class for adding callbacks.
For example, you could add the following method to `lmcache/cli/commands/bench/engine_bench/request_sender.py`:

```python
def add_on_finished_callback(self, callback: OnFinishedCallback) -> None:
    """Register a callback to be invoked when a request finishes."""
    self._on_finished.append(callback)
```

Suggested change:

```diff
-request_sender._on_finished.extend(
-    [
-        lambda result, _text: stats_collector.on_request_finished(result),
-        lambda result, _text: progress_monitor.on_request_finished(
-            result.request_id,
-            result.successful,
-        ),
-        workload.request_finished,
-    ]
-)
+# 5. Wire callbacks on sender
+request_sender.add_on_finished_callback(
+    lambda result, _text: stats_collector.on_request_finished(result)
+)
+request_sender.add_on_finished_callback(
+    lambda result, _text: progress_monitor.on_request_finished(
+        result.request_id,
+        result.successful,
+    )
+)
+request_sender.add_on_finished_callback(workload.request_finished)
```
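As a self-contained illustration of the pattern the review asks for (the class and callback names mirror the PR, but the surrounding scaffolding is a hypothetical stand-in, not the real implementation):

```python
from typing import Callable, List

# Minimal sketch of the public-registration pattern; RequestSender here is a
# stand-in for the PR's class, not its real implementation.
OnFinishedCallback = Callable[[object, str], None]

class RequestSender:
    def __init__(self) -> None:
        self._on_finished: List[OnFinishedCallback] = []

    def add_on_finished_callback(self, callback: OnFinishedCallback) -> None:
        """Register a callback to be invoked when a request finishes."""
        self._on_finished.append(callback)

    def _finish(self, result: object, text: str) -> None:
        # Internal hook: fan the completed request out to every callback.
        for cb in self._on_finished:
            cb(result, text)

# Orchestrator code now goes through the public method only.
sender = RequestSender()
seen: List[object] = []
sender.add_on_finished_callback(lambda result, _text: seen.append(result))
sender._finish("req-1", "done")  # driving the internal hook just for the demo
```

Because callers never touch `_on_finished`, `RequestSender` stays free to change its internal storage without breaking the orchestrator.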
References
- The code directly accesses a private member (`_on_finished`) of the `RequestSender` class from another module. The style guide explicitly forbids accessing `_`-prefixed attributes across class boundaries to maintain encapsulation and modularity.
good catch, addressed
```shell
# Launch the LMCache server (ZMQ + HTTP)
lmcache server --host 0.0.0.0 --port 5555 --l1-size-gb 100 --eviction-policy LRU

# Run a benchmark against the engine
```
High-level comment: can we have a non-interactive mode (like `docker run` with vs. without `-it`) for scripted bash usage?
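One possible shape for such a flag, sketched with `argparse` (the flag names and behavior here are assumptions for illustration, not the PR's actual CLI surface):

```python
import argparse

# Hypothetical sketch of a non-interactive switch: when set, the tool would
# skip all prompts and rely entirely on flags or a saved config file.
parser = argparse.ArgumentParser(prog="lmcache bench engine")
parser.add_argument(
    "--non-interactive",
    action="store_true",
    help="Skip interactive prompts; fail on missing options instead of asking.",
)
parser.add_argument("--config", help="Path to a saved benchmark config file.")

args = parser.parse_args(["--non-interactive", "--config", "bench.yaml"])
```

This mirrors the `docker -it` analogy: interactivity becomes opt-in (or opt-out) rather than the only mode, so the command can run unattended in shell scripts and CI.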
```diff
@@ -0,0 +1,400 @@
+lmcache bench engine
+====================
```
@ApostaC can you also add your GIF at the top of this file?
```rst
lmcache bench engine
====================

The ``lmcache bench engine`` command runs sustained performance benchmarks
```
It would be great if we could also print the full CLI command being executed before the benchmark starts, so that users can copy-paste it and rerun it later.
we can do it in the future, added to my backlog
```text
P90 TTFT (ms):        587.21
P99 TTFT (ms):        837.32
------------------ Decoding Speed ---------------------
Mean decode (tok/s):  48.23
```
Maybe let's use ITL here? Throughput metrics (e.g. tokens/s) have no standard definition for percentiles such as P95.
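To make the distinction concrete, here is a sketch of how per-token inter-token latency (ITL) percentiles are well-defined: each gap between consecutive output tokens is one sample, so ordinary percentile math applies. The timestamps and the nearest-rank percentile helper below are illustrative, not the PR's code:

```python
import statistics

# Illustrative per-token arrival times (milliseconds) for one request.
token_timestamps_ms = [0, 20, 50, 70, 120]

# One ITL sample per gap between consecutive tokens.
itls_ms = [b - a for a, b in zip(token_timestamps_ms, token_timestamps_ms[1:])]

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

mean_itl = statistics.mean(itls_ms)  # the reciprocal view of decode tok/s
p90_itl = percentile(itls_ms, 90)    # well-defined, unlike "P90 tok/s"
```

A "P90 decode tok/s" would require picking an arbitrary aggregation window first; P90 ITL needs no such choice, which is why ITL is the more standard tail-latency metric.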
```diff
@@ -0,0 +1,281 @@
+# SPDX-License-Identifier: Apache-2.0
```
It would be great to have a quick doc on how to contribute workloads.
It's in `docs/design/cli/bench-engine.md`
* add lmcache bench engine command Signed-off-by: ApostaC <yihua98@uchicago.edu>
What this PR does / why we need it:
Adds the `lmcache bench engine` command: a sustained performance benchmarking tool for inference engines. It supports three workload types (long-doc-qa, multi-round-chat, random-prefill), interactive configuration, config file save/load, and real-time progress display.

Key features:

- `long-doc-qa` (KV cache reuse testing), `multi-round-chat` (stateful QPS-controlled sessions), `random-prefill` (concurrent prefill-only)
- `--config` for reproducible benchmarks
- `--lmcache-url` auto-resolves `tokens_per_gb_kvcache` from a running LMCache server

Special notes for your reviewers:

The implementation follows a bottom-up modular architecture (config → stats → request_sender → progress → workloads → orchestrator → interactive). Each module has its own test file. Adding a new workload requires: (1) a config dataclass with `resolve()`, (2) a `BaseWorkload` subclass, (3) `ConfigItem` entries in `schema.py`, (4) a dispatch branch in `create_workload()`. See `docs/design/cli/bench-engine.md` for the full design doc.

If applicable:
Demo
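The workload-extension steps from the reviewer notes can be sketched roughly like this. All class and function bodies here are illustrative stand-ins for the PR's real APIs, and step (3), the `ConfigItem` entries in `schema.py`, is omitted since it is pure registration:

```python
from dataclasses import dataclass

@dataclass
class MyWorkloadConfig:  # step (1): a config dataclass with resolve()
    num_requests: int = 100

    def resolve(self) -> "MyWorkloadConfig":
        # Real configs may fill derived defaults here (e.g. from the server).
        return self

class BaseWorkload:  # stand-in for the PR's base class
    def generate_requests(self):
        raise NotImplementedError

class MyWorkload(BaseWorkload):  # step (2): a BaseWorkload subclass
    def __init__(self, config: MyWorkloadConfig) -> None:
        self.config = config.resolve()

    def generate_requests(self):
        return [f"request-{i}" for i in range(self.config.num_requests)]

def create_workload(name: str) -> BaseWorkload:  # step (4): dispatch branch
    if name == "my-workload":
        return MyWorkload(MyWorkloadConfig(num_requests=3))
    raise ValueError(f"unknown workload: {name}")

requests = create_workload("my-workload").generate_requests()
```

The dispatch-by-name shape keeps each workload self-contained behind the common base class, which is what lets the orchestrator stay workload-agnostic.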