[CLI] lmcache query engine subcommand #2846

Merged
sammshen merged 9 commits into LMCache:dev from deng451e:cli_lmcache_query on Mar 25, 2026
Collaborator

@deng451e deng451e commented Mar 22, 2026

Summary

  • Add lmcache query engine: expand {name} via --documents, send one OpenAI-compatible request, and report token breakdown + latency metrics (incl. input tokens from usage)
  • Implement helpers:
    • prompt.py: placeholder expansion + tokenizer-based token estimation
    • request.py: streaming request + usage/latency metrics
  • Document in:
    • lmcache/cli/documents/lmcache.txt: built-in text document for testing
    • docs/design/cli/query-command.md
    • docs/source/getting_started/cli.rst
    • docs/source/developer_guide/cli.rst
  • Add tests: tests/cli/commands/test_query.py (args, mocked request, output, error handling)
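
The placeholder expansion described in the summary can be pictured with a minimal sketch. The helper names `parse_documents` and `expand_prompt` are hypothetical and chosen only for illustration; the PR's prompt.py may be organized differently. The sketch assumes each `--documents` value has the form `name=path` (as in the usage example later in this thread) and that a placeholder with no matching document is an error.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not the PR's actual prompt.py API.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def parse_documents(pairs: list[str]) -> dict[str, Path]:
    """Turn ["ctx=path/to/file.txt", ...] into {"ctx": Path(...)}."""
    mapping: dict[str, Path] = {}
    for pair in pairs:
        # Split on the first "=" only, so paths containing "=" still work.
        name, sep, path = pair.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=PATH, got {pair!r}")
        mapping[name] = Path(path)
    return mapping


def expand_prompt(template: str, documents: dict[str, Path]) -> str:
    """Replace each {name} with the text of the file mapped to that name."""

    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in documents:
            # Assumption: an unresolved placeholder is reported, not left in place.
            raise KeyError(f"no document supplied for placeholder {{{name}}}")
        return documents[name].read_text(encoding="utf-8")

    return _PLACEHOLDER.sub(_substitute, template)
```

With a mapping built from `["ctx=notes.txt"]`, the template `"{ctx} What is the example usage of lmcache?"` would become the file's text followed by the question.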

Test Plan

  • query engine
  • query kvcache (not implemented)

@gemini-code-assist
Contributor

Summary of Changes

This pull request introduces a new lmcache query command-line interface (CLI) subcommand for querying both the serving engine and KV cache worker. It provides a single-shot query interface with metrics output, supports prompt expansion using external files, and includes comprehensive documentation and unit tests.

Highlights

  • New CLI Subcommand: Introduced the lmcache query engine subcommand, enabling prompt expansion using external files and reporting latency and token-level metrics.
  • Documentation: Provided documentation for the new subcommand in docs/design/cli/query-command.md, docs/source/getting_started/cli.rst, and docs/source/developer_guide/cli.rst.
  • Unit Tests: Added unit tests in tests/cli/commands/test_query.py, covering token-count helper contracts.
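
As a rough illustration of how latency metrics can be derived from a streamed response, here is a sketch under stated assumptions. `StreamMetrics` and `measure_stream` are hypothetical names, the stream is modelled as a plain iterable of text chunks rather than a real HTTP response, and time-to-first-token is assumed to be one of the reported metrics (the thread only says "latency metrics").

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class StreamMetrics:
    ttft_s: Optional[float]  # time to first non-empty chunk; None if nothing arrived
    total_s: float           # wall-clock time for the whole stream
    chunks: int              # number of non-empty chunks received
    text: str                # concatenated response text


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a stream of text chunks and record simple latency metrics."""
    start = clock()
    first: Optional[float] = None
    parts: list[str] = []
    for chunk in chunks:
        if not chunk:
            continue  # skip keep-alive or empty deltas
        if first is None:
            first = clock()  # first visible output marks time-to-first-token
        parts.append(chunk)
    end = clock()
    return StreamMetrics(
        ttft_s=None if first is None else first - start,
        total_s=end - start,
        chunks=len(parts),
        text="".join(parts),
    )
```

Injecting the clock keeps the function testable without real network delays; a caller would pass the decoded text deltas from the server's streaming response.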

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new lmcache query engine subcommand, providing functionality for single-shot inference queries with detailed performance metrics. The implementation is comprehensive, including prompt expansion, API endpoint fallback logic, and integration with the existing metrics framework. The changes are well-documented with a design document, updates to user guides, and tests for the main command class.

My review focuses on improving code quality, maintainability, and test coverage. I have identified a few areas for improvement: a missing docstring in violation of the style guide, an opportunity to refactor duplicated logic for better maintainability, and a minor grammatical correction in the design document. Most critically, the complex helper functions for token counting and prompt manipulation lack unit tests, which is a significant gap given their complexity and the project's testing standards. Addressing these points will enhance the robustness and long-term health of this new feature.

Comment thread lmcache/cli/commands/query.py Outdated
Comment thread tests/cli/commands/test_query.py Outdated
Comment thread docs/design/cli/query-command.md
Comment thread lmcache/cli/commands/query.py Outdated
@deng451e deng451e requested review from KuntaiDu and sammshen March 23, 2026 22:43
Comment thread lmcache/cli/commands/query.py Outdated
Comment on lines +31 to +36
_BUILTIN_CORPORA = {
    "ffmpeg": (
        "ffmpeg — multimedia framework. Example: ffmpeg -i in.mp4 "
        "-c:v libx264 out.mk4\n"
    ),
}
Contributor

Can we have a designated document folder for this?

Comment thread lmcache/cli/commands/query.py Outdated
misc.pop(0)


def _stream(
Contributor

It would be great if you could move the prompt-related logic into a separate file.

Comment thread lmcache/cli/commands/query.py Outdated
    corpus_args: list[str],
    model_id: str,
) -> None:
    metrics.add("prompt_tokens", "Prompt tokens", prompt_tokens)
Contributor

When calculating prompt tokens, maybe let's use a default tokenizer (e.g. openai/gpt-oss-120b) and in section title maybe we can write
--- Prompt tokens (est by gpt-oss, w/o chat template) ----
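
The suggestion above amounts to counting tokens on the raw prompt text with a fixed default tokenizer and labelling the result as an estimate. A minimal sketch follows, with the tokenizer passed in as a callable. All names here (`whitespace_tokenize`, `estimate_prompt_tokens`, `section_title`) are hypothetical, and the whitespace tokenizer is only a stand-in so the example runs without downloading a model; it is not what the PR uses.

```python
from typing import Callable, Sequence

# Any callable mapping text to a sequence of tokens (strings or ids) fits here.
Tokenize = Callable[[str], Sequence]


def whitespace_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer used only so the sketch runs offline."""
    return text.split()


def estimate_prompt_tokens(prompt: str, tokenize: Tokenize = whitespace_tokenize) -> int:
    """Count tokens in the raw prompt text, with no chat template applied."""
    return len(tokenize(prompt))


def section_title(tokenizer_label: str) -> str:
    """Build a heading that makes clear the count is an estimate."""
    return f"--- Prompt tokens (est by {tokenizer_label}, w/o chat template) ---"
```

A real tokenizer's encode function, for example one loaded with Hugging Face `AutoTokenizer.from_pretrained(...)`, could be passed as `tokenize` in place of the stand-in. Because the server applies its own chat template and may use a different tokenizer, the estimate can differ from the input token count reported in the response usage.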

@deng451e deng451e changed the title [CLI] Implement lmcache query kvcache subcommand [CLI] lmcache query engine subcommand Mar 23, 2026
deng451e and others added 6 commits March 24, 2026 19:13
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: lmcache-ci-bot <lmcache-ci@example.com>
@deng451e deng451e force-pushed the cli_lmcache_query branch from 6ab1525 to b06b55d Compare March 24, 2026 19:22
Contributor

@KuntaiDu KuntaiDu left a comment

Otherwise LGTM

Comment thread docs/source/getting_started/cli.rst Outdated
Updated CLI documentation to reflect changes in command syntax.

Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>
Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>
Collaborator

@maobaolong maobaolong left a comment

LGTM

@@ -0,0 +1,11 @@
LMCache is a high-performance key–value (KV) cache management system designed to accelerate large language model (LLM) inference by efficiently storing, transferring, and reusing intermediate attention states. As modern LLM serving increasingly becomes bottlenecked by memory bandwidth, redundant computation, and cross-device communication, LMCache provides a system-level solution that decouples KV cache storage from the model execution pipeline and enables scalable, low-latency reuse across requests, processes, and even distributed nodes.
Collaborator

Do we really need to maintain this in lmcache code repository? @deng451e

Collaborator Author

@deng451e deng451e Mar 25, 2026

This built-in test doc was suggested by Kuntai.

Contributor

@KuntaiDu KuntaiDu Mar 25, 2026

Basically, I found that a common subroutine in LMCache development is to compose a long request to trigger LMCache KV cache offloading. So I think it is good to keep some long docs in LMCache, so that we can easily construct and send a long prompt and check whether the LLM response makes sense.

@sammshen sammshen enabled auto-merge (squash) March 25, 2026 06:49
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 25, 2026
# Single inference query
$ lmcache query engine --url http://localhost:8000/v1 \
--prompt "{ctx} What is the example usage of lmcache?" \
--documents ctx=LMCache/lmcache/cli/documents/lmcache.txt \
Contributor

Just want to double check -- can we make sure that the lmcache document you put in can be directly referenced inside the prompt without supplying --documents parameter? For example

lmcache query engine --prompt "{lmcache} Summarize with lmcache" --url http://localhost:8000/v1

Ideally should just work.

Collaborator Author

Right now, {lmcache} still needs a file path via --documents to work. I can add support for built-in docs so that it works out of the box.
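
One way the proposed built-in fallback could work is sketched below. This is an illustration of the idea discussed in this thread, not the merged behavior: `resolve_document` is a hypothetical helper, and the convention that a built-in document for `{name}` lives at `<builtin_dir>/<name>.txt` is an assumption modelled on the lmcache.txt file added in this PR.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_document(
    name: str,
    user_documents: Mapping[str, Path],
    builtin_dir: Path,
) -> Optional[Path]:
    """Find the file backing a {name} placeholder.

    A path supplied by the user (e.g. via --documents) takes precedence.
    Otherwise fall back to <builtin_dir>/<name>.txt if that file exists.
    Returns None when neither source provides the document.
    """
    if name in user_documents:
        return user_documents[name]
    candidate = builtin_dir / f"{name}.txt"
    return candidate if candidate.is_file() else None
```

Giving user-supplied paths priority means a built-in name such as `lmcache` can still be overridden explicitly, while a prompt like `"{lmcache} Summarize with lmcache"` would resolve without any extra flag.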

@sammshen sammshen merged commit e2cc19b into LMCache:dev Mar 25, 2026
33 of 35 checks passed
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
* Add query-command.md

Signed-off-by: deng451e <838677410@qq.com>

* lmcache query CLI Command Design

Signed-off-by: deng451e <838677410@qq.com>

* add cli query commnad

Signed-off-by: deng451e <838677410@qq.com>

* split query module

Signed-off-by: deng451e <838677410@qq.com>

* decompose query engine module

Signed-off-by: deng451e <838677410@qq.com>

* cli lmcache query engine

Signed-off-by: lmcache-ci-bot <lmcache-ci@example.com>

* Fix command syntax in CLI usage example

Updated CLI documentation to reflect changes in command syntax.

Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>

* Update query command documentation for lmcache

Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>

---------

Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: lmcache-ci-bot <lmcache-ci@example.com>
Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>
Co-authored-by: lmcache-ci-bot <lmcache-ci@example.com>
deng451e added a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Labels

full Run comprehensive tests on this PR

4 participants