[CLI] lmcache query engine subcommand #2846
Conversation
Code Review
This pull request introduces a new lmcache query engine subcommand, providing functionality for single-shot inference queries with detailed performance metrics. The implementation is comprehensive, including prompt expansion, API endpoint fallback logic, and integration with the existing metrics framework. The changes are well-documented with a design document, updates to user guides, and tests for the main command class.
My review focuses on improving code quality, maintainability, and test coverage. I have identified a few areas for improvement: a missing docstring in violation of the style guide, an opportunity to refactor duplicated logic for better maintainability, and a minor grammatical correction in the design document. Most critically, the complex helper functions for token counting and prompt manipulation lack unit tests, which is a significant gap given their complexity and the project's testing standards. Addressing these points will enhance the robustness and long-term health of this new feature.
```python
_BUILTIN_CORPORA = {
    "ffmpeg": (
        "ffmpeg — multimedia framework. Example: ffmpeg -i in.mp4 "
        "-c:v libx264 out.mp4\n"
    ),
}
```
Can we have a designated document folder for this?
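For illustration, a minimal sketch of how bundled documents could live in a dedicated package folder and be loaded by name. The PR does add `lmcache/cli/documents/lmcache.txt`, but the `lmcache.cli.documents` package path and the `load_builtin_document` helper below are assumptions for this sketch, not the PR's actual API:

```python
# Sketch only: assumes built-in docs ship as .txt data files inside a
# hypothetical lmcache/cli/documents/ package directory.
from importlib import resources


def load_builtin_document(name: str) -> str:
    """Return the text of a bundled document, e.g. "lmcache" -> lmcache.txt."""
    doc_dir = resources.files("lmcache.cli.documents")
    return (doc_dir / f"{name}.txt").read_text(encoding="utf-8")
```

Keeping the raw text files in one folder would also make it easy to add more long test documents later without touching code.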
```python
misc.pop(0)
...
def _stream(
```
Would be great if you could move the prompt-related logic into a separate file.
```python
    corpus_args: list[str],
    model_id: str,
) -> None:
    metrics.add("prompt_tokens", "Prompt tokens", prompt_tokens)
```
When calculating prompt tokens, maybe let's use a default tokenizer (e.g. openai/gpt-oss-120b), and in the section title maybe we can write:

--- Prompt tokens (est. by gpt-oss, w/o chat template) ---
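For reference, a minimal sketch of what that could look like, assuming the Hugging Face `transformers` tokenizer for the suggested default model is available; the function and constant names are illustrative, not the PR's code:

```python
# Sketch: estimate prompt tokens with a fixed default tokenizer and report
# them without applying a chat template, as suggested above.
from transformers import AutoTokenizer

DEFAULT_TOKENIZER = "openai/gpt-oss-120b"  # reviewer-suggested default


def estimate_prompt_tokens(prompt: str) -> int:
    tokenizer = AutoTokenizer.from_pretrained(DEFAULT_TOKENIZER)
    # add_special_tokens=False: count only the prompt text itself.
    return len(tokenizer.encode(prompt, add_special_tokens=False))
```

The metrics section header could then make the estimation method explicit, e.g. "Prompt tokens (est. by gpt-oss, w/o chat template)".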
Force-pushed from 6ab1525 to b06b55d.
Updated CLI documentation to reflect changes in command syntax. Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>
```text
@@ -0,0 +1,11 @@
LMCache is a high-performance key–value (KV) cache management system designed to accelerate large language model (LLM) inference by efficiently storing, transferring, and reusing intermediate attention states. As modern LLM serving increasingly becomes bottlenecked by memory bandwidth, redundant computation, and cross-device communication, LMCache provides a system-level solution that decouples KV cache storage from the model execution pipeline and enables scalable, low-latency reuse across requests, processes, and even distributed nodes.
```
Do we really need to maintain this in the lmcache code repository? @deng451e
This built-in test doc was suggested by Kuntai.
Basically I found that a common subroutine in LMCache dev is to compose a long request to trigger LMCache KV cache offloading, so I guess it is good for us to have some long docs in LMCache so that we can easily construct and send a long prompt and check whether the LLM response makes sense.
```shell
# Single inference query
$ lmcache query engine --url http://localhost:8000/v1 \
    --prompt "{ctx} What is the example usage of lmcache?" \
    --documents ctx=LMCache/lmcache/cli/documents/lmcache.txt \
```
Just want to double check -- can we make sure that the lmcache document you put in can be directly referenced inside the prompt without supplying the --documents parameter? For example:
lmcache query engine --prompt "{lmcache} Summarize with lmcache" --url http://localhost:8000/v1
Ideally it should just work.
Right now {lmcache} still needs a file path via --documents to work. I can add support for built-in docs so it works out of the box.
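A rough sketch of how that fallback could work, reusing the `_BUILTIN_CORPORA` dict from the diff above; the `expand_placeholders` name and signature are hypothetical, not necessarily the PR's helper:

```python
import re


def expand_placeholders(prompt: str, documents: dict[str, str]) -> str:
    """Replace {name} placeholders with document text.

    `documents` maps placeholder names to file paths (from --documents);
    unknown names fall back to the built-in corpora so bundled docs work
    out of the box.
    """
    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name in documents:  # explicit --documents name=path mapping wins
            with open(documents[name], encoding="utf-8") as f:
                return f.read()
        if name in _BUILTIN_CORPORA:  # built-in fallback
            return _BUILTIN_CORPORA[name]
        raise KeyError(f"unknown placeholder {{{name}}}")

    return re.sub(r"\{(\w+)\}", _lookup, prompt)
```

With a built-in entry registered for the bundled lmcache.txt, `--prompt "{lmcache} Summarize with lmcache"` would then resolve without any `--documents` argument.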
* Add query-command.md
* lmcache query CLI Command Design
* add cli query command
* split query module
* decompose query engine module
* cli lmcache query engine
* Fix command syntax in CLI usage example: updated CLI documentation to reflect changes in command syntax.
* Update query command documentation for lmcache

Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: lmcache-ci-bot <lmcache-ci@example.com>
Signed-off-by: deng451e <57919305+deng451e@users.noreply.github.com>
Co-authored-by: lmcache-ci-bot <lmcache-ci@example.com>
Summary
* `lmcache query engine`: expand `{name}` via `--documents`, send one OpenAI-compatible request, and report token breakdown + latency metrics (incl. input tokens from usage)
* `prompt.py`: placeholder expansion + tokenizer-based token estimation
* `request.py`: streaming request + usage/latency metrics
* `lmcache/cli/documents/lmcache.txt`: built-in text document for tests
* `docs/design/cli/query-command.md`
* `docs/source/getting_started/cli.rst`
* `docs/source/developer_guide/cli.rst`
* `tests/cli/commands/test_query.py` (args, mocked request, output, error handling)

Test Plan

* `query engine`
* `query kvcache` (not implemented)
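For readers skimming the summary, a minimal sketch of the kind of streaming request and latency accounting described for `request.py`, assuming the `openai` Python client against an OpenAI-compatible endpoint; the function name and returned fields are illustrative, not the PR's actual implementation:

```python
import time

from openai import OpenAI


def query_engine(url: str, model: str, prompt: str) -> dict:
    """Send one streaming chat completion and collect basic latency metrics."""
    client = OpenAI(base_url=url, api_key="EMPTY")
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    usage = None

    # stream_options asks OpenAI-compatible servers (e.g. vLLM) to append
    # a final chunk carrying token usage for the request.
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
    )
    for chunk in stream:
        if chunk.usage is not None:
            usage = chunk.usage
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            pieces.append(chunk.choices[0].delta.content)
    end = time.perf_counter()

    return {
        "text": "".join(pieces),
        "ttft_s": (first_token_at or end) - start,
        "total_s": end - start,
        "input_tokens": usage.prompt_tokens if usage else None,
    }
```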