
[LMCache CLI] Design and implementation of lmcache kvcache#2827

Merged
royyhuang merged 15 commits into LMCache:dev from KuntaiDu:kuntai-kvcache
Mar 25, 2026


Conversation

KuntaiDu (Contributor) commented Mar 19, 2026

Initial design of lmcache kvcache. Please refer to the changed files for details.

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
gemini-code-assist (Contributor) commented

Summary of Changes


This pull request introduces a comprehensive design for a new lmcache kvcache command-line interface. The proposed CLI aims to empower users with fine-grained control over the Key-Value (KV) cache state for individual requests, facilitating operations such as inspecting, clearing, pinning, compressing, and ending sessions. By adhering to principles of HTTP-based communication and script-friendly output, this design provides a foundational framework for robust cache management and debugging within the LMCache ecosystem.

Highlights

  • New CLI Command Design: Introduced the design for the lmcache kvcache command-line interface, focusing on per-request KV cache management.
  • Core Sub-commands Defined: Defined five key sub-commands: info, clear, pin, compress, and end-session, each targeting specific KV cache operations.
  • Design Principles Established: Established principles including HTTP-only communication for management, per-request targeting, and script-friendly output with clear exit codes.
  • API Gaps and Implementation Plan: Identified necessary new HTTP endpoints for several sub-commands and outlined a phased implementation plan for the CLI.

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a design document for the lmcache kvcache CLI command. The document is well-structured, comprehensive, and clearly outlines the new functionality, including subcommands for inspecting, clearing, pinning, compressing, and ending sessions for KV caches on a per-request basis. The design thoughtfully considers script-friendliness with features like JSON output and specific exit codes. My feedback includes a couple of suggestions to further improve the scriptability of the JSON output and the user experience of the compress command.

Comment thread docs/design/cli/kvcache-command.md
Comment thread docs/design/cli/kvcache-command.md
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
ApostaC (Contributor) left a comment

LGTM overall! Just wondering which sub-commands we already support. I suppose only clear?

Other small comments:

  • End-session should only be used by the serving engine; otherwise it may cause internal state inconsistency.
  • Can we add a user-facing doc (docs/src/mp) for the LMCache CLI as well?

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
KuntaiDu changed the title from "[LMCache CLI][Design] the design of lmcache kvcache" to "[LMCache CLI] Design and implementation of lmcache kvcache" on Mar 20, 2026
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Comment thread docs/source/cli/kvcache.rst Outdated
Comment thread docs/source/cli/kvcache.rst Outdated
Common Patterns
---------------

**Check if a server is reachable before clearing:**
Contributor

Is using the destructive clear as a reachability check a good pattern?

Contributor Author

The clear command here is not intended as a reachability check. The point is that if the clear command fails because of a connectivity issue, its return value reflects this. I just updated the doc.
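The pattern described in this reply (relying on clear's exit status so that a connectivity failure is visible to the calling script) could be wrapped roughly as follows. This is a minimal Python sketch, not part of the PR: it assumes only that lmcache kvcache clear exits with a nonzero status on failure, and it does not assume any specific nonzero values. The helper name clear_kv_cache is hypothetical, and server-address flags are omitted because their names are not given in this thread.

```python
import subprocess
import sys
from typing import Sequence

# The base command appears in this thread; any extra flags (e.g. a server
# address) are deliberately omitted because their names are not given here.
DEFAULT_CLEAR_CMD = ("lmcache", "kvcache", "clear")


def clear_kv_cache(cmd: Sequence[str] = DEFAULT_CLEAR_CMD) -> bool:
    """Hypothetical helper: run the clear command and judge success by exit status.

    Assumes only the convention described in the review thread: any failure,
    including an unreachable server, produces a nonzero exit code.
    """
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
    except FileNotFoundError:
        # The CLI itself is not installed or not on PATH.
        print(f"command not found: {cmd[0]}", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Surface the CLI's own error output rather than guessing the cause.
        print(
            f"clear failed (exit {result.returncode}): {result.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    return True
```

A calling script would typically stop (or alert) when clear_kv_cache() returns False, instead of assuming the cache was cleared.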

sammshen (Contributor) left a comment

LGTM! Two small nits.

Comment thread docs/design/cli/kvcache-command.md Outdated

Every sub-command requires one of these to identify the target KV cache:

- **`--request-id <id>`** (required) — identifies the request whose KV cache
Contributor

Since the request id is always required, I feel it would be more convenient to just have lmcache kvcache <subcommand> <req_id>.

Contributor Author

lmcache kvcache clear does not take a request id. I will make that clear in the doc.
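The merged revision (per the commit "let request-id be append argument instead of --request-id") passes the request id positionally, while clear takes none. A hypothetical argparse layout along those lines might look like the sketch below. It is illustrative only, not LMCache's actual parser: build_parser and REQUEST_SCOPED are invented names, the request-scoped sub-commands (info, pin, compress) are taken from the design summary without confirming which ones shipped, and end-session is left out because a later commit removed it.

```python
import argparse

# Hypothetical: request-scoped sub-commands named in the design summary.
# Which of these the merged CLI actually implements is not confirmed here.
REQUEST_SCOPED = ("info", "pin", "compress")


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a `lmcache kvcache` parser with a positional request id."""
    parser = argparse.ArgumentParser(prog="lmcache kvcache")
    sub = parser.add_subparsers(dest="subcommand", required=True)

    # `clear` takes no request id, per the author's reply in this thread.
    sub.add_parser("clear", help="clear the KV cache")

    # Request-scoped sub-commands take the request id as a positional
    # argument rather than a --request-id flag.
    for name in REQUEST_SCOPED:
        p = sub.add_parser(name, help=f"{name} the KV cache of one request")
        p.add_argument("request_id", help="identifier of the target request")
    return parser
```

For example, build_parser().parse_args(["pin", "req-123"]) yields a namespace with subcommand="pin" and request_id="req-123", while parse_args(["clear"]) produces no request_id attribute at all.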

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
royyhuang (Contributor) left a comment

LGTM!

royyhuang enabled auto-merge (squash) on March 24, 2026 21:09
github-actions (Bot) added the "full" label (Run comprehensive tests on this PR) on Mar 24, 2026
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
maobaolong (Collaborator) left a comment

LGTM. Thanks for this great feature.

royyhuang merged commit 130db2b into LMCache:dev on Mar 25, 2026
33 of 34 checks passed
KuntaiDu deleted the kuntai-kvcache branch on March 25, 2026 22:58
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
…#2827)

* initial design of lmcache kvcache

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* changing of file

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* add lmcache kvcache -h

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* clarify that lmcache kvcache info design is temporary

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* initial implementation of lmcache kvcache

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* UX update

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* remove end-session

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* add user-facing docs

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* update doc and fix comments

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

* let request-id be append argument instead of --request-id

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>

---------

Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: Roy Huang <roy.y.huang@gmail.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

Labels: full (Run comprehensive tests on this PR)

5 participants