
[CLI] Implement lmcache describe kvcache subcommand#2825

Merged
royyhuang merged 7 commits into LMCache:dev from royyhuang:feat/lmcache-cli-describe
Mar 23, 2026

Conversation

@royyhuang
Contributor

@royyhuang royyhuang commented Mar 19, 2026

Summary

  • Add lmcache describe kvcache --url <url> command (Phase 1 of the CLI roadmap)
  • Fetches /api/status from the HTTP server and displays health, engine type, chunk size, L1 memory usage, eviction policy, cached objects, and active sessions
  • Supports --format json and --output <path> via the existing Metrics framework
  • Change default HTTP port from 8000 to 8080
  • Use __class__.__name__ for engine_type in report_status() so subclasses report their own name
  • Design doc at docs/design/cli/describe.md
  • 18 unit tests covering helpers, field extraction, error handling, and a real HTTP server round-trip

Test plan

  • pytest tests/cli/test_describe.py — 18 tests pass
  • Manual: lmcache describe kvcache --url http://localhost:8080 against a running HTTP server
  • Manual: --format json output has clean snake_case keys and raw values
  • Manual: connection refused / 503 errors print to stderr and exit 1
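The fetch-and-error-handling behavior exercised by the test plan can be sketched roughly as follows. The helper names (`normalize_url`, `fetch_json`) are illustrative assumptions, not the actual LMCache implementation:

```python
# Illustrative sketch only; helper names and structure are assumptions,
# not LMCache's actual code.
import json
import sys
import urllib.error
import urllib.request


def normalize_url(url: str) -> str:
    """Add a default scheme and point the URL at /api/status."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url.rstrip("/") + "/api/status"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch /api/status; on failure, print to stderr and exit 1."""
    try:
        with urllib.request.urlopen(normalize_url(url), timeout=timeout) as resp:
            return json.loads(resp.read().decode())
    except (urllib.error.URLError, OSError) as exc:
        print(f"Error: could not reach {url}: {exc}", file=sys.stderr)
        sys.exit(1)
```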

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a new command-line interface (CLI) subcommand, lmcache describe kvcache, designed to provide a detailed status overview of an LMCache KV cache service. This enhancement allows users to quickly ascertain the operational health and key metrics of their cache instances, improving observability and diagnostics. The implementation includes necessary server-side adjustments to expose required data and robust error handling for a reliable user experience.

Highlights

  • New CLI Subcommand: Implemented lmcache describe kvcache --url <url> to display the status of a running LMCache KV cache service.
  • Status Information: The command fetches /api/status and reports health, engine type, chunk size, L1 memory usage, eviction policy, cached objects, and active sessions.
  • Output Formatting: Supports --format json and --output <path> using the existing Metrics framework for flexible output.
  • Default HTTP Port Change: The default HTTP port for the LMCache service has been changed from 8000 to 8080.
  • Dynamic Engine Type Reporting: The report_status() method now uses __class__.__name__ to dynamically report the correct engine type for subclasses.
  • Comprehensive Testing: Includes 18 unit tests covering helpers, field extraction, error handling, and real HTTP server round-trip scenarios.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new lmcache describe kvcache CLI command, along with a detailed design document and comprehensive unit tests. The implementation is robust, with good error handling and clear structure. The default HTTP port is also updated from 8000 to 8080 to avoid common conflicts.

My review includes a suggestion on the design document to improve the machine-readability of the JSON output, and a minor refactoring suggestion in the test files to improve maintainability by centralizing imports. Overall, this is a solid contribution.

Comment on lines +38 to +50
"metrics": {
"health": "OK",
"zmq_endpoint": "tcp://localhost:5555",
"http_endpoint": "http://localhost:8000",
"engine_type": "blend",
"chunk_size": 256,
"l1_capacity_gb": 60.0,
"l1_used_gb": "42.30 (70.5%)",
"eviction_policy": "LRU",
"cached_objects": 1024,
"active_sessions": 3,
"uptime": "2h 14m 32s"
}

Severity: medium

In the proposed JSON output, fields like l1_used_gb and uptime are formatted as human-readable strings. For a machine-readable format like JSON, it's standard practice to provide raw numerical values to allow consumers of the API to perform calculations or apply their own formatting.

I'd recommend changing the design to output raw data. For example:

{
  "metrics": {
    ...
    "l1_used_bytes": 45415895859,
    "l1_capacity_bytes": 64424509440,
    "l1_usage_ratio": 0.705,
    "uptime_seconds": 8072,
    ...
  }
}

If human-readable strings are desired, they could be provided in addition to the raw values.
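One way to provide both, sketched below; the raw field names follow the reviewer's suggestion, while the helper itself (`humanize`) is hypothetical:

```python
# Hypothetical helper: derive the human-readable strings from raw values,
# so the JSON output can carry both forms.
def humanize(used_bytes: int, capacity_bytes: int, uptime_seconds: int) -> dict:
    gib = 1024 ** 3
    ratio = used_bytes / capacity_bytes
    hours, rem = divmod(uptime_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return {
        # Raw, machine-readable values (per the suggestion above)
        "l1_used_bytes": used_bytes,
        "l1_capacity_bytes": capacity_bytes,
        "l1_usage_ratio": round(ratio, 3),
        "uptime_seconds": uptime_seconds,
        # Human-readable companions
        "l1_used_gb": f"{used_bytes / gib:.2f} ({ratio:.1%})",
        "uptime": f"{hours}h {minutes}m {seconds}s",
    }
```

For example, `humanize(45415895859, 64424509440, 8072)` reproduces the design doc's `"42.30 (70.5%)"` and `"2h 14m 32s"` strings alongside the raw numbers.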

def test_field_extraction(self):
"""Verify metrics are populated from the sample status dict."""
# First Party
from lmcache.cli.commands.describe import DescribeCommand

Severity: medium

DescribeCommand is imported locally within this test method. This pattern is repeated in other tests in this file (e.g., test_unhealthy, test_missing_fields_show_na). To improve code clarity and adhere to standard Python style (PEP 8), please move this and other local imports like import io (found on line 142) to the top of the file.

Add the `describe` command (Phase 1 of the CLI roadmap) that fetches
status data from the HTTP server's /api/status endpoint and presents
it as a formatted table or JSON.

Changes:
- New `DescribeCommand` in lmcache/cli/commands/describe.py
- Register command in commands/__init__.py
- Change default HTTP port from 8000 to 8080
- Use `__class__.__name__` for engine_type in report_status()
- Design doc: docs/design/cli/describe.md
- Unit tests: tests/cli/test_describe.py (18 tests)

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
@royyhuang royyhuang force-pushed the feat/lmcache-cli-describe branch from 900ad73 to 3d69d1b Compare March 19, 2026 20:59
@royyhuang royyhuang requested a review from KuntaiDu March 19, 2026 21:03

@ApostaC ApostaC left a comment


When thinking about lmcache bench, one of the feature requirements is to let lmcache describe return the KV cache shape or size information.
Ideally, something like the following will be enough for the benchmark tool to compute the KV cache size information:

{
  ### Other data in the json output ...
  "kv_cache_information": [
    {
      "model_name": "model A",
      "world_size": 2,
      "local_rank": 0,
      "num_kv_tensors": 32,
      "kv_tensor_shape": [2, 1000, 16, 4, 128],
      "dtype": "torch.bfloat16"
    },
    {
      "model_name": "model A",
      "world_size": 2,
      "local_rank": 1,
      "num_kv_tensors": 32,
      "kv_tensor_shape": [2, 1000, 16, 4, 128],
      "dtype": "torch.bfloat16"
    },
    {
      "model_name": "model B",
      "world_size": 1,
      "local_rank": 0,
      "num_kv_tensors": 64,
      "kv_tensor_shape": [2, 2000, 16, 8, 128],
      "dtype": "torch.bfloat16"
    }
  ]
}

No need to do that for now, but it will be needed before lmcache bench.
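To illustrate why this is sufficient for the benchmark tool: the total byte size of each registered KV cache follows directly from the proposed entries. This is a sketch, not LMCache code, and the dtype-size table is a simplifying assumption:

```python
# Sketch: how a benchmark tool could compute total KV cache bytes from one
# of the proposed kv_cache_information entries. Not LMCache code.
import math

DTYPE_BYTES = {"torch.bfloat16": 2, "torch.float16": 2, "torch.float32": 4}


def kv_cache_bytes(entry: dict) -> int:
    """Bytes = num_kv_tensors * elements-per-tensor * bytes-per-element."""
    elems = math.prod(entry["kv_tensor_shape"])
    return entry["num_kv_tensors"] * elems * DTYPE_BYTES[entry["dtype"]]


entry = {
    "model_name": "model A",
    "num_kv_tensors": 32,
    "kv_tensor_shape": [2, 1000, 16, 4, 128],
    "dtype": "torch.bfloat16",
}
# 32 tensors * (2*1000*16*4*128) elements * 2 bytes each = 1,048,576,000 bytes
```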

Comment thread docs/design/cli/describe.md
feat: add per-model KV cache layout to describe kvcache

Enrich report_status() to include kv_cache_layout per GPU context
(num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks).
Display each registered model as its own section in describe kvcache.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

- Add GPU KV shape (symbolic + concrete), attention backend, and format
  name to per-model sections via new utils (get_gpu_kv_shape_description,
  get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext
  properties
- Add L2 adapter sections showing type, health, backend, stored objects,
  size/capacity, and pool utilization
- Add list section support to Metrics framework (add_list_section) so
  models and L2 adapters render as JSON arrays instead of keyed dicts
- Refactor describe.py into KVCacheDescriber class with composable
  section builders (add_overview, add_l1_storage, add_models,
  add_l2_adapters) decoupled from DescribeCommand
- Update design doc with new output examples and server-side changes

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
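The composable-section refactor described in this commit can be sketched as follows. The class and method names mirror the commit message, but the bodies are assumptions about the shape of the code, not the actual implementation:

```python
# Sketch of the composable-section idea from the refactor commit.
# Method bodies are illustrative assumptions, not LMCache's code.
class Metrics:
    def __init__(self):
        self.data = {}

    def add_section(self, name: str, fields: dict) -> None:
        """A keyed section rendered as a JSON object."""
        self.data[name] = fields

    def add_list_section(self, name: str, items: list) -> None:
        """A section rendered as a JSON array (e.g. models, L2 adapters)."""
        self.data[name] = items


class KVCacheDescriber:
    """Builds sections from a status dict, decoupled from the CLI command."""

    def __init__(self, status: dict):
        self.status = status
        self.metrics = Metrics()

    def add_overview(self) -> None:
        self.metrics.add_section(
            "overview", {"health": self.status.get("health", "N/A")}
        )

    def add_models(self) -> None:
        self.metrics.add_list_section("models", self.status.get("models", []))
```

Because each `add_*` builder is independent, the command can choose which sections to emit, and list sections serialize as JSON arrays rather than keyed dicts.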
fix: update test_describe.py for renamed public helpers and list sections

Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their
public counterparts. Updated model section assertions to use the new
"models" list format from add_list_section.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@KuntaiDu KuntaiDu left a comment


Workflow in general LGTM. Small nits listed in comments.

Eviction policy: LRU
Cached objects: 1024
Active sessions: 3
Uptime: 2h 14m 32s

kv cache shape & layout


========== DeepSeek ==========
Registered KV caches: [2, T, Head, Hidden]
World size: 4


Open up a section

return {
    "is_healthy": sm["is_healthy"],
-   "engine_type": "MPCacheEngine",
+   "engine_type": self.__class__.__name__,

Let's expose more information to lmcache describe

unhealthy_data = {**SAMPLE_STATUS, "is_healthy": False}
cmd = DescribeCommand()

class FakeArgs:

I don't fully understand this test -- what is it for?


Oh I see --- it is testing the case where response_status returns nothing

assert m["engine_type"] == "MPCacheEngine"
assert m["chunk_size"] == 256
assert m["l1_capacity_gb"] == 60.0
assert m["l1_used_gb"] == "42.30 (70.5%)"

Wondering why this l1_used_gb is a fixed value --- is it coded somewhere in the test?


Oh I see, it is coded in the sampled trace. In this case I feel the current approach does not protect the semantics of report_status. Maybe we can add a test around that.

@KuntaiDu
Contributor

Also add user-facing doc in docs/source

Add describe command to CLI reference with terminal and JSON output
examples, argument reference, per-model GPU KV shape details, L2
adapter sections, and GPU KV shape abbreviation table.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
@royyhuang royyhuang enabled auto-merge (squash) March 20, 2026 23:46
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 20, 2026
@sammshen sammshen added the enhancement New feature or request label Mar 23, 2026
@royyhuang royyhuang merged commit bb2043e into LMCache:dev Mar 23, 2026
26 checks passed
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
* [CLI] Implement `lmcache describe kvcache` subcommand

Add the `describe` command (Phase 1 of the CLI roadmap) that fetches
status data from the HTTP server's /api/status endpoint and presents
it as a formatted table or JSON.

Changes:
- New `DescribeCommand` in lmcache/cli/commands/describe.py
- Register command in commands/__init__.py
- Change default HTTP port from 8000 to 8080
- Use `__class__.__name__` for engine_type in report_status()
- Design doc: docs/design/cli/describe.md
- Unit tests: tests/cli/test_describe.py (18 tests)

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: add per-model KV cache layout to describe kvcache

Enrich report_status() to include kv_cache_layout per GPU context
(num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks).
Display each registered model as its own section in describe kvcache.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

- Add GPU KV shape (symbolic + concrete), attention backend, and format
  name to per-model sections via new utils (get_gpu_kv_shape_description,
  get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext
  properties
- Add L2 adapter sections showing type, health, backend, stored objects,
  size/capacity, and pool utilization
- Add list section support to Metrics framework (add_list_section) so
  models and L2 adapters render as JSON arrays instead of keyed dicts
- Refactor describe.py into KVCacheDescriber class with composable
  section builders (add_overview, add_l1_storage, add_models,
  add_l2_adapters) decoupled from DescribeCommand
- Update design doc with new output examples and server-side changes

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* fix: update test_describe.py for renamed public helpers and list sections

Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their
public counterparts. Updated model section assertions to use the new
"models" list format from add_list_section.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* docs: add user-facing documentation for lmcache describe kvcache

Add describe command to CLI reference with terminal and JSON output
examples, argument reference, per-model GPU KV shape details, L2
adapter sections, and GPU KV shape abbreviation table.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

---------

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

Labels

enhancement New feature or request full Run comprehensive tests on this PR


4 participants