
[CLI] Implement lmcache describe kvcache subcommand#2825

Merged
royyhuang merged 7 commits into LMCache:dev from royyhuang:feat/lmcache-cli-describe
Mar 23, 2026

Conversation

@royyhuang
Contributor

@royyhuang royyhuang commented Mar 19, 2026

Summary

  • Add lmcache describe kvcache --url <url> command (Phase 1 of the CLI roadmap)
  • Fetches /api/status from the HTTP server and displays health, engine type, chunk size, L1 memory usage, eviction policy, cached objects, and active sessions
  • Supports --format json and --output <path> via the existing Metrics framework
  • Change default HTTP port from 8000 to 8080
  • Use __class__.__name__ for engine_type in report_status() so subclasses report their own name
  • Design doc at docs/design/cli/describe.md
  • 18 unit tests covering helpers, field extraction, error handling, and a real HTTP server round-trip

Test plan

  • pytest tests/cli/test_describe.py — 18 tests pass
  • Manual: lmcache describe kvcache --url http://localhost:8080 against a running HTTP server
  • Manual: --format json output has clean snake_case keys and raw values
  • Manual: connection refused / 503 errors print to stderr and exit 1
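The fetch-and-error-handling behavior exercised by the test plan can be sketched roughly as follows. The helper names (`normalize_url`, `fetch_json`) are illustrative assumptions, not the actual LMCache implementation:

```python
# Illustrative sketch only; helper names and structure are assumptions,
# not LMCache's actual code.
import json
import sys
import urllib.error
import urllib.request


def normalize_url(url: str) -> str:
    """Add a default scheme and point the URL at /api/status."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url.rstrip("/") + "/api/status"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch /api/status; on failure, print to stderr and exit 1."""
    try:
        with urllib.request.urlopen(normalize_url(url), timeout=timeout) as resp:
            return json.loads(resp.read().decode())
    except (urllib.error.URLError, OSError) as exc:
        print(f"Error: could not reach {url}: {exc}", file=sys.stderr)
        sys.exit(1)
```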

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a new command-line interface (CLI) subcommand, lmcache describe kvcache, designed to provide a detailed status overview of an LMCache KV cache service. This enhancement allows users to quickly ascertain the operational health and key metrics of their cache instances, improving observability and diagnostics. The implementation includes necessary server-side adjustments to expose required data and robust error handling for a reliable user experience.

Highlights

  • New CLI Subcommand: Implemented lmcache describe kvcache --url <url> to display the status of a running LMCache KV cache service.
  • Status Information: The command fetches /api/status and reports health, engine type, chunk size, L1 memory usage, eviction policy, cached objects, and active sessions.
  • Output Formatting: Supports --format json and --output <path> using the existing Metrics framework for flexible output.
  • Default HTTP Port Change: The default HTTP port for the LMCache service has been changed from 8000 to 8080.
  • Dynamic Engine Type Reporting: The report_status() method now uses __class__.__name__ to dynamically report the correct engine type for subclasses.
  • Comprehensive Testing: Includes 18 unit tests covering helpers, field extraction, error handling, and real HTTP server round-trip scenarios.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new lmcache describe kvcache CLI command, along with a detailed design document and comprehensive unit tests. The implementation is robust, with good error handling and clear structure. The default HTTP port is also updated from 8000 to 8080 to avoid common conflicts.

My review includes a suggestion on the design document to improve the machine-readability of the JSON output, and a minor refactoring suggestion in the test files to improve maintainability by centralizing imports. Overall, this is a solid contribution.

Comment on lines +38 to +50
"metrics": {
"health": "OK",
"zmq_endpoint": "tcp://localhost:5555",
"http_endpoint": "http://localhost:8000",
"engine_type": "blend",
"chunk_size": 256,
"l1_capacity_gb": 60.0,
"l1_used_gb": "42.30 (70.5%)",
"eviction_policy": "LRU",
"cached_objects": 1024,
"active_sessions": 3,
"uptime": "2h 14m 32s"
}

Severity: medium

In the proposed JSON output, fields like l1_used_gb and uptime are formatted as human-readable strings. For a machine-readable format like JSON, it's standard practice to provide raw numerical values to allow consumers of the API to perform calculations or apply their own formatting.

I'd recommend changing the design to output raw data. For example:

{
  "metrics": {
    ...
    "l1_used_bytes": 45415895859,
    "l1_capacity_bytes": 64424509440,
    "l1_usage_ratio": 0.705,
    "uptime_seconds": 8072,
    ...
  }
}

If human-readable strings are desired, they could be provided in addition to the raw values.
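One way to provide both, sketched below; the raw field names follow the reviewer's suggestion, while the helper itself (`humanize`) is hypothetical:

```python
# Hypothetical helper: derive the human-readable strings from raw values,
# so the JSON output can carry both forms.
def humanize(used_bytes: int, capacity_bytes: int, uptime_seconds: int) -> dict:
    gib = 1024 ** 3
    ratio = used_bytes / capacity_bytes
    hours, rem = divmod(uptime_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return {
        # Raw, machine-readable values (per the suggestion above)
        "l1_used_bytes": used_bytes,
        "l1_capacity_bytes": capacity_bytes,
        "l1_usage_ratio": round(ratio, 3),
        "uptime_seconds": uptime_seconds,
        # Human-readable companions
        "l1_used_gb": f"{used_bytes / gib:.2f} ({ratio:.1%})",
        "uptime": f"{hours}h {minutes}m {seconds}s",
    }
```

For example, `humanize(45415895859, 64424509440, 8072)` reproduces the design doc's `"42.30 (70.5%)"` and `"2h 14m 32s"` strings alongside the raw numbers.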

def test_field_extraction(self):
"""Verify metrics are populated from the sample status dict."""
# First Party
from lmcache.cli.commands.describe import DescribeCommand

Severity: medium

DescribeCommand is imported locally within this test method. This pattern is repeated in other tests in this file (e.g., test_unhealthy, test_missing_fields_show_na). To improve code clarity and adhere to standard Python style (PEP 8), please move this and other local imports like import io (found on line 142) to the top of the file.

Add the `describe` command (Phase 1 of the CLI roadmap) that fetches
status data from the HTTP server's /api/status endpoint and presents
it as a formatted table or JSON.

Changes:
- New `DescribeCommand` in lmcache/cli/commands/describe.py
- Register command in commands/__init__.py
- Change default HTTP port from 8000 to 8080
- Use `__class__.__name__` for engine_type in report_status()
- Design doc: docs/design/cli/describe.md
- Unit tests: tests/cli/test_describe.py (18 tests)

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
@royyhuang royyhuang force-pushed the feat/lmcache-cli-describe branch from 900ad73 to 3d69d1b Compare March 19, 2026 20:59
@royyhuang royyhuang requested a review from KuntaiDu March 19, 2026 21:03

@ApostaC ApostaC left a comment


When thinking about lmcache bench, one of the feature requirements is to let lmcache describe return the KV cache shape or size information.
Ideally, something like the following will be enough for the benchmark tool to compute the KV cache size information:

{
  ### Other data in the json output ...
  "kv_cache_information": [
    {
      "model_name": "model A",
      "world_size": 2,
      "local_rank": 0,
      "num_kv_tensors": 32,
      "kv_tensor_shape": [2, 1000, 16, 4, 128],
      "dtype": "torch.bfloat16"
    },
    {
      "model_name": "model A",
      "world_size": 2,
      "local_rank": 1,
      "num_kv_tensors": 32,
      "kv_tensor_shape": [2, 1000, 16, 4, 128],
      "dtype": "torch.bfloat16"
    },
    {
      "model_name": "model B",
      "world_size": 1,
      "local_rank": 0,
      "num_kv_tensors": 64,
      "kv_tensor_shape": [2, 2000, 16, 8, 128],
      "dtype": "torch.bfloat16"
    }
  ]
}

No need to do that for now, but it will be needed before lmcache bench.
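To illustrate why this is sufficient for the benchmark tool: the total byte size of each registered KV cache follows directly from the proposed entries. This is a sketch, not LMCache code, and the dtype-size table is a simplifying assumption:

```python
# Sketch: how a benchmark tool could compute total KV cache bytes from one
# of the proposed kv_cache_information entries. Not LMCache code.
import math

DTYPE_BYTES = {"torch.bfloat16": 2, "torch.float16": 2, "torch.float32": 4}


def kv_cache_bytes(entry: dict) -> int:
    """Bytes = num_kv_tensors * elements-per-tensor * bytes-per-element."""
    elems = math.prod(entry["kv_tensor_shape"])
    return entry["num_kv_tensors"] * elems * DTYPE_BYTES[entry["dtype"]]


entry = {
    "model_name": "model A",
    "num_kv_tensors": 32,
    "kv_tensor_shape": [2, 1000, 16, 4, 128],
    "dtype": "torch.bfloat16",
}
# 32 tensors * (2*1000*16*4*128) elements * 2 bytes each = 1,048,576,000 bytes
```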

Comment thread docs/design/cli/describe.md
feat: add per-model KV cache layout to describe kvcache

Enrich report_status() to include kv_cache_layout per GPU context
(num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks).
Display each registered model as its own section in describe kvcache.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

- Add GPU KV shape (symbolic + concrete), attention backend, and format
  name to per-model sections via new utils (get_gpu_kv_shape_description,
  get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext
  properties
- Add L2 adapter sections showing type, health, backend, stored objects,
  size/capacity, and pool utilization
- Add list section support to Metrics framework (add_list_section) so
  models and L2 adapters render as JSON arrays instead of keyed dicts
- Refactor describe.py into KVCacheDescriber class with composable
  section builders (add_overview, add_l1_storage, add_models,
  add_l2_adapters) decoupled from DescribeCommand
- Update design doc with new output examples and server-side changes

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
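The composable-section refactor described in this commit can be sketched as follows. The class and method names mirror the commit message, but the bodies are assumptions about the shape of the code, not the actual implementation:

```python
# Sketch of the composable-section idea from the refactor commit.
# Method bodies are illustrative assumptions, not LMCache's code.
class Metrics:
    def __init__(self):
        self.data = {}

    def add_section(self, name: str, fields: dict) -> None:
        """A keyed section rendered as a JSON object."""
        self.data[name] = fields

    def add_list_section(self, name: str, items: list) -> None:
        """A section rendered as a JSON array (e.g. models, L2 adapters)."""
        self.data[name] = items


class KVCacheDescriber:
    """Builds sections from a status dict, decoupled from the CLI command."""

    def __init__(self, status: dict):
        self.status = status
        self.metrics = Metrics()

    def add_overview(self) -> None:
        self.metrics.add_section(
            "overview", {"health": self.status.get("health", "N/A")}
        )

    def add_models(self) -> None:
        self.metrics.add_list_section("models", self.status.get("models", []))
```

Because each `add_*` builder is independent, the command can choose which sections to emit, and list sections serialize as JSON arrays rather than keyed dicts.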
fix: update test_describe.py for renamed public helpers and list sections

Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their
public counterparts. Updated model section assertions to use the new
"models" list format from add_list_section.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

@KuntaiDu KuntaiDu left a comment


Workflow in general LGTM. Small nits listed in comments.

Eviction policy: LRU
Cached objects: 1024
Active sessions: 3
Uptime: 2h 14m 32s

kv cache shape & layout


========== DeepSeek ==========
Registered KV caches: [2, T, Head, Hidden]
World size: 4


Open up a section

return {
    "is_healthy": sm["is_healthy"],
-   "engine_type": "MPCacheEngine",
+   "engine_type": self.__class__.__name__,

Let's expose more information to lmcache describe

unhealthy_data = {**SAMPLE_STATUS, "is_healthy": False}
cmd = DescribeCommand()

class FakeArgs:

I don't fully understand this test -- what is it for?


Oh I see --- it is testing the case where response_status returns nothing

assert m["engine_type"] == "MPCacheEngine"
assert m["chunk_size"] == 256
assert m["l1_capacity_gb"] == 60.0
assert m["l1_used_gb"] == "42.30 (70.5%)"

Wondering why this l1_used_gb is a fixed value --- is it coded somewhere in the test?


Oh I see, it is coded in the sampled trace. In this case I feel the current approach does not protect the semantics of report_status. Maybe we can add a test around that.

@KuntaiDu
Contributor

Also add user-facing doc in docs/source

Add describe command to CLI reference with terminal and JSON output
examples, argument reference, per-model GPU KV shape details, L2
adapter sections, and GPU KV shape abbreviation table.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
@royyhuang royyhuang enabled auto-merge (squash) March 20, 2026 23:46
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 20, 2026
@sammshen sammshen added the enhancement New feature or request label Mar 23, 2026
@royyhuang royyhuang merged commit bb2043e into LMCache:dev Mar 23, 2026
26 checks passed
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
* [CLI] Implement `lmcache describe kvcache` subcommand

Add the `describe` command (Phase 1 of the CLI roadmap) that fetches
status data from the HTTP server's /api/status endpoint and presents
it as a formatted table or JSON.

Changes:
- New `DescribeCommand` in lmcache/cli/commands/describe.py
- Register command in commands/__init__.py
- Change default HTTP port from 8000 to 8080
- Use `__class__.__name__` for engine_type in report_status()
- Design doc: docs/design/cli/describe.md
- Unit tests: tests/cli/test_describe.py (18 tests)

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: add per-model KV cache layout to describe kvcache

Enrich report_status() to include kv_cache_layout per GPU context
(num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks).
Display each registered model as its own section in describe kvcache.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

- Add GPU KV shape (symbolic + concrete), attention backend, and format
  name to per-model sections via new utils (get_gpu_kv_shape_description,
  get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext
  properties
- Add L2 adapter sections showing type, health, backend, stored objects,
  size/capacity, and pool utilization
- Add list section support to Metrics framework (add_list_section) so
  models and L2 adapters render as JSON arrays instead of keyed dicts
- Refactor describe.py into KVCacheDescriber class with composable
  section builders (add_overview, add_l1_storage, add_models,
  add_l2_adapters) decoupled from DescribeCommand
- Update design doc with new output examples and server-side changes

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* fix: update test_describe.py for renamed public helpers and list sections

Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their
public counterparts. Updated model section assertions to use the new
"models" list format from add_list_section.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* docs: add user-facing documentation for lmcache describe kvcache

Add describe command to CLI reference with terminal and JSON output
examples, argument reference, per-model GPU KV shape details, L2
adapter sections, and GPU KV shape abbreviation table.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

---------

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

Labels

enhancement New feature or request full Run comprehensive tests on this PR


4 participants