[CLI] Implement lmcache describe kvcache subcommand #2825
royyhuang merged 7 commits into LMCache:dev
Code Review
This pull request introduces a new lmcache describe kvcache CLI command, along with a detailed design document and comprehensive unit tests. The implementation is robust, with good error handling and clear structure. The default HTTP port is also updated from 8000 to 8080 to avoid common conflicts.
My review includes a suggestion on the design document to improve the machine-readability of the JSON output, and a minor refactoring suggestion in the test files to improve maintainability by centralizing imports. Overall, this is a solid contribution.
    "metrics": {
      "health": "OK",
      "zmq_endpoint": "tcp://localhost:5555",
      "http_endpoint": "http://localhost:8000",
      "engine_type": "blend",
      "chunk_size": 256,
      "l1_capacity_gb": 60.0,
      "l1_used_gb": "42.30 (70.5%)",
      "eviction_policy": "LRU",
      "cached_objects": 1024,
      "active_sessions": 3,
      "uptime": "2h 14m 32s"
    }
In the proposed JSON output, fields like l1_used_gb and uptime are formatted as human-readable strings. For a machine-readable format like JSON, it's standard practice to provide raw numerical values to allow consumers of the API to perform calculations or apply their own formatting.
I'd recommend changing the design to output raw data. For example:
    {
      "metrics": {
        ...
        "l1_used_bytes": 45415895859,
        "l1_capacity_bytes": 64424509440,
        "l1_usage_ratio": 0.705,
        "uptime_seconds": 8072,
        ...
      }
    }

If human-readable strings are desired, they could be provided in addition to the raw values.
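To make the suggestion concrete, here is a minimal sketch of how a consumer could derive the human-readable strings from the raw fields. The function names `format_used` and `format_uptime` are hypothetical, and the sketch assumes that "GB" in the table output means GiB (1024³ bytes); neither is confirmed by the PR.

```python
GIB = 1024 ** 3  # assumption: the "gb" fields are really GiB


def format_used(used_bytes: int, capacity_bytes: int) -> str:
    """Render usage as '<GiB used> (<percent of capacity>%)'."""
    ratio = used_bytes / capacity_bytes if capacity_bytes else 0.0
    return f"{used_bytes / GIB:.2f} ({ratio * 100:.1f}%)"


def format_uptime(seconds: int) -> str:
    """Render a duration in seconds as '<h>h <m>m <s>s'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# Raw values taken from the example in the comment above.
metrics = {
    "l1_used_bytes": 45415895859,
    "l1_capacity_bytes": 64424509440,
    "uptime_seconds": 8072,
}

used_text = format_used(metrics["l1_used_bytes"], metrics["l1_capacity_bytes"])
uptime_text = format_uptime(metrics["uptime_seconds"])
print(used_text)    # 42.30 (70.5%)
print(uptime_text)  # 2h 14m 32s
```

With raw values in the JSON, the formatted strings can always be rebuilt on the client side, while the reverse requires parsing.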
    def test_field_extraction(self):
        """Verify metrics are populated from the sample status dict."""
        # First Party
        from lmcache.cli.commands.describe import DescribeCommand
DescribeCommand is imported locally within this test method. This pattern is repeated in other tests in this file (e.g., test_unhealthy, test_missing_fields_show_na). To improve code clarity and adhere to standard Python style (PEP 8), please move this and other local imports like import io (found on line 142) to the top of the file.
Add the `describe` command (Phase 1 of the CLI roadmap) that fetches status data from the HTTP server's /api/status endpoint and presents it as a formatted table or JSON.

Changes:
- New `DescribeCommand` in lmcache/cli/commands/describe.py
- Register command in commands/__init__.py
- Change default HTTP port from 8000 to 8080
- Use `__class__.__name__` for engine_type in report_status()
- Design doc: docs/design/cli/describe.md
- Unit tests: tests/cli/test_describe.py (18 tests)

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
Force-pushed from 900ad73 to 3d69d1b.

ApostaC left a comment
When thinking about lmcache bench, one of the feature requirements is to let lmcache describe return the KV cache shape or size information.
Ideally, something like the following will be enough for the benchmark tool to compute the KV cache size information:
    {
      ### Other data in the json output ...
      "kv_cache_information": [
        {
          "model_name": "model A",
          "world_size": 2,
          "local_rank": 0,
          "num_kv_tensors": 32,
          "kv_tensor_shape": [2, 1000, 16, 4, 128],
          "dtype": "torch.bfloat16"
        },
        {
          "model_name": "model A",
          "world_size": 2,
          "local_rank": 1,
          "num_kv_tensors": 32,
          "kv_tensor_shape": [2, 1000, 16, 4, 128],
          "dtype": "torch.bfloat16"
        },
        {
          "model_name": "model B",
          "world_size": 1,
          "local_rank": 0,
          "num_kv_tensors": 64,
          "kv_tensor_shape": [2, 2000, 16, 8, 128],
          "dtype": "torch.bfloat16"
        }
      ]
    }

No need to do that for now, but it's needed before the lmcache bench.
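As a rough illustration of how a benchmark tool might use such entries, the sketch below computes the KV cache size per model. It assumes that each of the `num_kv_tensors` tensors has exactly the shape `kv_tensor_shape`, and that the dtype string maps to the byte widths listed in `DTYPE_BYTES`; the helper names are hypothetical and not part of LMCache.

```python
import math
from collections import defaultdict

# Assumed element sizes for the dtype strings used in the example.
DTYPE_BYTES = {"torch.bfloat16": 2, "torch.float16": 2, "torch.float32": 4}


def kv_cache_bytes(entry: dict) -> int:
    """Bytes held by one rank: tensors * elements per tensor * bytes per element."""
    elements = math.prod(entry["kv_tensor_shape"])
    return entry["num_kv_tensors"] * elements * DTYPE_BYTES[entry["dtype"]]


def bytes_per_model(entries: list) -> dict:
    """Sum the per-rank sizes for each model name."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry["model_name"]] += kv_cache_bytes(entry)
    return dict(totals)


kv_cache_information = [
    {"model_name": "model A", "world_size": 2, "local_rank": 0,
     "num_kv_tensors": 32, "kv_tensor_shape": [2, 1000, 16, 4, 128],
     "dtype": "torch.bfloat16"},
    {"model_name": "model A", "world_size": 2, "local_rank": 1,
     "num_kv_tensors": 32, "kv_tensor_shape": [2, 1000, 16, 4, 128],
     "dtype": "torch.bfloat16"},
    {"model_name": "model B", "world_size": 1, "local_rank": 0,
     "num_kv_tensors": 64, "kv_tensor_shape": [2, 2000, 16, 8, 128],
     "dtype": "torch.bfloat16"},
]

totals = bytes_per_model(kv_cache_information)
print(totals)
```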
Enrich report_status() to include kv_cache_layout per GPU context (num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks). Display each registered model as its own section in describe kvcache.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

- Add GPU KV shape (symbolic + concrete), attention backend, and format name to per-model sections via new utils (get_gpu_kv_shape_description, get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext properties
- Add L2 adapter sections showing type, health, backend, stored objects, size/capacity, and pool utilization
- Add list section support to Metrics framework (add_list_section) so models and L2 adapters render as JSON arrays instead of keyed dicts
- Refactor describe.py into KVCacheDescriber class with composable section builders (add_overview, add_l1_storage, add_models, add_l2_adapters) decoupled from DescribeCommand
- Update design doc with new output examples and server-side changes

Signed-off-by: royyhuang <roy.y.huang@gmail.com>

fix: update test_describe.py for renamed public helpers and list sections

Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their public counterparts. Updated model section assertions to use the new "models" list format from add_list_section.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
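The helper names above come from the commit message, but their signatures and bodies are not shown in this thread. The following is only a guess at what two such helpers could look like, written to illustrate the "missing fields show N/A" behavior the tests mention; the parameters and defaults are assumptions.

```python
from typing import Any


def normalize_url(url: str) -> str:
    """Assumed behavior: add a default scheme and drop trailing slashes."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    return url.rstrip("/")


def safe_get(data: Any, *keys: str, default: Any = "N/A") -> Any:
    """Walk nested dicts by key; return `default` if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


status = {"is_healthy": True, "l1": {"capacity_gb": 60.0}}
print(normalize_url("localhost:8080/"))
print(safe_get(status, "l1", "capacity_gb"))
print(safe_get(status, "l1", "used_gb"))
```

Returning a sentinel instead of raising keeps the table renderer simple when the server omits a field, at the cost of hiding schema drift; a stricter variant could log the missing key.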
KuntaiDu left a comment
Workflow in general LGTM. Small nits listed in comments.
    Eviction policy: LRU
    Cached objects: 1024
    Active sessions: 3
    Uptime: 2h 14m 32s
========== DeepSeek =========
Registered KV caches: [2, T, Head, Hidden]
World size: 4
      return {
          "is_healthy": sm["is_healthy"],
    -     "engine_type": "MPCacheEngine",
    +     "engine_type": self.__class__.__name__,
Let's expose more information to lmcache describe
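The effect of the `engine_type` change in the diff above can be shown with a small stand-alone sketch. The classes below are stand-ins defined only for this example (`BlendEngine` is a hypothetical subclass name), not the real LMCache classes.

```python
class MPCacheEngine:
    """Stand-in base class; only the status reporting is modeled."""

    def report_status(self) -> dict:
        # Using the runtime class name means subclasses report themselves
        # without overriding this method.
        return {"is_healthy": True, "engine_type": self.__class__.__name__}


class BlendEngine(MPCacheEngine):
    """Hypothetical subclass that inherits report_status unchanged."""


base_type = MPCacheEngine().report_status()["engine_type"]
sub_type = BlendEngine().report_status()["engine_type"]
print(base_type, sub_type)
```

With the hard-coded string, both calls would have reported `MPCacheEngine`.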
    unhealthy_data = {**SAMPLE_STATUS, "is_healthy": False}
    cmd = DescribeCommand()


    class FakeArgs:
I don't fully understand this test -- what is it for?
Oh I see --- it is testing the case where response_status returns nothing
    assert m["engine_type"] == "MPCacheEngine"
    assert m["chunk_size"] == 256
    assert m["l1_capacity_gb"] == 60.0
    assert m["l1_used_gb"] == "42.30 (70.5%)"
Wondering why this l1_used_gb is a fixed value --- is it coded somewhere in the test?
Oh I see it is coded in the sampled trace. In this case I feel like the current way does not protect the semantics of report_status. Maybe we can add test around that.
Also add user-facing doc in docs/source
Add describe command to CLI reference with terminal and JSON output examples, argument reference, per-model GPU KV shape details, L2 adapter sections, and GPU KV shape abbreviation table.

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [CLI] Implement `lmcache describe kvcache` subcommand

  Add the `describe` command (Phase 1 of the CLI roadmap) that fetches status data from the HTTP server's /api/status endpoint and presents it as a formatted table or JSON.

  Changes:
  - New `DescribeCommand` in lmcache/cli/commands/describe.py
  - Register command in commands/__init__.py
  - Change default HTTP port from 8000 to 8080
  - Use `__class__.__name__` for engine_type in report_status()
  - Design doc: docs/design/cli/describe.md
  - Unit tests: tests/cli/test_describe.py (18 tests)

  Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: add per-model KV cache layout to describe kvcache

  Enrich report_status() to include kv_cache_layout per GPU context (num_layers, block_size, hidden_dim_size, dtype, is_mla, num_blocks). Display each registered model as its own section in describe kvcache.

  Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* feat: enhance describe kvcache with GPU KV format, L2 adapters, and list sections

  - Add GPU KV shape (symbolic + concrete), attention backend, and format name to per-model sections via new utils (get_gpu_kv_shape_description, get_attention_backend, get_concrete_gpu_kv_shape) and GPUCacheContext properties
  - Add L2 adapter sections showing type, health, backend, stored objects, size/capacity, and pool utilization
  - Add list section support to Metrics framework (add_list_section) so models and L2 adapters render as JSON arrays instead of keyed dicts
  - Refactor describe.py into KVCacheDescriber class with composable section builders (add_overview, add_l1_storage, add_models, add_l2_adapters) decoupled from DescribeCommand
  - Update design doc with new output examples and server-side changes

  Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* fix: update test_describe.py for renamed public helpers and list sections

  Renamed _normalize_url, _fetch_json, _safe_get, _fmt_used_gb to their public counterparts. Updated model section assertions to use the new "models" list format from add_list_section.

  Signed-off-by: royyhuang <roy.y.huang@gmail.com>

* docs: add user-facing documentation for lmcache describe kvcache

  Add describe command to CLI reference with terminal and JSON output examples, argument reference, per-model GPU KV shape details, L2 adapter sections, and GPU KV shape abbreviation table.

  Signed-off-by: royyhuang <roy.y.huang@gmail.com>

---------

Signed-off-by: royyhuang <roy.y.huang@gmail.com>
Summary
- Add `lmcache describe kvcache --url <url>` command (Phase 1 of the CLI roadmap)
- Fetches `/api/status` from the HTTP server and displays health, engine type, chunk size, L1 memory usage, eviction policy, cached objects, and active sessions
- Supports `--format json` and `--output <path>` via the existing Metrics framework
- Use `__class__.__name__` for `engine_type` in `report_status()` so subclasses report their own name
- Design doc: `docs/design/cli/describe.md`

Test plan
- `pytest tests/cli/test_describe.py`: 18 tests pass
- `lmcache describe kvcache --url http://localhost:8080` against a running HTTP server
- `--format json` output has clean snake_case keys and raw values