[MP][Debuggability] Introduce status report subsystem for MP-mode by ApostaC · Pull Request #2699 · LMCache/LMCache

ApostaC · 2026-03-05T21:41:12Z

What this PR does / why we need it:

Adds a composable report_status() -> dict interface across all MP-mode components for production debugging and introspection. Currently the only introspection tools are memcheck() (returns bool) and debug() (returns "OK"), which are insufficient for diagnosing issues in the multi-tier storage pipeline.

Each component implements report_status() returning a dict with is_healthy: bool plus component-specific metrics. Parents aggregate children's reports as nested dicts, with health propagating upward (any unhealthy child → parent unhealthy).
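
The aggregation rule above can be sketched as follows. This is a minimal illustration, not the PR's code: `aggregate_status` and every field name other than `is_healthy` are hypothetical, and the component names are used only as dict keys.

```python
from typing import Any, Dict, Mapping


def aggregate_status(
    own: Mapping[str, Any],
    children: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Any]:
    """Merge a component's own metrics with its children's reports.

    The result is healthy only if the component itself is healthy and
    every child reports is_healthy == True.
    """
    status: Dict[str, Any] = dict(own)
    healthy = bool(own.get("is_healthy", True))
    for name, child in children.items():
        status[name] = dict(child)  # nest the child's report under its name
        healthy = healthy and bool(child.get("is_healthy", False))
    status["is_healthy"] = healthy
    return status


# Hypothetical leaf reports; only is_healthy is taken from the PR text.
l1_report = {"is_healthy": True, "object_count": 128}
store_report = {"is_healthy": False, "thread_alive": False}

storage_report = aggregate_status(
    {"is_healthy": True},
    {"l1_manager": l1_report, "store_controller": store_report},
)
engine_report = aggregate_status(
    {"is_healthy": True, "engine_type": "MPCacheEngine"},
    {"storage_manager": storage_report},
)
```

Here one dead controller thread makes both the storage manager and the engine report `is_healthy: False`, while the healthy L1 subtree stays visible in the nested dict.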

Components instrumented (bottom-up):

L1Manager: object counts, lock counts (write/read/temporary), memory usage, TTL config
L2 adapters: stored object count, locked keys, capacity (abstract method on interface; MockL2Adapter implemented)
StoreController: thread alive, pending keys, in-flight task count (via shadow counters — no new locks on critical path)
PrefetchController: thread alive, submission/pending/in-flight/completed queue sizes, phase breakdown (lookup vs load)
EvictionController: thread alive, policy config
StorageManager: aggregates all children
MPCacheEngine / BlendEngine: engine type, chunk size, hash algorithm, GPU contexts, active sessions + storage manager subtree

New HTTP endpoint: GET /api/status returns the full JSON status tree.

New CLI tool: python -m lmcache.tools.mp_status_viewer [--url URL] [--json] fetches and pretty-prints the status.

Leaf helpers: TokenHasher.hash_algorithm_name, SessionManager.active_count()

Usage example:

# Start LMCache MP mode with http server (default http port 8000)
python3 -m lmcache.v1.multiprocess.http_server --l1-size 70 --eviction-policy LRU 

# Pretty-printed view (default)
python -m lmcache.tools.mp_status_viewer

# Raw JSON (for scripting / monitoring)
python -m lmcache.tools.mp_status_viewer --json | jq '.storage_manager.l1_manager'

# Custom endpoint
python -m lmcache.tools.mp_status_viewer --url http://my-host:9000/api/status

# Or just curl the endpoint directly
curl -s localhost:8000/api/status | jq
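
For monitoring scripts, the same endpoint can be polled with the standard library alone. The sketch below assumes only what the description states: the endpoint returns a JSON tree and each component dict carries `is_healthy`. The helper names `fetch_status` and `unhealthy_paths` are hypothetical and are not part of `mp_status_viewer`.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_status(
    url: str = "http://localhost:8000/api/status",
    timeout: float = 5.0,
) -> Dict[str, Any]:
    """GET the status tree and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def unhealthy_paths(status: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of every node whose is_healthy is False."""
    paths: List[str] = []
    if status.get("is_healthy") is False:
        paths.append(prefix or "<root>")
    for key, value in status.items():
        if isinstance(value, dict):
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.extend(unhealthy_paths(value, child_prefix))
    return paths


# Usage against a running server (not executed here):
#   print(unhealthy_paths(fetch_status()))
```

Walking the tree this way points at the specific child that made a parent unhealthy, rather than only the top-level flag.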


Special notes for your reviewers:

Shadow counters in StoreController and PrefetchController are updated in the background loop thread only — no new locks on the controller critical path. Existing lightweight locks (submission queue, results queue, listener) are reused for the few fields that need them.
L1Manager's report_status() uses the existing @l1_mgr_synchronized decorator and iterates _objects checking TTLLock.is_locked() per entry. This is O(n) but the endpoint is called infrequently (debug use).
The CLI viewer uses only stdlib (urllib.request, json) — no requests or rich dependency.
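
The shadow-counter idea in the first note can be illustrated with a toy controller. This is a sketch under assumptions, not the PR's `StoreController`: the class, its fields, and the queue-based loop are hypothetical, and it relies on CPython making reads of a plain `int` attribute safe when a single thread writes it.

```python
import queue
import threading
from typing import Any, Dict


class ToyController:
    """Hypothetical controller with counters owned by its loop thread."""

    def __init__(self) -> None:
        self._submissions: "queue.Queue[str]" = queue.Queue()
        self._stop = threading.Event()
        # Shadow counters: written only by the loop thread, read by reporters.
        self._in_flight = 0
        self._completed = 0
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, key: str) -> None:
        self._submissions.put(key)

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                self._submissions.get(timeout=0.05)
            except queue.Empty:
                continue
            self._in_flight += 1
            # ... the real store/prefetch work would happen here ...
            self._in_flight -= 1
            self._completed += 1
            self._submissions.task_done()

    def report_status(self) -> Dict[str, Any]:
        # No extra lock: the counters are read as plain attributes.
        alive = self._thread.is_alive()
        return {
            "is_healthy": alive,
            "thread_alive": alive,
            "submission_queue_size": self._submissions.qsize(),
            "in_flight": self._in_flight,
            "completed": self._completed,
        }

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
```

The values may be slightly stale when read, which is usually acceptable for a debug endpoint and is the price of keeping the critical path lock-free.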

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>

gemini-code-assist · 2026-03-05T21:42:37Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive status reporting subsystem for MP-mode components, enhancing debuggability and introspection. It provides a structured way to monitor the health and performance of various components through a new HTTP endpoint and a CLI tool, facilitating easier diagnosis of issues in the multi-tier storage pipeline.

Highlights

Status Reporting Interface: Introduced a composable report_status() -> dict interface across all MP-mode components for production debugging and introspection.
Instrumentation: Instrumented key components like L1Manager, L2 adapters, StoreController, PrefetchController, EvictionController, and StorageManager to provide detailed status reports.
New HTTP Endpoint and CLI Tool: Added a new HTTP endpoint (GET /api/status) to return the full JSON status tree and a CLI tool (python -m lmcache.tools.mp_status_viewer) for fetching and displaying the status.


Changelog

Instrumentation
- L1Manager
- L2 adapters
- StoreController
- PrefetchController
- EvictionController
- StorageManager
- MPCacheEngine / BlendEngine
New Features
- Introduced report_status() interface for MP-mode components
- Added HTTP endpoint /api/status
- Created CLI tool lmcache.tools.mp_status_viewer

Activity

Implemented report_status() methods in various components
Created a new HTTP endpoint to expose the status information
Developed a CLI tool for easy status viewing
Added unit tests for the new functionality


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a comprehensive status reporting subsystem, a valuable addition for debuggability and introspection of the multi-process cache components. The implementation is well-designed, utilizing a composable report_status() interface and shadow counters, and the changes are well-tested. However, a critical security concern has been identified: the new /api/status endpoint exposes sensitive internal state without any authentication or authorization, which could lead to information exposure or potential denial of service if the service is exposed to untrusted networks. Additionally, there is a minor suggestion to improve the robustness of the mp_status_viewer CLI tool.

ApostaC · 2026-03-05T21:48:19Z

TODO:

Give the status viewer tool a UI
Combine it with the telemetry viewer

ApostaC · 2026-03-05T21:56:18Z

@maobaolong Please feel free to take a look at this and leave your thoughts. It's the health check functionality for multi-process mode.

sammshen · 2026-03-06T23:15:27Z

+        status["cb_registered_gpu_ids"] = list(self._cb_gpu_contexts.keys())
+        status["cb_gpu_context_meta"] = {
+            str(gpu_id): {"model_name": meta[0], "world_size": meta[1]}
+            for gpu_id, meta in self._cb_gpu_context_meta.items()


if you modify the gpu context here you probably need to lock it up?

sammshen · 2026-03-06T23:16:26Z

+            "registered_gpu_ids": list(self.gpu_contexts.keys()),
+            "gpu_context_meta": {
+                str(gpu_id): {"model_name": meta[0], "world_size": meta[1]}
+                for gpu_id, meta in self.gpu_context_meta.items()


same comment here, gpu_context needs protection?

@ApostaC is this lock not needed? like a CacheContext lock
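
One way to address the concern above is to copy the dict under a lock and build the report from the copy. The thread does not say whether the PR needs this, so the sketch below is only an illustration: `ToyEngine`, `_ctx_lock`, and `register_gpu` are hypothetical, and only the shape of the reported fields follows the quoted diff.

```python
import threading
from typing import Any, Dict, Tuple


class ToyEngine:
    """Hypothetical engine guarding its GPU context metadata with a lock."""

    def __init__(self) -> None:
        self._ctx_lock = threading.Lock()  # assumed lock, not from the PR
        # gpu_id -> (model_name, world_size), as in the quoted diff
        self.gpu_context_meta: Dict[int, Tuple[str, int]] = {}

    def register_gpu(self, gpu_id: int, model_name: str, world_size: int) -> None:
        with self._ctx_lock:
            self.gpu_context_meta[gpu_id] = (model_name, world_size)

    def report_status(self) -> Dict[str, Any]:
        # Hold the lock only long enough to take a shallow snapshot, so a
        # concurrent registration cannot resize the dict mid-iteration.
        with self._ctx_lock:
            snapshot = dict(self.gpu_context_meta)
        return {
            "is_healthy": True,
            "registered_gpu_ids": list(snapshot.keys()),
            "gpu_context_meta": {
                str(gpu_id): {"model_name": meta[0], "world_size": meta[1]}
                for gpu_id, meta in snapshot.items()
            },
        }
```

Without a snapshot, iterating a dict while another thread inserts into it can raise `RuntimeError: dictionary changed size during iteration`.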

sammshen

lgtm, just a small lock comment

maobaolong

@ApostaC This looks great!

I'd just suggest renaming it and making the HTTP server the default behavior; otherwise, LGTM.

Feel free to merge this PR first. A bundle of features based on this will come soon.

…Cache#2699) * [Add] status report * [Add] tool to report status Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: shaoxiawjc <wjc2800@163.com>

…Cache#2699) * [Add] status report * [Add] tool to report status Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Aaron Wu <aaron.wu@dell.com>


ApostaC added 4 commits March 5, 2026 04:58

[Add] status report

f6ed4c4

Signed-off-by: ApostaC <yihua98@uchicago.edu>

[Add] tool to report status

a045b59

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Merge branch 'dev' into local-dev/mp-state-report

188c85e

[add] test report status

cc2c01b

Signed-off-by: ApostaC <yihua98@uchicago.edu>

gemini-code-assist Bot reviewed Mar 5, 2026


Comment thread lmcache/v1/multiprocess/http_server.py

Comment thread lmcache/tools/mp_status_viewer/__main__.py

ApostaC added 2 commits March 5, 2026 18:58

Merge branch 'dev' into local-dev/mp-state-report

32cdaaa

Merge branch 'dev' into local-dev/mp-state-report

4042f72

ApostaC requested review from KuntaiDu and sammshen March 6, 2026 21:25

sammshen reviewed Mar 6, 2026


sammshen approved these changes Mar 6, 2026


maobaolong approved these changes Mar 9, 2026


ApostaC enabled auto-merge (squash) March 9, 2026 19:13

github-actions Bot added the full Run comprehensive tests on this PR label Mar 9, 2026

ApostaC merged commit 98c337d into LMCache:dev Mar 9, 2026
35 of 38 checks passed

ApostaC mentioned this pull request Mar 9, 2026

[MP][Hotfix] add default implementation for report_status #2723

Merged

2 tasks

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

[MP][Debuggability] Introduce status report subsystem for MP-mode (LM…

566133c

…Cache#2699) * [Add] status report * [Add] tool to report status Signed-off-by: ApostaC <yihua98@uchicago.edu>

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

[MP][Debuggability] Introduce status report subsystem for MP-mode (LM…

41578a3

…Cache#2699) * [Add] status report * [Add] tool to report status Signed-off-by: ApostaC <yihua98@uchicago.edu>

ApostaC merged 6 commits into LMCache:dev from ApostaC:local-dev/mp-state-report
