[MP][Telemetry] Hot-fix to enable the telemetry logging for store by ApostaC · Pull Request #2707 · LMCache/LMCache

ApostaC · 2026-03-06T23:14:49Z

What this PR does / why we need it:

Re-enables telemetry logging for store operations in the CuPy stream, which was previously disabled due to a hanging issue in certain environments.

The fix introduces a _can_log_store flag that gates store telemetry logging until after the first retrieve operation completes. This ensures a retrieve operation is always logged in the CuPy stream before any store operation, which avoids the hanging behavior.

Special notes for your reviewers:

The root cause of the hang appears to be related to logging a store telemetry event in the CuPy stream before any retrieve event has been logged. The flag-based ordering constraint is a minimal fix that preserves the telemetry data while avoiding the problematic code path.

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>

gemini-code-assist · 2026-03-06T23:15:01Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where telemetry logging for store operations in the CuPy stream caused the process to hang in certain environments. The changes re-enable this essential telemetry by introducing a safeguard that ensures a retrieve operation is always logged before any store operation, thereby preventing the hanging behavior and allowing for continued data collection.

Highlights

Telemetry Re-enablement: Re-enabled telemetry logging for store operations in the CuPy stream, which was previously disabled due to a hanging issue.
Conditional Logging Mechanism: Introduced a _can_log_store flag to gate store telemetry logging, ensuring it only occurs after the first retrieve operation completes to prevent hangs.
Root Cause Mitigation: Implemented a minimal fix to preserve telemetry data by enforcing an ordering constraint, avoiding the problematic code path where store telemetry logging before a retrieve event caused process hangs.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

lmcache/v1/multiprocess/server.py
- Initialized a new instance variable _can_log_store to False in the __init__ method.
- Modified the store method to conditionally log telemetry events (start and end) only when _can_log_store is True.
- Updated the _retrieve_loop method to set _can_log_store to True after a successful retrieve operation.

Activity

No specific activity (comments, reviews, etc.) was provided in the context.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

Signed-off-by: ApostaC <yihua98@uchicago.edu>

gemini-code-assist

Code Review

This pull request re-enables telemetry logging for store operations by introducing a _can_log_store flag, using a sensible workaround of waiting for the first retrieve operation to enable store logging to address a hanging issue. However, a critical security concern has been identified: the implementation relies on unvalidated client-provided identifiers (instance_id) to access sensitive GPU contexts, exposing the system to unauthorized access and potential data leakage between different client processes. Additionally, consider improving the comment for the new _can_log_store flag for clarity and cleaning up outdated comments in the store method regarding disabled telemetry.

gemini-code-assist · 2026-03-06T23:18:08Z

+                    "store",
+                    key.request_id,
+                    stored_count=len(reserved_dict),
+                    device=str(gpu_context.device),


The store method uses a client-provided instance_id to access GPU contexts from the self.gpu_contexts dictionary without any authentication or authorization checks. This allows a malicious client to perform operations or trigger telemetry events on a GPU context belonging to another process. This is particularly risky as the server is exposed via ZMQ over TCP without any built-in security, making it vulnerable to Insecure Direct Object Reference (IDOR) attacks.

gemini-code-assist · 2026-03-06T23:18:09Z

+        # FIX: fix the problem of telemetry logging in cupy stream
+        # Need to log a retrieve operation before logging any store
+        # operation in the cupy stream


This comment is a bit informal and could be more descriptive for future maintainers. It's good practice to explain the 'why' behind a workaround. Consider replacing it with a more detailed explanation of the underlying issue and how this flag resolves it.

Suggested change

# FIX: fix the problem of telemetry logging in cupy stream

# Need to log a retrieve operation before logging any store

# operation in the cupy stream

# Workaround for a hanging issue where logging a 'store' telemetry event

# before any 'retrieve' event can cause the process to hang. This flag

# delays logging 'store' events until after the first 'retrieve' operation.

KuntaiDu

LGTM

sammshen

LGTM

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu>

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: shaoxiawjc <wjc2800@163.com>

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Aaron Wu <aaron.wu@dell.com>

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu>

[MP][Telemetry] Hot-fix to enable the telemetry logging for store

aafc2eb

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Remove old notes

97c1b7d

Signed-off-by: ApostaC <yihua98@uchicago.edu>

ApostaC added the full Run comprehensive tests on this PR label Mar 6, 2026

YaoJiayi self-requested a review March 6, 2026 23:17

YaoJiayi approved these changes Mar 6, 2026

View reviewed changes

gemini-code-assist Bot reviewed Mar 6, 2026

View reviewed changes

KuntaiDu approved these changes Mar 6, 2026

View reviewed changes

ApostaC enabled auto-merge (squash) March 6, 2026 23:20

sammshen approved these changes Mar 6, 2026

View reviewed changes

ApostaC merged commit ed7b386 into LMCache:dev Mar 7, 2026
25 of 29 checks passed

mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026

[MP][Telemetry] Hot-fix to enable the telemetry logging for store (LM…

d94a6c4

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu>

shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026

[MP][Telemetry] Hot-fix to enable the telemetry logging for store (LM…

5822642

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: shaoxiawjc <wjc2800@163.com>

realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026

[MP][Telemetry] Hot-fix to enable the telemetry logging for store (LM…

7485bbf

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Aaron Wu <aaron.wu@dell.com>

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

[MP][Telemetry] Hot-fix to enable the telemetry logging for store (LM…

f9a2d22

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu>

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

[MP][Telemetry] Hot-fix to enable the telemetry logging for store (LM…

479e3de

…Cache#2707) Signed-off-by: ApostaC <yihua98@uchicago.edu>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MP][Telemetry] Hot-fix to enable the telemetry logging for store#2707

[MP][Telemetry] Hot-fix to enable the telemetry logging for store#2707
ApostaC merged 2 commits intoLMCache:devfrom
ApostaC:local-dev/mp-telemetry-2

ApostaC commented Mar 6, 2026

Uh oh!

gemini-code-assist Bot commented Mar 6, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 6, 2026

Uh oh!

gemini-code-assist Bot Mar 6, 2026

Uh oh!

KuntaiDu left a comment

Uh oh!

sammshen left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ApostaC commented Mar 6, 2026

Uh oh!

gemini-code-assist Bot commented Mar 6, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

KuntaiDu left a comment

Choose a reason for hiding this comment

Uh oh!

sammshen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants