
fix(storage_manager): re-raise exception in read_prefetched_results#2737

Closed
maobaolong wants to merge 1 commit into LMCache:dev from maobaolong:fix/cm-reraise-exception

Conversation

@maobaolong
Collaborator

The context manager was swallowing exceptions raised inside the with-block (e.g. TypeError from _retrieve_loop). This caused the caller to continue as if the retrieve succeeded, leading to a double-free of read locks: once by the context manager's finally clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter their except branch and skip the redundant finish_read_prefetched.

What this PR does / why we need it:

The read_prefetched_results context manager catches exceptions raised inside the with block (e.g., TypeError from _retrieve_loop) but does not re-raise them. Per Python's @contextmanager protocol, this causes the exception to be silently swallowed — the caller continues execution as if the with block succeeded normally.
This leads to a read lock double-free: the context manager's finally clause releases the read lock (correctly), but since the caller doesn't know an exception occurred, it proceeds to schedule a GPU callback that calls finish_read_prefetched again, releasing the same lock a second time. For temporary objects, this drives the lock count to zero and triggers premature deletion.
Fix: Add raise after the warning log in the except block so the exception propagates correctly. The caller then enters its own except branch and skips the redundant finish_read_prefetched call.
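The swallowing behavior described above follows from Python's `@contextmanager` protocol: if the generator catches the exception thrown into it at `yield` and returns without re-raising, `__exit__` returns true and the exception is suppressed. The minimal sketch below (not the LMCache source; names like `read_results` and the lock bookkeeping are illustrative) reproduces both the bug and the fix:

```python
# Sketch of why a bare `except` inside a @contextmanager swallows
# with-block exceptions, and how re-raising fixes it. All names here
# are hypothetical stand-ins for the LMCache code paths.
from contextlib import contextmanager

released = []  # records every "release" of the read lock

@contextmanager
def read_results(swallow: bool):
    try:
        yield
    except Exception:
        print("warning: exception during read")
        if not swallow:
            raise                   # the fix: propagate to the caller
    finally:
        released.append("lock")     # finally runs in both cases

def caller(swallow: bool) -> str:
    try:
        with read_results(swallow):
            raise TypeError("boom")  # stands in for _retrieve_loop failing
        # Without the re-raise we fall through here, mirroring the caller
        # scheduling finish_read_prefetched again: a second release.
        released.append("lock")
        return "no exception seen"
    except TypeError:
        return "exception seen"

print(caller(swallow=True))   # prints "no exception seen": double release
print(caller(swallow=False))  # prints "exception seen": single release
```

With `swallow=True` the lock is released twice (the finally clause plus the redundant caller-side release); with the re-raise it is released exactly once.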

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue in the read_prefetched_results context manager where exceptions raised within its with block were not propagated. This silent suppression of errors caused calling code to proceed as if no error occurred, leading to a double-free of read locks and potential premature deletion of temporary objects. The fix ensures that exceptions are re-raised, allowing callers to correctly handle errors and prevent resource leaks or corruption.

Highlights

  • Exception Handling: Modified the read_prefetched_results context manager to re-raise exceptions, preventing silent swallowing of errors that could lead to resource mismanagement.
  • Resource Management: Addressed a double-free issue of read locks that occurred when exceptions were silently suppressed, ensuring correct resource release behavior.


Changelog
  • lmcache/v1/distributed/storage_manager.py
    • Added a raise statement within the exception handling block of the read_prefetched_results context manager to ensure exceptions are propagated.

@gemini-code-assist (Bot) left a comment:
Code Review

This pull request correctly fixes a bug in the read_prefetched_results context manager where exceptions were being swallowed, potentially leading to resource mismanagement. By re-raising the exception, it ensures proper error propagation. The change is sound. I have one suggestion to improve the associated logging for better debuggability.

Comment thread on lmcache/v1/distributed/storage_manager.py

@maobaolong maobaolong force-pushed the fix/cm-reraise-exception branch from def6348 to 6f66935 on March 11, 2026 at 08:59
@ApostaC (Contributor) left a comment:
I double-checked the logic here. It's quite complicated due to CUDA async execution.

  1. For the normal path (i.e., no error happening), the finish_read_prefetched MUST be called by server.py via cupy_stream.launch_host_func, because we don't want the memory objects to be evicted before the CUDA kernels are finished.
  2. For the retrieve failure path, i.e., memory objects are None or the expected count does not match the requested keys in retrieve, the _retrieve_loop won't be executed. In this case, we need to have a clear way to define "who should call the finish_read".

There are two potential designs:

  1. Always call finish_read_prefetched in retrieve via launch_host_func, no matter whether the retrieve loop runs or not. In this case, the finally block in the context manager can be removed.
  2. Only call finish_read_prefetched in retrieve via launch_host_func if everything runs successfully. Otherwise, we rely on the context manager to free the objects.

Personally, I prefer the second design because when there is something wrong in CUDA (i.e., during retrieve loop), the cupy_stream.launch_host_func may have errors as well.

Let me know what you think @maobaolong
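The "second design" above can be sketched with a success flag that gates the callback. This is an illustrative stand-in, not the LMCache implementation: `schedule_host_callback` plays the role of `cupy_stream.launch_host_func`, and `retrieve` / `finish_read_prefetched` are simplified hypothetical versions of the real functions.

```python
# Sketch of design 2: only schedule the host callback that releases the
# read locks when the retrieve loop completed; on failure, rely on the
# context manager's finally clause instead. All names are illustrative.
calls = []

def finish_read_prefetched():
    calls.append("finish")

def schedule_host_callback(fn):
    fn()  # real code would defer this until the CUDA stream drains

def retrieve(fail: bool):
    retrieve_succeeded = False
    try:
        if fail:
            raise TypeError("retrieve loop failed")
        retrieve_succeeded = True
    except Exception:
        pass  # the context manager path releases the locks instead
    finally:
        # Gate the callback: no callback means no second release.
        if retrieve_succeeded:
            schedule_host_callback(finish_read_prefetched)

retrieve(fail=False)  # schedules exactly one release
retrieve(fail=True)   # schedules none; no double-free
```

The appeal of this shape is that nothing CUDA-side is invoked on the failure path, matching the concern that `launch_host_func` itself may error when the retrieve loop has already failed.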

Diff context:

     successfully_yielded = True
-except Exception as e:
+except Exception:
     logger.warning(
Contributor:
We can use logger.exception here. I think it will also include the traceback.
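For reference, `logger.exception(...)` logs at ERROR level and automatically attaches the active traceback; it is equivalent to `logger.error(..., exc_info=True)`, whereas `logger.warning` only includes the traceback when `exc_info=True` is passed explicitly. A quick standalone check:

```python
# logger.exception attaches the current traceback without needing an
# explicit exc_info=True. Captured to a StringIO to inspect the output.
import io
import logging

logger = logging.getLogger("demo")
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))

try:
    raise TypeError("boom")
except Exception:
    logger.exception("Exception occurred while processing read prefetched results")

out = stream.getvalue()
print("Traceback" in out)  # True: the traceback is included automatically
```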

Collaborator (Author):
Sure

"Exception occurred while processing read prefetched results",
exc_info=True,
)
raise
Contributor:
If we raise here, will the finally block below still be executed?

Collaborator (Author):
Yeah, the finally block will still be executed, but the caller cannot be aware of this exception.
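This is standard Python semantics and easy to verify: a `raise` inside an `except` clause does not skip the `finally` clause; cleanup runs first, then the exception propagates to the caller. A minimal self-contained check:

```python
# Verify ordering: except handler runs, then finally, then the caller's
# handler sees the propagated exception.
events = []

def cm_body():
    try:
        raise TypeError("boom")
    except Exception:
        events.append("logged")
        raise                        # re-raise after logging
    finally:
        events.append("finally")     # runs before the exception propagates

try:
    cm_body()
except TypeError:
    events.append("caller saw it")

print(events)  # ['logged', 'finally', 'caller saw it']
```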

@ApostaC
Contributor

ApostaC commented Mar 11, 2026

Ah, I see #2736. So if we combine this one and #2736 together, it's basically the "second design" in my above description, right?

If that's the case, can we do it in the same PR to streamline the merging process?

@maobaolong
Collaborator Author

@ApostaC Thanks for the review. I'll close this PR and push a new commit to #2736.

@maobaolong maobaolong closed this Mar 12, 2026
maobaolong added a commit to maobaolong/LMCache that referenced this pull request Mar 12, 2026
The context manager was swallowing exceptions raised inside the
with-block (e.g. TypeError from _retrieve_loop). This caused the
caller to continue as if the retrieve succeeded, leading to a
double-free of read locks: once by the context manager's finally
clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter
their except branch and skip the redundant finish_read_prefetched.

This change was originally in PR LMCache#2737 and is now combined into
this PR per reviewer's suggestion to streamline the merging process.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
maobaolong added a commit that referenced this pull request Mar 13, 2026
…lag (#2736)

* fix(server): guard finish_read_prefetched behind retrieve_succeeded flag

When _retrieve_loop raises an exception inside the with-block, the
read_prefetched_results context manager already releases the read
locks in its finally clause. The retrieve() finally block was
unconditionally scheduling a GPU callback to call
finish_read_prefetched, causing a double-free of read locks.

Introduce retrieve_succeeded flag that is only set to True when the
with-block exits normally. The GPU callback is now conditional on
this flag, forming a defense-in-depth pair with the context manager
re-raise fix.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix(storage_manager): re-raise exception in read_prefetched_results

The context manager was swallowing exceptions raised inside the
with-block (e.g. TypeError from _retrieve_loop). This caused the
caller to continue as if the retrieve succeeded, leading to a
double-free of read locks: once by the context manager's finally
clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter
their except branch and skip the redundant finish_read_prefetched.

This change was originally in PR #2737 and is now combined into
this PR per reviewer's suggestion to streamline the merging process.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026