
fix(storage_manager): re-raise exception in read_prefetched_results#2737

Closed
maobaolong wants to merge 1 commit into LMCache:dev from maobaolong:fix/cm-reraise-exception

Conversation

@maobaolong
Collaborator

The context manager was swallowing exceptions raised inside the with-block (e.g. TypeError from _retrieve_loop). This caused the caller to continue as if the retrieve succeeded, leading to a double-free of read locks: once by the context manager's finally clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter their except branch and skip the redundant finish_read_prefetched.

What this PR does / why we need it:

The read_prefetched_results context manager catches exceptions raised inside the with block (e.g., TypeError from _retrieve_loop) but does not re-raise them. Per Python's @contextmanager protocol, this causes the exception to be silently swallowed — the caller continues execution as if the with block succeeded normally.
This leads to a read lock double-free: the context manager's finally clause releases the read lock (correctly), but since the caller doesn't know an exception occurred, it proceeds to schedule a GPU callback that calls finish_read_prefetched again, releasing the same lock a second time. For temporary objects, this drives the lock count to zero and triggers premature deletion.
Fix: Add raise after the warning log in the except block so the exception propagates correctly. The caller then enters its own except branch and skips the redundant finish_read_prefetched call.
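The swallowing behavior described above follows from Python's `@contextmanager` protocol: if the generator catches the exception thrown into it at `yield` and returns without re-raising, `__exit__` returns true and the exception is suppressed. The minimal sketch below (not the LMCache source; names like `read_results` and the lock bookkeeping are illustrative) reproduces both the bug and the fix:

```python
# Sketch of why a bare `except` inside a @contextmanager swallows
# with-block exceptions, and how re-raising fixes it. All names here
# are hypothetical stand-ins for the LMCache code paths.
from contextlib import contextmanager

released = []  # records every "release" of the read lock

@contextmanager
def read_results(swallow: bool):
    try:
        yield
    except Exception:
        print("warning: exception during read")
        if not swallow:
            raise                   # the fix: propagate to the caller
    finally:
        released.append("lock")     # finally runs in both cases

def caller(swallow: bool) -> str:
    try:
        with read_results(swallow):
            raise TypeError("boom")  # stands in for _retrieve_loop failing
        # Without the re-raise we fall through here, mirroring the caller
        # scheduling finish_read_prefetched again: a second release.
        released.append("lock")
        return "no exception seen"
    except TypeError:
        return "exception seen"

print(caller(swallow=True))   # prints "no exception seen": double release
print(caller(swallow=False))  # prints "exception seen": single release
```

With `swallow=True` the lock is released twice (the finally clause plus the redundant caller-side release); with the re-raise it is released exactly once.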

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue in the read_prefetched_results context manager where exceptions raised within its with block were not propagated. This silent suppression of errors caused calling code to proceed as if no error occurred, leading to a double-free of read locks and potential premature deletion of temporary objects. The fix ensures that exceptions are re-raised, allowing callers to correctly handle errors and prevent resource leaks or corruption.

Highlights

  • Exception Handling: Modified the read_prefetched_results context manager to re-raise exceptions, preventing silent swallowing of errors that could lead to resource mismanagement.
  • Resource Management: Addressed a double-free issue of read locks that occurred when exceptions were silently suppressed, ensuring correct resource release behavior.


Changelog
  • lmcache/v1/distributed/storage_manager.py
    • Added a raise statement within the exception handling block of the read_prefetched_results context manager to ensure exceptions are propagated.

@gemini-code-assist (Bot) left a comment:
Code Review

This pull request correctly fixes a bug in the read_prefetched_results context manager where exceptions were being swallowed, potentially leading to resource mismanagement. By re-raising the exception, it ensures proper error propagation. The change is sound. I have one suggestion to improve the associated logging for better debuggability.

Comment thread on lmcache/v1/distributed/storage_manager.py

@maobaolong maobaolong force-pushed the fix/cm-reraise-exception branch from def6348 to 6f66935 on March 11, 2026 at 08:59
@ApostaC (Contributor) left a comment:
I double-checked the logic here. It's quite complicated due to CUDA async execution.

  1. For the normal path (i.e., no error happening), the finish_read_prefetched MUST be called by server.py via cupy_stream.launch_host_func, because we don't want the memory objects to be evicted before the CUDA kernels are finished.
  2. For the retrieve failure path, i.e., memory objects are None or the expected count does not match the requested keys in retrieve, the _retrieve_loop won't be executed. In this case, we need to have a clear way to define "who should call the finish_read".

There are two potential designs:

  1. Always call finish_read_prefetched in retrieve via launch_host_func, no matter whether the retrieve loop runs or not. In this case, the finally block in the context manager can be removed.
  2. Only call finish_read_prefetched in retrieve via launch_host_func if everything runs successfully. Otherwise, we rely on the context manager to free the objects.

Personally, I prefer the second design because when there is something wrong in CUDA (i.e., during retrieve loop), the cupy_stream.launch_host_func may have errors as well.

Let me know what you think @maobaolong
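The "second design" above can be sketched with a success flag that gates the callback. This is an illustrative stand-in, not the LMCache implementation: `schedule_host_callback` plays the role of `cupy_stream.launch_host_func`, and `retrieve` / `finish_read_prefetched` are simplified hypothetical versions of the real functions.

```python
# Sketch of design 2: only schedule the host callback that releases the
# read locks when the retrieve loop completed; on failure, rely on the
# context manager's finally clause instead. All names are illustrative.
calls = []

def finish_read_prefetched():
    calls.append("finish")

def schedule_host_callback(fn):
    fn()  # real code would defer this until the CUDA stream drains

def retrieve(fail: bool):
    retrieve_succeeded = False
    try:
        if fail:
            raise TypeError("retrieve loop failed")
        retrieve_succeeded = True
    except Exception:
        pass  # the context manager path releases the locks instead
    finally:
        # Gate the callback: no callback means no second release.
        if retrieve_succeeded:
            schedule_host_callback(finish_read_prefetched)

retrieve(fail=False)  # schedules exactly one release
retrieve(fail=True)   # schedules none; no double-free
```

The appeal of this shape is that nothing CUDA-side is invoked on the failure path, matching the concern that `launch_host_func` itself may error when the retrieve loop has already failed.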

Diff context:

     successfully_yielded = True
-except Exception as e:
+except Exception:
     logger.warning(
Contributor:
We can use logger.exception here. I think it will also include the traceback.
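For reference, `logger.exception(...)` logs at ERROR level and automatically attaches the active traceback; it is equivalent to `logger.error(..., exc_info=True)`, whereas `logger.warning` only includes the traceback when `exc_info=True` is passed explicitly. A quick standalone check:

```python
# logger.exception attaches the current traceback without needing an
# explicit exc_info=True. Captured to a StringIO to inspect the output.
import io
import logging

logger = logging.getLogger("demo")
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))

try:
    raise TypeError("boom")
except Exception:
    logger.exception("Exception occurred while processing read prefetched results")

out = stream.getvalue()
print("Traceback" in out)  # True: the traceback is included automatically
```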

Collaborator (Author):
Sure

"Exception occurred while processing read prefetched results",
exc_info=True,
)
raise
Contributor:
If we raise here, will the finally block below still be executed?

Collaborator (Author):
Yeah, the finally block will still be executed, but the caller cannot be aware of this exception.
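This is standard Python semantics and easy to verify: a `raise` inside an `except` clause does not skip the `finally` clause; cleanup runs first, then the exception propagates to the caller. A minimal self-contained check:

```python
# Verify ordering: except handler runs, then finally, then the caller's
# handler sees the propagated exception.
events = []

def cm_body():
    try:
        raise TypeError("boom")
    except Exception:
        events.append("logged")
        raise                        # re-raise after logging
    finally:
        events.append("finally")     # runs before the exception propagates

try:
    cm_body()
except TypeError:
    events.append("caller saw it")

print(events)  # ['logged', 'finally', 'caller saw it']
```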

@ApostaC
Contributor

ApostaC commented Mar 11, 2026

Ah, I see #2736. So if we combine this one and #2736 together, it's basically the "second design" in my above description, right?

If that's the case, can we do it in the same PR to streamline the merging process?

@maobaolong
Collaborator Author

@ApostaC Thanks for the review. I'll close this PR and push a new commit to #2736.

@maobaolong maobaolong closed this Mar 12, 2026
maobaolong added a commit to maobaolong/LMCache that referenced this pull request Mar 12, 2026
The context manager was swallowing exceptions raised inside the
with-block (e.g. TypeError from _retrieve_loop). This caused the
caller to continue as if the retrieve succeeded, leading to a
double-free of read locks: once by the context manager's finally
clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter
their except branch and skip the redundant finish_read_prefetched.

This change was originally in PR LMCache#2737 and is now combined into
this PR per reviewer's suggestion to streamline the merging process.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
maobaolong added a commit that referenced this pull request Mar 13, 2026
…lag (#2736)

* fix(server): guard finish_read_prefetched behind retrieve_succeeded flag

When _retrieve_loop raises an exception inside the with-block, the
read_prefetched_results context manager already releases the read
locks in its finally clause. The retrieve() finally block was
unconditionally scheduling a GPU callback to call
finish_read_prefetched, causing a double-free of read locks.

Introduce retrieve_succeeded flag that is only set to True when the
with-block exits normally. The GPU callback is now conditional on
this flag, forming a defense-in-depth pair with the context manager
re-raise fix.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix(storage_manager): re-raise exception in read_prefetched_results

The context manager was swallowing exceptions raised inside the
with-block (e.g. TypeError from _retrieve_loop). This caused the
caller to continue as if the retrieve succeeded, leading to a
double-free of read locks: once by the context manager's finally
clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter
their except branch and skip the redundant finish_read_prefetched.

This change was originally in PR #2737 and is now combined into
this PR per reviewer's suggestion to streamline the merging process.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026