fix(storage_manager): re-raise exception in read_prefetched_results #2737
maobaolong wants to merge 1 commit into LMCache:dev
Conversation
Code Review
This pull request correctly fixes a bug in the read_prefetched_results context manager where exceptions were being swallowed, potentially leading to resource mismanagement. By re-raising the exception, it ensures proper error propagation. The change is sound. I have one suggestion to improve the associated logging for better debuggability.
The context manager was swallowing exceptions raised inside the with-block (e.g. TypeError from _retrieve_loop). This caused the caller to continue as if the retrieve succeeded, leading to a double-free of read locks: once by the context manager's finally clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter their except branch and skip the redundant finish_read_prefetched.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
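The swallowing behavior follows from how `contextlib.contextmanager` works: the exception is thrown into the generator at the `yield`, and if the generator catches it and returns normally, `__exit__` reports the exception as suppressed. A minimal sketch of the bug (hypothetical names, not the actual LMCache code):

```python
from contextlib import contextmanager

@contextmanager
def read_results_swallowing():
    try:
        yield "results"
    except Exception:
        # Logging without re-raising: the generator exits normally, so
        # contextlib treats the exception as suppressed.
        print("exception logged, but swallowed")
    finally:
        print("read locks released")  # runs on both paths

def caller():
    with read_results_swallowing():
        raise TypeError("boom")  # e.g. raised by a retrieve loop
    # Reached! The caller continues as if the retrieve succeeded and
    # would go on to schedule the redundant finish_read_prefetched.
    return "continued"
```

With a bare `raise` added in the `except` block, the generator re-raises, `__exit__` propagates the exception, and `caller` would see the `TypeError` instead of continuing.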
def6348 to 6f66935
ApostaC left a comment:
I double-checked the logic here. It's quite complicated due to CUDA async execution.

- For the normal path (i.e., no error happening), `finish_read_prefetched` MUST be called by `server.py` via `cupy_stream.launch_host_func`, because we don't want the memory objects to be evicted before the CUDA kernels are finished.
- For the `retrieve` failure path, i.e., the memory objects are None or the expected count does not match the requested keys in `retrieve`, the `_retrieve_loop` won't be executed. In this case, we need a clear way to define "who should call the `finish_read`".

There are two potential designs:

1. Always call `finish_read_prefetched` in `retrieve` via `launch_host_func`, no matter whether the retrieve loop runs or not. In this case, the `finally` block in the context manager can be removed.
2. Only call `finish_read_prefetched` in `retrieve` via `launch_host_func` if everything runs successfully. Otherwise, we rely on the context manager to free the objects.

Personally, I prefer the second design, because when something goes wrong in CUDA (i.e., during the retrieve loop), `cupy_stream.launch_host_func` may have errors as well.

Let me know what you think @maobaolong
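The second design can be sketched in plain Python, with a direct function call standing in for the `cupy_stream.launch_host_func` GPU callback (all names below are simplified and hypothetical, not the actual LMCache code):

```python
from contextlib import contextmanager

class ReadLock:
    """Toy stand-in for a prefetched object's read-lock count."""
    def __init__(self):
        self.count = 1

    def release(self):
        assert self.count > 0, "double-free of read lock"
        self.count -= 1

@contextmanager
def read_prefetched_results(lock):
    successfully_yielded = False
    try:
        yield
        successfully_yielded = True
    except Exception:
        raise  # propagate so the caller enters its except branch
    finally:
        if not successfully_yielded:
            lock.release()  # failure path: context manager frees the lock

def retrieve(lock, retrieve_loop):
    retrieve_succeeded = False
    try:
        with read_prefetched_results(lock):
            retrieve_loop()
        retrieve_succeeded = True
    except Exception:
        pass  # skip the redundant finish_read_prefetched
    finally:
        if retrieve_succeeded:
            # Stands in for scheduling finish_read_prefetched via
            # cupy_stream.launch_host_func on the success path only.
            lock.release()

def failing_loop():
    raise TypeError("boom")

ok, bad = ReadLock(), ReadLock()
retrieve(ok, lambda: None)   # success: released once, by the "callback"
retrieve(bad, failing_loop)  # failure: released once, by the finally clause
```

Each lock ends at a count of exactly zero on both paths; without the re-raise and the success flag, the failure path would hit the release twice.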
```diff
             successfully_yielded = True
-        except Exception as e:
+        except Exception:
             logger.warning(
```
We can use `logger.exception` here; I think it will also include the traceback.
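For reference, `logger.exception(msg)` logs at ERROR level with `exc_info=True`, so it records the active traceback; it is intended to be called from inside an `except` block. A minimal standalone sketch:

```python
import logging

logging.basicConfig()  # ensure a handler exists for this sketch
logger = logging.getLogger(__name__)

try:
    raise TypeError("boom")
except TypeError:
    # Equivalent to logger.error(..., exc_info=True): the message plus
    # the current exception's traceback are written to the log.
    logger.exception(
        "Exception occurred while processing read prefetched results"
    )
```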
```diff
                 "Exception occurred while processing read prefetched results",
                 exc_info=True,
             )
+            raise
```
If we raise here, will the `finally` block below still be executed?
Yes, the `finally` block will still be executed, but the caller cannot be aware of this exception.
The context manager was swallowing exceptions raised inside the with-block (e.g. TypeError from _retrieve_loop). This caused the caller to continue as if the retrieve succeeded, leading to a double-free of read locks: once by the context manager's finally clause, and again by the caller's GPU callback.

Add 'raise' to propagate the exception so callers correctly enter their except branch and skip the redundant finish_read_prefetched.

This change was originally in PR LMCache#2737 and is now combined into this PR per the reviewer's suggestion to streamline the merging process.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
…lag (#2736)

* fix(server): guard finish_read_prefetched behind retrieve_succeeded flag

  When _retrieve_loop raises an exception inside the with-block, the read_prefetched_results context manager already releases the read locks in its finally clause. The retrieve() finally block was unconditionally scheduling a GPU callback to call finish_read_prefetched, causing a double-free of read locks.

  Introduce a retrieve_succeeded flag that is only set to True when the with-block exits normally. The GPU callback is now conditional on this flag, forming a defense-in-depth pair with the context manager re-raise fix.

  Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* fix(storage_manager): re-raise exception in read_prefetched_results

  The context manager was swallowing exceptions raised inside the with-block (e.g. TypeError from _retrieve_loop). This caused the caller to continue as if the retrieve succeeded, leading to a double-free of read locks: once by the context manager's finally clause, and again by the caller's GPU callback.

  Add 'raise' to propagate the exception so callers correctly enter their except branch and skip the redundant finish_read_prefetched.

  This change was originally in PR #2737 and is now combined into this PR per the reviewer's suggestion to streamline the merging process.

  Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
What this PR does / why we need it:
The read_prefetched_results context manager catches exceptions raised inside the with block (e.g., TypeError from _retrieve_loop) but does not re-raise them. Per Python's @contextmanager protocol, this causes the exception to be silently swallowed — the caller continues execution as if the with block succeeded normally.
This leads to a read lock double-free: the context manager's finally clause releases the read lock (correctly), but since the caller doesn't know an exception occurred, it proceeds to schedule a GPU callback that calls finish_read_prefetched again, releasing the same lock a second time. For temporary objects, this drives the lock count to zero and triggers premature deletion.
Fix: Add raise after the warning log in the except block so the exception propagates correctly. The caller then enters its own except branch and skips the redundant finish_read_prefetched call.