
[Bugfix] Fix memory leak in asynchronous mode#2559

Merged
maobaolong merged 7 commits into LMCache:dev from deng451e:fix_async_mem_leak on Mar 7, 2026
Conversation

@deng451e
Collaborator

@deng451e deng451e commented Feb 6, 2026

What this PR does / why we need it

This PR fixes the memory leak identified in PR #2434 in the asynchronous execution path when LMCACHE_LOCAL_CPU is enabled: pinned memory objects were not released, and the active_memory_objs_count and local_cpu_hot_cache_count metrics became inconsistent.

Root cause

In synchronous mode, self.lookup_pins directly tracks key–KV pairs and is used for ref-countdown and unpin.

In asynchronous mode, key–KV pairs are stored in self.event_manager. The event was popped in _async_process_tokens_internal before memory release, so the objects skipped the unpin/ref-countdown path and leaked. Additionally, the async code path did not release the retrieved memory_obj after use, unlike the synchronous flow.
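The root cause above can be illustrated with a minimal sketch. All names here (`MemoryObj`, `EventManager`, the flow functions) are simplified stand-ins, not LMCache's actual classes; the point is only that a destructive pop removes the `(key, memory_obj)` pairs before the unpin/ref-countdown path can find them:

```python
# Hypothetical model of the bug: popping the event before memory release
# means the cleanup path never sees the pinned objects, so they leak.

class MemoryObj:
    def __init__(self):
        self.pin_count = 0
        self.ref_count = 1

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        self.pin_count -= 1

    def ref_count_down(self):
        self.ref_count -= 1


class EventManager:
    def __init__(self):
        self._events = {}  # event_id -> list of (key, memory_obj)

    def add_event(self, event_id, pairs):
        self._events[event_id] = pairs

    def pop_event(self, event_id):
        # Destructive: after this, nothing can locate the pairs to unpin.
        return self._events.pop(event_id)

    def get_event_future(self, event_id):
        # Non-destructive: pairs stay registered until cleanup runs.
        return self._events[event_id]

    def remove_event(self, event_id):
        self._events.pop(event_id, None)


def buggy_flow(mgr, event_id):
    # Async path pops the event first; unpin/ref-countdown is skipped.
    mgr.pop_event(event_id)


def fixed_flow(mgr, event_id):
    # Read the pairs without removing them, release each object, then
    # remove the event entry.
    for _key, obj in mgr.get_event_future(event_id):
        obj.unpin()
        obj.ref_count_down()
    mgr.remove_event(event_id)
```

In the buggy flow the object keeps its pin and reference forever; in the fixed flow both counts return to zero.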

Changes introduced in this PR

  • Adds explicit unpin logic in the asynchronous memory release path within lookup_unpin.
  • Introduces a get_event_future method in event_manager to retrieve events without removing them.
  • Adds pin logic for memory objects received via P2PBackend and LocalDiskBackend.
  • Adds unpin cleanup for timed-out asynchronous lookups and prefetches.

Special notes for reviewers

@gemini-code-assist
Contributor

Summary of Changes

Hello @deng451e, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical memory leak occurring in the asynchronous execution path of the cache, particularly when LMCACHE_LOCAL_CPU is active. The leak stemmed from pinned memory objects not being properly released. The solution involves integrating explicit unpinning logic into the memory cleanup process for asynchronous operations and refining event retrieval to prevent premature removal, thereby ensuring all resources are correctly managed and deallocated.

Highlights

  • Memory Leak Fix: Addressed a memory leak in the asynchronous execution path when LMCACHE_LOCAL_CPU is enabled, specifically concerning unreleased pinned memory objects.
  • Explicit Unpinning Logic: Implemented explicit 'unpin()' calls for memory objects during cleanup in asynchronous operations to ensure proper resource deallocation.
  • Event Management Enhancement: Introduced a new 'get_event_future' function in 'event_manager' to retrieve event futures without removing them from the queue, preventing unintended side effects.
  • Memory Object Pinning: Ensured that hit memory objects are explicitly pinned in the 'p2p_backend' during non-blocking batched retrieval.


Changelog
  • lmcache/v1/cache_engine.py
    • Removed a redundant 'ref_count_down()' call in the 'retrieve' method.
    • Modified 'cleanup_memory_objs' to iterate over '(key, memory_obj)' pairs and added an explicit 'memory_obj.unpin()' call before 'ref_count_down()'.
    • Extended 'lookup_unpin' to call 'cleanup_memory_objs' when 'async_loading' is enabled, ensuring proper cleanup for asynchronous contexts.
    • Updated '_async_process_tokens_internal' to use the new 'event_manager.get_event_future' method.
  • lmcache/v1/event_manager.py
    • Added a new public method 'get_event_future' which allows retrieving an 'asyncio.Future' event by type and ID without removing it from the 'DONE' status queue.
  • lmcache/v1/storage_backend/p2p_backend.py
    • Added 'hit_mem_obj.pin()' for each 'hit_mem_obj' within 'batched_get_non_blocking' to explicitly pin memory objects that are successfully retrieved.
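The `cleanup_memory_objs` change in the changelog hinges on ordering: each object is unpinned before its reference count is dropped. A minimal sketch (class and counts assumed for illustration; LMCache's real `MemoryObj` has more state):

```python
# Sketch of the cleanup ordering described in the changelog: iterate
# (key, memory_obj) pairs, unpin first, then drop the reference.

class MemoryObj:
    def __init__(self):
        self.pin_count = 1  # pinned when retrieved
        self.ref_count = 1

    def unpin(self):
        self.pin_count -= 1

    def ref_count_down(self):
        self.ref_count -= 1


def cleanup_memory_objs(pairs):
    """pairs: iterable of (key, memory_obj), as in the modified method."""
    for _key, memory_obj in pairs:
        memory_obj.unpin()           # release the retrieval-time pin first
        memory_obj.ref_count_down()  # then drop the reference
```

If the reference were dropped first, a freed object could be unpinned afterwards, which is exactly the negative-count symptom reported later in this thread.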
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request aims to fix a memory leak in the asynchronous execution path by adding explicit unpin logic and modifying event handling. However, it introduces critical security vulnerabilities, including a race condition in LMCacheEngine's get_event_future that can lead to double-free and use-after-free issues, and a lack of authentication/encryption in the P2P backend, exposing sensitive KV cache data to unauthenticated remote exfiltration and cache poisoning attacks. Additionally, the changes introduce a new memory leak in the synchronous path within the retrieve method and a fragile condition in lookup_unpin.

Comment thread lmcache/v1/cache_engine.py Outdated
Comment thread lmcache/v1/cache_engine.py Outdated
Contributor

@sammshen sammshen left a comment


LGTM!

@deng451e deng451e requested a review from maobaolong February 8, 2026 03:18
Collaborator

@maobaolong maobaolong left a comment


@deng451e Thanks for the fix! It would be better if you could double-check the exception handling and corner cases again.

BTW, the DCO check failed.

@deng451e
Collaborator Author

deng451e commented Feb 8, 2026

@deng451e Thanks for the fix! It would be better if you could double-check the exception handling and corner cases again.

BTW, the DCO check failed.

Will do — I’ll add more async test cases to cover the exception paths and catch any missed corner cases.
The DCO issue has been fixed as well.

@deng451e deng451e closed this Feb 8, 2026
@deng451e deng451e reopened this Feb 8, 2026
sammshen and others added 2 commits February 8, 2026 19:38
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
@maobaolong maobaolong enabled auto-merge (squash) February 9, 2026 01:59
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Feb 9, 2026
@liubj77
Contributor

liubj77 commented Feb 9, 2026

👍🏻, there was indeed a memory leak during asynchronous loading, and after I cherry-picked the code onto 0.3.12 the leak seems to be gone. But some warnings appeared when I ran it in my environment; the logs look like this:

(Worker_TP0 pid=178611) [2026-02-08 22:28:03,949] LMCache WARNING: Ref count of MemoryObj 1085276160is negative: -1.Double free occurred somewhere.Setting ref count back to 0 as a hack but please find the bug. (memory_management.py:475:lmcache.v1.memory_management)
(Worker_TP0 pid=178611) [2026-02-08 22:28:03,949] LMCache DEBUG: Releasing memory object for lookup_id=cmpl-c26a732a-b438-441f-baf1-0e178bf94fda-0-a50b61d1 (cache_engine.py:1148:lmcache.v1.cache_engine)
(Worker_TP0 pid=178611) [2026-02-08 22:28:03,949] LMCache WARNING: Pin count of MemoryObj 434110464is negative: -1.Double unpin occurred somewhere.Setting pin count back to 0 as a hack but please find the bug. (memory_management.py:532:lmcache.v1.memory_management)
(Worker_TP1 pid=178612) [2026-02-08 22:28:03,950] LMCache DEBUG: Releasing memory object for lookup_id=cmpl-c26a732a-b438-441f-baf1-0e178bf94fda-0-a50b61d1 (cache_engine.py:1148:lmcache.v1.cache_engine)
(Worker_TP1 pid=178612) [2026-02-08 22:28:03,950] LMCache DEBUG: Unregistered pinned object 140389175277424 from timeout monitoring (pin_monitor.py:89:lmcache.v1.pin_monitor)
(Worker_TP1 pid=178612) [2026-02-08 22:28:03,950] LMCache WARNING: Ref count of MemoryObj 1205862400is negative: -1.Double free occurred somewhere.Setting ref count back to 0 as a hack but please find the bug. (memory_management.py:475:lmcache.v1.memory_management)
(Worker_TP1 pid=178612) [2026-02-08 22:28:03,950] LMCache DEBUG: Releasing memory object for lookup_id=cmpl-c26a732a-b438-441f-baf1-0e178bf94fda-0-a50b61d1 (cache_engine.py:1148:lmcache.v1.cache_engine)
(Worker_TP1 pid=178612) [2026-02-08 22:28:03,950] LMCache WARNING: Pin count of MemoryObj 434110464is negative: -1.Double unpin occurred somewhere.Setting pin count back to 0 as a hack but please find the bug. (memory_management.py:532:lmcache.v1.memory_management)

Please help confirm if my description is accurate.

Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>
@deng451e
Collaborator Author

👍🏻, there was indeed a memory leak during asynchronous loading, and after I cherry-picked the code to 0.3.12, the memory leak seems to be gone. But some warnings appeared [...] Please help confirm if my description is accurate.

Thank you for reporting this issue. Similar logs were also observed in CI when running tests with async.yaml.

The root cause is that memory_obj was not pinned in host memory when data was retrieved from disk via batched_get_non_blocking in LocalDiskBackend under asynchronous mode, while an explicit unpin operation was subsequently invoked.

The latest commit adds the missing pin logic in LocalDiskBackend, which should resolve the warning.
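The invariant being restored can be sketched as follows. The loader callback and the simplified `MemoryObj` are hypothetical; the point is that everything returned by the backend's non-blocking batched get must arrive pinned, because the shared async cleanup path unconditionally unpins:

```python
# Sketch of the LocalDiskBackend fix: pin each object on retrieval so the
# later unconditional unpin() balances out instead of going negative.

class MemoryObj:
    def __init__(self):
        self.pin_count = 0

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        self.pin_count -= 1


def batched_get_non_blocking(keys, load_from_disk):
    objs = []
    for key in keys:
        obj = load_from_disk(key)  # fresh object: not yet pinned
        obj.pin()                  # without this, cleanup's unpin() hits -1
        objs.append(obj)
    return objs
```

With the pin in place, the cleanup path's `unpin()` brings the count back to zero, and the "Pin count ... is negative" warnings above should disappear.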

@maobaolong maobaolong merged commit e2adfbf into LMCache:dev Mar 7, 2026
23 of 28 checks passed
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
* fix memory leak bug for async mode

Signed-off-by: deng451e <838677410@qq.com>

* Fix async pinned memory unpin

Signed-off-by: deng451e <838677410@qq.com>

---------

Signed-off-by: deng451e <838677410@qq.com>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026

Labels

full Run comprehensive tests on this PR
