
[Refactor] extract virtual address manager from TensorMemoryAllocator and disable LazyMemoryAllocator#2380

Merged
ApostaC merged 7 commits into LMCache:dev from ApostaC:local-dev/allocator-refactor
Jan 13, 2026

Conversation

@ApostaC
Contributor

@ApostaC ApostaC commented Jan 10, 2026

What this PR does / why we need it:
Part of #2379

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello @ApostaC, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the memory management system by introducing a new AddressManager class. This class centralizes the handling of virtual address space, abstracting it away from the TensorMemoryAllocator. Concurrently, the LazyMemoryAllocator has been temporarily disabled as part of ongoing refactoring efforts. These changes aim to enhance the clarity, maintainability, and testability of the memory allocation infrastructure.

Highlights

  • Virtual Address Manager Extraction: The core logic for managing virtual address space, including allocation, freeing, and coalescing of memory blocks, has been extracted from TensorMemoryAllocator into a new, dedicated AddressManager class. This improves modularity and separation of concerns.
  • LazyMemoryAllocator Disabled: The LazyMemoryAllocator and its associated components (CompositeBuffer, CompositeTensorMemoryAllocator, AsyncMemoryExpander) have been removed. Attempts to use lazy memory allocation will now result in a RuntimeError, indicating that this feature is currently under refactoring.
  • TensorMemoryAllocator Refactoring: TensorMemoryAllocator has been refactored to delegate its memory management operations to an instance of the new AddressManager. This simplifies TensorMemoryAllocator by focusing it on physical memory interaction rather than virtual address management.
  • Improved Testability and Consistency Checks: Memory management tests have been updated to leverage the memcheck() method, which now uses the AddressManager's consistency checks, ensuring robust validation of the allocator's state.
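As a rough illustration of the pattern these highlights describe, here is a minimal first-fit free-list manager with coalescing on free and a consistency check. All names, the alignment constant, and the data layout are hypothetical sketches of the idea, not LMCache's actual AddressManager API:

```python
import bisect
from dataclasses import dataclass


@dataclass
class FreeBlock:
    start: int
    size: int


class AddressManager:
    """Sketch: first-fit allocator over a virtual address range."""

    ALIGN_BYTES = 256  # hypothetical alignment constant

    def __init__(self, size: int, align_bytes: int = ALIGN_BYTES):
        self.align = align_bytes
        self._size = size
        self._free = [FreeBlock(0, size)]  # kept sorted by start address
        self.num_active_allocations = 0

    def allocate(self, size: int) -> int:
        size = -(-size // self.align) * self.align  # round up to alignment
        for i, blk in enumerate(self._free):
            if blk.size >= size:
                addr = blk.start
                if blk.size == size:
                    self._free.pop(i)
                else:  # shrink the free block from the front
                    blk.start += size
                    blk.size -= size
                self.num_active_allocations += 1
                return addr
        raise RuntimeError(f"Failed to allocate memory block of size {size}")

    def free(self, addr: int, size: int) -> None:
        size = -(-size // self.align) * self.align
        i = bisect.bisect_left([b.start for b in self._free], addr)
        self._free.insert(i, FreeBlock(addr, size))
        # Coalesce with the right neighbor, then the left neighbor.
        if i + 1 < len(self._free) and addr + size == self._free[i + 1].start:
            self._free[i].size += self._free[i + 1].size
            self._free.pop(i + 1)
        if i > 0 and self._free[i - 1].start + self._free[i - 1].size == addr:
            self._free[i - 1].size += self._free[i].size
            self._free.pop(i)
        self.num_active_allocations -= 1

    def memcheck(self) -> bool:
        # Consistency: free blocks sorted, non-overlapping, within range.
        prev_end = 0
        for b in self._free:
            if b.start < prev_end or b.start + b.size > self._size:
                return False
            prev_end = b.start + b.size
        return True
```

In this shape, a TensorMemoryAllocator would own the physical tensor and delegate all address arithmetic to the manager, which is what makes the consistency checks independently testable.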


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ApostaC ApostaC added the full Run comprehensive tests on this PR label Jan 10, 2026
@ApostaC
Contributor Author

ApostaC commented Jan 10, 2026

Hey @maobaolong, I will work on reimplementing LazyMemoryAllocator over the weekend.
This PR removes the old implementation first; the new one will land in a follow-up PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the memory management by extracting an AddressManager from TensorMemoryAllocator and disabling the LazyMemoryAllocator. The refactoring improves the code structure by separating virtual address management from the underlying buffer management. The changes are generally good, but I've found a few critical issues that need to be addressed. One is a breaking change in the TensorMemoryAllocator constructor signature, and another is a bug in error handling that leaves the exception message unformatted. I've also pointed out a potential bug in batched_free due to an incorrect assumption about input order, and a minor design issue regarding encapsulation. After these issues are fixed, the PR should be in good shape.

Comment on lines +963 to 967
raise RuntimeError(
"Failed to allocate memory block of size %d "
"because no memory is available",
size,
)
Contributor


critical

The raise RuntimeError(...) call is incorrect. It passes the message format string and the argument separately to the RuntimeError constructor, so the message is never interpolated: the exception carries the raw format string and the size as separate args. The string should be formatted before being passed to the exception. Using an f-string for both logging and the exception message would be cleaner and more idiomatic.

            msg = f"Failed to allocate memory block of size {size} because no memory is available"
            logger.debug(msg)
            raise RuntimeError(msg)
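A quick demonstration of the difference in plain Python, independent of LMCache: exception constructors accept logger-style "%" arguments silently, but never interpolate them.

```python
# Passing logger-style arguments straight to the constructor: the format
# string and the value stay separate and the message is never formatted.
try:
    raise RuntimeError("Failed to allocate memory block of size %d", 42)
except RuntimeError as exc:
    err = exc

print(err)       # ('Failed to allocate memory block of size %d', 42)
print(err.args)  # the raw format string and 42, uninterpolated

# The fix suggested above: format first, then raise.
size = 42
msg = f"Failed to allocate memory block of size {size} because no memory is available"
fixed = RuntimeError(msg)
print(fixed)     # message is fully interpolated
```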

Contributor Author


fixed

Comment thread lmcache/v1/memory_management.py Outdated
Comment on lines +1054 to +1058
def __init__(self, tensor: torch.Tensor):
self.buffer = tensor.view(torch.uint8).flatten()

# Use AddressManager for address space management
self.address_manager = AddressManager(self.buffer.numel())
Contributor


critical

The refactoring of TensorMemoryAllocator's __init__ method has removed the align_bytes parameter. This is a breaking change for callers like GPUMemoryAllocator which pass this parameter. The __init__ method should be updated to accept align_bytes and pass it to the AddressManager to maintain backward compatibility and functionality.

    def __init__(self, tensor: torch.Tensor, align_bytes: int = AddressManager.ALIGN_BYTES):
        self.buffer = tensor.view(torch.uint8).flatten()

        # Use AddressManager for address space management
        self.address_manager = AddressManager(self.buffer.numel(), align_bytes=align_bytes)

Contributor Author


good catch, fixed

new_free_blocks = []
num_valid_blocks = 0
total_freed_size = 0
for memory_obj in memory_objs:
Contributor


high

The logic for coalescing blocks in batched_free assumes that the input memory_objs list is sorted by memory address. However, callers of this function may not guarantee this order, which can lead to missed coalescing opportunities and increased memory fragmentation. To fix this, the memory_objs list should be sorted by address at the beginning of the method.

        memory_objs.sort(key=lambda obj: obj.meta.address)
        for memory_obj in memory_objs:

Contributor Author


sure, advice taken
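The coalescing concern above can be sketched on plain (start, size) tuples; the helper name is illustrative only. A single left-to-right merge pass only finds adjacent blocks when the scan order is by address, which is why the review suggests sorting first:

```python
def coalesce(blocks, sort_first=True):
    """Merge adjacent (start, size) ranges in a single scan.

    The scan only merges a block with the one immediately before it, so an
    out-of-order input misses merge opportunities unless sorted first.
    """
    seq = sorted(blocks) if sort_first else list(blocks)
    merged = []
    for start, size in seq:
        if merged and merged[-1][0] + merged[-1][1] == start:
            # Previous block ends exactly where this one starts: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return merged


blocks = [(200, 100), (0, 100), (100, 100)]  # freed out of address order
print(coalesce(blocks))                      # [(0, 300)] -- fully coalesced
print(coalesce(blocks, sort_first=False))    # misses merges -> fragmentation
```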

Comment thread lmcache/v1/memory_management.py Outdated
Comment on lines +1145 to +1147
# Address manager only tracks one allocation, but we have batch_size items
# Adjust the tracking: we allocated once but have batch_size logical items
self.address_manager.num_active_allocations += batch_size - 1
Contributor


medium

Directly manipulating self.address_manager.num_active_allocations from TensorMemoryAllocator breaks the encapsulation of AddressManager. While this correctly adjusts the allocation count for batched operations, it makes the logic harder to maintain as it's split across two classes. Consider adding a method to AddressManager to handle this adjustment, for example increment_active_allocations(n), to keep the state management within the AddressManager class. A similar issue exists in batched_free.
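One possible shape for the suggested method, sketched with hypothetical names (the real AddressManager may differ):

```python
class AddressManager:
    """Sketch: only the allocation-count bookkeeping relevant here."""

    def __init__(self, size: int):
        self._size = size
        self.num_active_allocations = 0

    def increment_active_allocations(self, n: int = 1) -> None:
        # Adjust the logical allocation count, e.g. when the caller splits
        # one physical allocation into n batched items.
        self.num_active_allocations += n

    def decrement_active_allocations(self, n: int = 1) -> None:
        assert self.num_active_allocations >= n, "allocation count underflow"
        self.num_active_allocations -= n


# Caller side (a TensorMemoryAllocator) would then avoid touching the field:
#     self.address_manager.increment_active_allocations(batch_size - 1)
```

Keeping the mutation behind a method also gives one place to add invariant checks (e.g. the underflow assert) instead of scattering them across callers.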

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC force-pushed the local-dev/allocator-refactor branch from e4f30de to dede415 on January 10, 2026 04:34
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@maobaolong
Collaborator

@gemini-code-assist Could you help check whether this PR is a refactor only?

@gemini-code-assist
Contributor

Yes, @maobaolong, this Pull Request is indeed a refactor only.

The title and description clearly state its purpose is to refactor the memory management system by extracting the AddressManager and temporarily disabling the LazyMemoryAllocator for future re-implementation.

The changes in the code, including the introduction of the AddressManager class, the modification of TensorMemoryAllocator to utilize it, the removal of the LazyMemoryAllocator and its associated components, and the corresponding test updates (including moving lazy allocator tests to a skipped directory), all align with a refactoring effort. No new end-user facing features are introduced in this PR.

Collaborator

@maobaolong maobaolong left a comment


LGTM, Thanks @ApostaC for the helpful work on this!

Collaborator

@maobaolong maobaolong left a comment


+1

* 1024**3
),
)
raise RuntimeError("LazyMixedMemoryAllocator being refactored.")
Contributor


😂

Args:
size: The size to expand the address space.
"""
new_block = FreeBlock(start=self._size, size=size)
Contributor


Do we need to assert that size is a multiple of ALIGN? That way we can keep the invariant that every block has an aligned start.

Contributor


Otherwise we could run into some tricky bugs, since the allocator implicitly relies on this invariant.
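A minimal sketch of the check being proposed, with a hypothetical ALIGN_BYTES constant standing in for the manager's real alignment:

```python
ALIGN_BYTES = 256  # hypothetical; stands in for AddressManager.ALIGN_BYTES


def checked_expand_size(size: int) -> int:
    # Reject unaligned expansion sizes up front so every FreeBlock produced
    # by sbrk-style growth keeps an aligned start address.
    if size % ALIGN_BYTES != 0:
        raise ValueError(
            f"expand size {size} must be a multiple of {ALIGN_BYTES}"
        )
    return size
```

An alternative, depending on the desired contract, is to silently round the size up to the next multiple of ALIGN_BYTES instead of raising.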


# Current implementation: explicit list
self._explicit_list: SortedList[FreeBlock] = SortedList(key=lambda x: x.start)
self._explicit_list.add(FreeBlock(start=0, size=size))
Contributor

@sammshen sammshen Jan 12, 2026


nit: can we add a small helpful comment noting that an invariant of the AddressManager is that all block start addresses are aligned, guaranteed by:

  1. initialization and sbrk keep total size aligned
  2. every allocation size is aligned
  3. free() only reinserts blocks whose (address, size) came from allocate()

This could be a helpful shortcut for reading this code in the future

Contributor

@sammshen sammshen left a comment


LGTM! Left two small nits but ok to not address!

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC
Contributor Author

ApostaC commented Jan 12, 2026

LGTM! Left two small nits but ok to not address!

Thanks for the advice, addressed the concerns!

@ApostaC ApostaC enabled auto-merge (squash) January 12, 2026 22:22
@ApostaC ApostaC disabled auto-merge January 12, 2026 22:22
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Jan 12, 2026
@ApostaC ApostaC enabled auto-merge (squash) January 12, 2026 22:22
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Jan 12, 2026
@ApostaC ApostaC merged commit 6f90e22 into LMCache:dev Jan 13, 2026
34 of 35 checks passed
DongDongJu pushed a commit to DongDongJu/LMCache that referenced this pull request Feb 22, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
sammshen pushed a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants