
[Refactor] extract virtual address manager from TensorMemoryAllocator and disable LazyMemoryAllocator#2380

Merged
ApostaC merged 7 commits into LMCache:dev from ApostaC:local-dev/allocator-refactor
Jan 13, 2026

Conversation

@ApostaC
Contributor

@ApostaC ApostaC commented Jan 10, 2026

What this PR does / why we need it:
Part of #2379

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello @ApostaC, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the memory management system by introducing a new AddressManager class. This class centralizes the handling of virtual address space, abstracting it away from the TensorMemoryAllocator. Concurrently, the LazyMemoryAllocator has been temporarily disabled as part of ongoing refactoring efforts. These changes aim to enhance the clarity, maintainability, and testability of the memory allocation infrastructure.

Highlights

  • Virtual Address Manager Extraction: The core logic for managing virtual address space, including allocation, freeing, and coalescing of memory blocks, has been extracted from TensorMemoryAllocator into a new, dedicated AddressManager class. This improves modularity and separation of concerns.
  • LazyMemoryAllocator Disabled: The LazyMemoryAllocator and its associated components (CompositeBuffer, CompositeTensorMemoryAllocator, AsyncMemoryExpander) have been removed. Attempts to use lazy memory allocation will now result in a RuntimeError, indicating that this feature is currently under refactoring.
  • TensorMemoryAllocator Refactoring: TensorMemoryAllocator has been refactored to delegate its memory management operations to an instance of the new AddressManager. This simplifies TensorMemoryAllocator by focusing it on physical memory interaction rather than virtual address management.
  • Improved Testability and Consistency Checks: Memory management tests have been updated to leverage the memcheck() method, which now uses the AddressManager's consistency checks, ensuring robust validation of the allocator's state.
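As a rough illustration of the pattern these highlights describe, here is a minimal first-fit free-list manager with coalescing on free and a consistency check. All names, the alignment constant, and the data layout are hypothetical sketches of the idea, not LMCache's actual AddressManager API:

```python
import bisect
from dataclasses import dataclass


@dataclass
class FreeBlock:
    start: int
    size: int


class AddressManager:
    """Sketch: first-fit allocator over a virtual address range."""

    ALIGN_BYTES = 256  # hypothetical alignment constant

    def __init__(self, size: int, align_bytes: int = ALIGN_BYTES):
        self.align = align_bytes
        self._size = size
        self._free = [FreeBlock(0, size)]  # kept sorted by start address
        self.num_active_allocations = 0

    def allocate(self, size: int) -> int:
        size = -(-size // self.align) * self.align  # round up to alignment
        for i, blk in enumerate(self._free):
            if blk.size >= size:
                addr = blk.start
                if blk.size == size:
                    self._free.pop(i)
                else:  # shrink the free block from the front
                    blk.start += size
                    blk.size -= size
                self.num_active_allocations += 1
                return addr
        raise RuntimeError(f"Failed to allocate memory block of size {size}")

    def free(self, addr: int, size: int) -> None:
        size = -(-size // self.align) * self.align
        i = bisect.bisect_left([b.start for b in self._free], addr)
        self._free.insert(i, FreeBlock(addr, size))
        # Coalesce with the right neighbor, then the left neighbor.
        if i + 1 < len(self._free) and addr + size == self._free[i + 1].start:
            self._free[i].size += self._free[i + 1].size
            self._free.pop(i + 1)
        if i > 0 and self._free[i - 1].start + self._free[i - 1].size == addr:
            self._free[i - 1].size += self._free[i].size
            self._free.pop(i)
        self.num_active_allocations -= 1

    def memcheck(self) -> bool:
        # Consistency: free blocks sorted, non-overlapping, within range.
        prev_end = 0
        for b in self._free:
            if b.start < prev_end or b.start + b.size > self._size:
                return False
            prev_end = b.start + b.size
        return True
```

In this shape, a TensorMemoryAllocator would own the physical tensor and delegate all address arithmetic to the manager, which is what makes the consistency checks independently testable.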


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ApostaC ApostaC added the full Run comprehensive tests on this PR label Jan 10, 2026
@ApostaC
Contributor Author

ApostaC commented Jan 10, 2026

Hey @maobaolong, I will work on reimplementing LazyMemoryAllocator over the weekend.
This PR removes the old implementation first; the new one will land in a follow-up PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the memory management by extracting an AddressManager from TensorMemoryAllocator and disabling the LazyMemoryAllocator. The refactoring improves the code structure by separating virtual address management from the underlying buffer management. The changes are generally good, but I've found a few critical issues that need to be addressed. One is a breaking change in the TensorMemoryAllocator constructor signature, and another is a bug in error handling that leaves the exception message unformatted. I've also pointed out a potential bug in batched_free due to an incorrect assumption about input order, and a minor design issue regarding encapsulation. After these issues are fixed, the PR should be in good shape.

Comment on lines +963 to 967
raise RuntimeError(
"Failed to allocate memory block of size %d "
"because no memory is available",
size,
)
Contributor


critical

The raise RuntimeError(...) call is incorrect. It passes the message format string and the argument separately to the RuntimeError constructor, so the message is never interpolated: the exception carries the raw format string and the size as separate args. The string should be formatted before being passed to the exception. Using an f-string for both logging and the exception message would be cleaner and more idiomatic.

            msg = f"Failed to allocate memory block of size {size} because no memory is available"
            logger.debug(msg)
            raise RuntimeError(msg)
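A quick demonstration of the difference in plain Python, independent of LMCache: exception constructors accept logger-style "%" arguments silently, but never interpolate them.

```python
# Passing logger-style arguments straight to the constructor: the format
# string and the value stay separate and the message is never formatted.
try:
    raise RuntimeError("Failed to allocate memory block of size %d", 42)
except RuntimeError as exc:
    err = exc

print(err)       # ('Failed to allocate memory block of size %d', 42)
print(err.args)  # the raw format string and 42, uninterpolated

# The fix suggested above: format first, then raise.
size = 42
msg = f"Failed to allocate memory block of size {size} because no memory is available"
fixed = RuntimeError(msg)
print(fixed)     # message is fully interpolated
```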

Contributor Author


fixed

Comment thread lmcache/v1/memory_management.py Outdated
Comment on lines +1054 to +1058
def __init__(self, tensor: torch.Tensor):
self.buffer = tensor.view(torch.uint8).flatten()

# Use AddressManager for address space management
self.address_manager = AddressManager(self.buffer.numel())
Contributor


critical

The refactoring of TensorMemoryAllocator's __init__ method has removed the align_bytes parameter. This is a breaking change for callers like GPUMemoryAllocator which pass this parameter. The __init__ method should be updated to accept align_bytes and pass it to the AddressManager to maintain backward compatibility and functionality.

    def __init__(self, tensor: torch.Tensor, align_bytes: int = AddressManager.ALIGN_BYTES):
        self.buffer = tensor.view(torch.uint8).flatten()

        # Use AddressManager for address space management
        self.address_manager = AddressManager(self.buffer.numel(), align_bytes=align_bytes)

Contributor Author


good catch, fixed

new_free_blocks = []
num_valid_blocks = 0
total_freed_size = 0
for memory_obj in memory_objs:
Contributor


high

The logic for coalescing blocks in batched_free assumes that the input memory_objs list is sorted by memory address. However, callers of this function may not guarantee this order, which can lead to missed coalescing opportunities and increased memory fragmentation. To fix this, the memory_objs list should be sorted by address at the beginning of the method.

        memory_objs.sort(key=lambda obj: obj.meta.address)
        for memory_obj in memory_objs:

Contributor Author


sure, advice taken
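The coalescing concern above can be sketched on plain (start, size) tuples; the helper name is illustrative only. A single left-to-right merge pass only finds adjacent blocks when the scan order is by address, which is why the review suggests sorting first:

```python
def coalesce(blocks, sort_first=True):
    """Merge adjacent (start, size) ranges in a single scan.

    The scan only merges a block with the one immediately before it, so an
    out-of-order input misses merge opportunities unless sorted first.
    """
    seq = sorted(blocks) if sort_first else list(blocks)
    merged = []
    for start, size in seq:
        if merged and merged[-1][0] + merged[-1][1] == start:
            # Previous block ends exactly where this one starts: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return merged


blocks = [(200, 100), (0, 100), (100, 100)]  # freed out of address order
print(coalesce(blocks))                      # [(0, 300)] -- fully coalesced
print(coalesce(blocks, sort_first=False))    # misses merges -> fragmentation
```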

Comment thread lmcache/v1/memory_management.py Outdated
Comment on lines +1145 to +1147
# Address manager only tracks one allocation, but we have batch_size items
# Adjust the tracking: we allocated once but have batch_size logical items
self.address_manager.num_active_allocations += batch_size - 1
Contributor


medium

Directly manipulating self.address_manager.num_active_allocations from TensorMemoryAllocator breaks the encapsulation of AddressManager. While this correctly adjusts the allocation count for batched operations, it makes the logic harder to maintain as it's split across two classes. Consider adding a method to AddressManager to handle this adjustment, for example increment_active_allocations(n), to keep the state management within the AddressManager class. A similar issue exists in batched_free.
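One possible shape for the suggested method, sketched with hypothetical names (the real AddressManager may differ):

```python
class AddressManager:
    """Sketch: only the allocation-count bookkeeping relevant here."""

    def __init__(self, size: int):
        self._size = size
        self.num_active_allocations = 0

    def increment_active_allocations(self, n: int = 1) -> None:
        # Adjust the logical allocation count, e.g. when the caller splits
        # one physical allocation into n batched items.
        self.num_active_allocations += n

    def decrement_active_allocations(self, n: int = 1) -> None:
        assert self.num_active_allocations >= n, "allocation count underflow"
        self.num_active_allocations -= n


# Caller side (a TensorMemoryAllocator) would then avoid touching the field:
#     self.address_manager.increment_active_allocations(batch_size - 1)
```

Keeping the mutation behind a method also gives one place to add invariant checks (e.g. the underflow assert) instead of scattering them across callers.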

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC force-pushed the local-dev/allocator-refactor branch from e4f30de to dede415 on January 10, 2026 04:34
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@maobaolong
Collaborator

@gemini-code-assist Could you help check whether this PR is a refactor only?

@gemini-code-assist
Contributor

Yes, @maobaolong, this Pull Request is indeed a refactor only.

The title and description clearly state its purpose is to refactor the memory management system by extracting the AddressManager and temporarily disabling the LazyMemoryAllocator for future re-implementation.

The changes in the code, including the introduction of the AddressManager class, the modification of TensorMemoryAllocator to utilize it, the removal of the LazyMemoryAllocator and its associated components, and the corresponding test updates (including moving lazy allocator tests to a skipped directory), all align with a refactoring effort. No new end-user facing features are introduced in this PR.

Collaborator

@maobaolong maobaolong left a comment


LGTM, Thanks @ApostaC for the helpful work on this!

Collaborator

@maobaolong maobaolong left a comment


+1

* 1024**3
),
)
raise RuntimeError("LazyMixedMemoryAllocator being refactored.")
Contributor


😂

Args:
size: The size to expand the address space.
"""
new_block = FreeBlock(start=self._size, size=size)
Contributor


Do we need to assert that size is a multiple of ALIGN? That way we can keep the invariant that every block has an aligned start.

Contributor


Otherwise we could run into some tricky bugs, since the allocator implicitly relies on this invariant.
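A minimal sketch of the check being proposed, with a hypothetical ALIGN_BYTES constant standing in for the manager's real alignment:

```python
ALIGN_BYTES = 256  # hypothetical; stands in for AddressManager.ALIGN_BYTES


def checked_expand_size(size: int) -> int:
    # Reject unaligned expansion sizes up front so every FreeBlock produced
    # by sbrk-style growth keeps an aligned start address.
    if size % ALIGN_BYTES != 0:
        raise ValueError(
            f"expand size {size} must be a multiple of {ALIGN_BYTES}"
        )
    return size
```

An alternative, depending on the desired contract, is to silently round the size up to the next multiple of ALIGN_BYTES instead of raising.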


# Current implementation: explicit list
self._explicit_list: SortedList[FreeBlock] = SortedList(key=lambda x: x.start)
self._explicit_list.add(FreeBlock(start=0, size=size))
Contributor

@sammshen sammshen Jan 12, 2026


nit: can we add a small helpful comment noting that an invariant of the AddressManager is that all block start addresses are aligned, guaranteed by:

  1. initialization and sbrk keep total size aligned
  2. every allocation size is aligned
  3. free() only reinserts blocks whose (address, size) came from allocate()

This could be a helpful shortcut for reading this code in the future

Contributor

@sammshen sammshen left a comment


LGTM! Left two small nits but ok to not address!

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC
Contributor Author

ApostaC commented Jan 12, 2026

LGTM! Left two small nits but ok to not address!

Thanks for the advice, addressed the concerns!

@ApostaC ApostaC enabled auto-merge (squash) January 12, 2026 22:22
@ApostaC ApostaC disabled auto-merge January 12, 2026 22:22
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Jan 12, 2026
@ApostaC ApostaC enabled auto-merge (squash) January 12, 2026 22:22
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Jan 12, 2026
@ApostaC ApostaC merged commit 6f90e22 into LMCache:dev Jan 13, 2026
34 of 35 checks passed
DongDongJu pushed a commit to DongDongJu/LMCache that referenced this pull request Feb 22, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
sammshen pushed a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
… and disable LazyMemoryAllocator (LMCache#2380)

* Refactor: extract virtual address manager from TensorMemoryAllocator

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants