[Refactor] extract virtual address manager from TensorMemoryAllocator and disable LazyMemoryAllocator#2380
Conversation
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Summary of Changes

Hello @ApostaC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the memory management system by introducing a new AddressManager class.
Hey @maobaolong, I will work on reimplementing LazyMemoryAllocator over this weekend.
Code Review
This pull request refactors the memory management by extracting an AddressManager from TensorMemoryAllocator and disabling the LazyMemoryAllocator. The refactoring improves the code structure by separating virtual address management from the underlying buffer management. The changes are generally good, but I've found a few critical issues that need to be addressed. One is a breaking change in the TensorMemoryAllocator constructor signature, and another is a bug in error handling that will cause a TypeError. I've also pointed out a potential bug in batched_free due to an incorrect assumption about input order, and a minor design issue regarding encapsulation. After these issues are fixed, the PR should be in good shape.
```python
raise RuntimeError(
    "Failed to allocate memory block of size %d "
    "because no memory is available",
    size,
)
```
The raise RuntimeError(...) call is incorrect. It passes the message format string and the argument separately to the RuntimeError constructor, so the %d placeholder is never interpolated — the exception carries a two-element args tuple instead of a formatted message. The string should be formatted before being passed to the exception. Using an f-string for both logging and the exception message would be cleaner and more idiomatic.
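For context, a minimal standalone demonstration of the pitfall (not the project's code): exception constructors do not apply logging-style % formatting to their arguments.

```python
# Logging-style formatting does NOT work for exceptions:
# the format string and argument are stored as separate args.
err = RuntimeError("Failed to allocate memory block of size %d", 1024)
print(err.args)  # two separate args; the %d is never substituted

# The message must be formatted before constructing the exception:
size = 1024
fixed = RuntimeError(f"Failed to allocate memory block of size {size}")
print(fixed)  # a single, fully formatted message
```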
```python
msg = f"Failed to allocate memory block of size {size} because no memory is available"
logger.debug(msg)
raise RuntimeError(msg)
```

```python
def __init__(self, tensor: torch.Tensor):
    self.buffer = tensor.view(torch.uint8).flatten()

    # Use AddressManager for address space management
    self.address_manager = AddressManager(self.buffer.numel())
```
The refactoring of TensorMemoryAllocator's __init__ method has removed the align_bytes parameter. This is a breaking change for callers like GPUMemoryAllocator which pass this parameter. The __init__ method should be updated to accept align_bytes and pass it to the AddressManager to maintain backward compatibility and functionality.
```python
def __init__(self, tensor: torch.Tensor, align_bytes: int = AddressManager.ALIGN_BYTES):
    self.buffer = tensor.view(torch.uint8).flatten()
    # Use AddressManager for address space management
    self.address_manager = AddressManager(self.buffer.numel(), align_bytes=align_bytes)
```

```python
new_free_blocks = []
num_valid_blocks = 0
total_freed_size = 0
for memory_obj in memory_objs:
```
The logic for coalescing blocks in batched_free assumes that the input memory_objs list is sorted by memory address. However, callers of this function may not guarantee this order, which can lead to missed coalescing opportunities and increased memory fragmentation. To fix this, the memory_objs list should be sorted by address at the beginning of the method.
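A standalone sketch (hypothetical `FreeBlock` shape, not the project's actual types) of why address order matters when coalescing freed blocks:

```python
from dataclasses import dataclass

@dataclass
class FreeBlock:
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size

def coalesce(blocks):
    """Merge adjacent free blocks; correctness depends on address order."""
    merged = []
    for b in sorted(blocks, key=lambda x: x.start):
        if merged and merged[-1].end == b.start:
            merged[-1].size += b.size  # adjacent: extend the previous block
        else:
            merged.append(FreeBlock(b.start, b.size))
    return merged

# Frees arriving out of address order still collapse into one block:
freed = [FreeBlock(64, 32), FreeBlock(0, 64), FreeBlock(96, 32)]
print(coalesce(freed))  # [FreeBlock(start=0, size=128)]
```

Without the sort, the out-of-order frees above would produce three separate blocks and leave the address space fragmented.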
```python
memory_objs.sort(key=lambda obj: obj.meta.address)
for memory_obj in memory_objs:
```

```python
# Address manager only tracks one allocation, but we have batch_size items
# Adjust the tracking: we allocated once but have batch_size logical items
self.address_manager.num_active_allocations += batch_size - 1
```
Directly manipulating self.address_manager.num_active_allocations from TensorMemoryAllocator breaks the encapsulation of AddressManager. While this correctly adjusts the allocation count for batched operations, it makes the logic harder to maintain as it's split across two classes. Consider adding a method to AddressManager to handle this adjustment, for example increment_active_allocations(n), to keep the state management within the AddressManager class. A similar issue exists in batched_free.
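One way the suggestion could look (a minimal sketch; the method name and everything beyond `num_active_allocations` are assumptions, not the project's API):

```python
class AddressManager:
    """Toy sketch of the suggested encapsulation."""

    def __init__(self) -> None:
        self.num_active_allocations = 0

    def increment_active_allocations(self, n: int) -> None:
        """Adjust the active-allocation count for batched operations."""
        self.num_active_allocations += n

# The allocator then records batched items through the manager's API
# instead of poking its counter directly:
manager = AddressManager()
manager.increment_active_allocations(1)               # the single real allocation
batch_size = 8
manager.increment_active_allocations(batch_size - 1)  # batched adjustment
print(manager.num_active_allocations)  # 8
```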
Force-pushed from e4f30de to dede415 (Compare)
@gemini-code-assist Could you help check whether this PR is a refactor only?
Yes, @maobaolong, this pull request is indeed a refactor only. The title and description clearly state its purpose is to refactor the memory management system by extracting the virtual address manager from TensorMemoryAllocator and disabling the LazyMemoryAllocator. The changes in the code, including the introduction of the AddressManager class, are consistent with that stated purpose.
maobaolong left a comment
LGTM, Thanks @ApostaC for the helpful work on this!
```python
        * 1024**3
    ),
)
raise RuntimeError("LazyMixedMemoryAllocator being refactored.")
```
```python
    Args:
        size: The size to expand the address space.
    """
    new_block = FreeBlock(start=self._size, size=size)
```
Do we need to assert that size is a multiple of ALIGN? That way we can keep the invariant that every block has an aligned start.
Otherwise we could run into some tricky bugs, since the allocator implicitly relies on this invariant.
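A sketch of what enforcing that assertion could look like (hypothetical names: `expand` stands in for the sbrk-style method, `ALIGN_BYTES` for the allocator's alignment constant):

```python
ALIGN_BYTES = 256  # stand-in for the allocator's alignment constant

def align_up(size: int, align: int = ALIGN_BYTES) -> int:
    """Round size up to the next multiple of align (align must be a power of two)."""
    return (size + align - 1) & ~(align - 1)

def expand(current_size: int, size: int):
    """Sbrk-style expansion with the assertion suggested above."""
    assert size % ALIGN_BYTES == 0, f"size {size} is not {ALIGN_BYTES}-byte aligned"
    return current_size, size  # (new block start, new block size)

print(align_up(1000))   # 1024
print(expand(0, 4096))  # (0, 4096)
```

Callers that cannot guarantee aligned sizes would round up with `align_up` before calling `expand`.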
```python
# Current implementation: explicit list
self._explicit_list: SortedList[FreeBlock] = SortedList(key=lambda x: x.start)
self._explicit_list.add(FreeBlock(start=0, size=size))
```
nit: can we add a small helpful comment that an invariant of the AddressManager is that all block start addresses are aligned, which is guaranteed by:
- initialization and sbrk keep the total size aligned
- every allocation size is aligned
- free() only reinserts blocks whose (address, size) came from allocate()

This could be a helpful shortcut for reading this code in the future.
Thanks for the advice, addressed the concerns!
… and disable LazyMemoryAllocator (LMCache#2380) * Refactor: extract virtual address manager from TensorMemoryAllocator Signed-off-by: ApostaC <yihua98@uchicago.edu>
… and disable LazyMemoryAllocator (LMCache#2380) * Refactor: extract virtual address manager from TensorMemoryAllocator Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: shaoxiawjc <wjc2800@163.com>
What this PR does / why we need it:
Part of #2379
Special notes for your reviewers: