Describe the bug
Bug Description
A double free in cache_controller.py corrupts host memory in HiCache.
Root Cause
The same host_indices are freed twice in the prefetch flow:
First free - in _page_transfer():
def _page_transfer(self, operation):
    ...
    self.append_host_mem_release(
        operation.host_indices[operation.completed_tokens :]
    )
Second free - in prefetch_io_aux_func():
def prefetch_io_aux_func(self):
    ...
    operation = self.prefetch_buffer.get(block=True, timeout=1)
    self._page_transfer(operation)
    # ❌ Double free
    self.append_host_mem_release(
        operation.host_indices[operation.completed_tokens :]
    )
Fix
Remove the duplicate release in prefetch_io_aux_func():
def prefetch_io_aux_func(self):
    while not self.stop_event.is_set():
        try:
            operation = self.prefetch_buffer.get(block=True, timeout=1)
            self._page_transfer(operation)
            # Remove these lines - already freed in _page_transfer
            # self.append_host_mem_release(
            #     operation.host_indices[operation.completed_tokens :]
            # )
        except Empty:
            continue
Impact
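Each double-freed index ends up in the host pool's free list twice, so the allocator can hand the same host memory indices to more than one request. Those requests then overwrite each other's data, leading to KV cache corruption and cross-query data leakage.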
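To make the failure mode concrete, here is a minimal sketch of how a double free turns into aliased allocations. `ToyHostPool` and its methods are hypothetical stand-ins written in plain Python for illustration. They are not the real HiCache host pool, which keeps its free list in a tensor.

```python
from __future__ import annotations


class ToyHostPool:
    """Hypothetical free-list allocator, not the real HiCache host pool."""

    def __init__(self, size: int) -> None:
        self.free_slots: list[int] = list(range(size))

    def alloc(self, n: int) -> list[int]:
        if n > len(self.free_slots):
            raise MemoryError("not enough free slots")
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return taken

    def free(self, indices: list[int]) -> None:
        # No validation: freeing the same indices twice silently
        # leaves duplicate entries in the free list.
        self.free_slots.extend(indices)


pool = ToyHostPool(size=4)
prefetch = pool.alloc(4)   # the prefetch operation holds slots 0..3
pool.free(prefetch[2:])    # first release of the unused tail
pool.free(prefetch[2:])    # second release of the same tail (the bug)

query_a = pool.alloc(2)    # one request gets the tail slots
query_b = pool.alloc(2)    # a different request gets the same slots
shared = sorted(set(query_a) & set(query_b))
print(shared)              # [2, 3]
```

Both allocations succeed and nothing fails at the point of the second free, which is why the corruption only shows up later as wrong cache contents.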
Additional Recommendations
Add double-free detection in mem_pool_host.free() to catch similar bugs:
def free(self, indices: torch.Tensor) -> int:
    if self.free_slots.numel() > 0:
        dup_mask = torch.isin(indices, self.free_slots)
        if dup_mask.any():
            raise ValueError(f"Double free detected: {indices[dup_mask].tolist()}")
    self.free_slots = torch.cat([self.free_slots, indices])
    return len(indices)
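The check above compares incoming indices against slots that are already free, so it would not flag an index repeated within a single `free()` call. A framework-free sketch of a guard that covers both cases might look like the following. `GuardedFreeList` is a hypothetical illustration in plain Python, not the `mem_pool_host` API, and the shadow set is an assumption made to keep the membership test cheap.

```python
from __future__ import annotations


class GuardedFreeList:
    """Hypothetical stand-in for a host pool free list (plain Python, no torch)."""

    def __init__(self, size: int) -> None:
        self.free_slots: list[int] = list(range(size))
        # Shadow set for O(1) membership checks on free slots.
        self._free_set: set[int] = set(self.free_slots)

    def alloc(self, n: int) -> list[int]:
        if n > len(self.free_slots):
            raise MemoryError("not enough free slots")
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        self._free_set.difference_update(taken)
        return taken

    def free(self, indices: list[int]) -> int:
        seen: set[int] = set()
        bad: list[int] = []
        for i in indices:
            # Reject indices that are already free or repeated in this call.
            if i in self._free_set or i in seen:
                bad.append(i)
            seen.add(i)
        if bad:
            raise ValueError(f"Double free detected: {bad}")
        self.free_slots.extend(indices)
        self._free_set.update(indices)
        return len(indices)


guarded = GuardedFreeList(size=4)
held = guarded.alloc(4)
freed_count = guarded.free(held[2:])   # first release succeeds
try:
    guarded.free(held[2:])             # second release is rejected
    second_free_error = None
except ValueError as exc:
    second_free_error = str(exc)
print(second_free_error)               # Double free detected: [2, 3]
```

Validating before mutating means a rejected call leaves the free list unchanged, so the error points at the offending caller rather than at a later allocation.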
Reproduction
Add the double-free detection above to mem_pool_host.free(), then run any workload with a HiCache storage backend enabled. The check raises the ValueError immediately.
Environment
HiCache with backend storage enabled (any storage backend).