[Bug] Double free in HiCache prefetch causing host memory corruption #13483

@cighao

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

Bug Description

A double free in cache_controller.py corrupts host memory in HiCache.

Root Cause

The same host_indices are freed twice in the prefetch flow:

First free - in _page_transfer():

def _page_transfer(self, operation):
    ...
    self.append_host_mem_release(
        operation.host_indices[operation.completed_tokens :]
    )

Second free - in prefetch_io_aux_func():

def prefetch_io_aux_func(self):
    ...
    operation = self.prefetch_buffer.get(block=True, timeout=1)
    self._page_transfer(operation)
    # ❌ Double free
    self.append_host_mem_release(
        operation.host_indices[operation.completed_tokens :]
    )

Fix

Remove the duplicate release in prefetch_io_aux_func():

def prefetch_io_aux_func(self):
    while not self.stop_event.is_set():
        try:
            operation = self.prefetch_buffer.get(block=True, timeout=1)
            self._page_transfer(operation)
            # Remove these lines - already freed in _page_transfer
            # self.append_host_mem_release(
            #     operation.host_indices[operation.completed_tokens :]
            # )
        except Empty:
            continue
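
The single-owner rule behind this fix can be checked with a small self-contained sketch. `ToyController` below is a hypothetical stand-in written for illustration, not SGLang's cache controller: it reuses the method names from the snippets above, but the bodies, the fixed `completed_tokens = 2`, and the list-based bookkeeping are assumptions.

```python
import queue
import threading
from types import SimpleNamespace


class ToyController:
    """Hypothetical stand-in for the cache controller (illustration only)."""

    def __init__(self):
        self.prefetch_buffer = queue.Queue()
        self.stop_event = threading.Event()
        self.released = []  # every index handed back to the host pool

    def append_host_mem_release(self, indices):
        self.released.extend(indices)

    def _page_transfer(self, operation):
        # Assumption: pretend only the first two tokens were fetched.
        operation.completed_tokens = 2
        # Sole owner of the release: return the unfetched tail once.
        self.append_host_mem_release(
            operation.host_indices[operation.completed_tokens:]
        )

    def prefetch_io_aux_func(self):
        while not self.stop_event.is_set():
            try:
                operation = self.prefetch_buffer.get(block=True, timeout=0.1)
            except queue.Empty:
                continue
            self._page_transfer(operation)  # no second release here
            self.prefetch_buffer.task_done()


controller = ToyController()
worker = threading.Thread(target=controller.prefetch_io_aux_func, daemon=True)
worker.start()
controller.prefetch_buffer.put(
    SimpleNamespace(host_indices=[10, 11, 12, 13], completed_tokens=0)
)
controller.prefetch_buffer.join()  # wait until the operation is processed
controller.stop_event.set()
worker.join()

# Each leftover index was released exactly once.
assert controller.released == [12, 13]
```

Re-adding a second `append_host_mem_release` call after `_page_transfer` in this toy makes the final assertion fail, which is the kind of regression check that could guard the real code path.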

Impact

Because the freed indices are appended to the host free list twice, later allocations can hand the same host memory indices to more than one request. This leads to KV cache corruption and cross-query data leakage.
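
The failure mode can be reproduced in isolation with a toy free-list allocator. `ToyHostPool` is a hypothetical, simplified stand-in (plain Python lists instead of tensors), not the actual host memory pool; it only assumes that freed indices are appended to a free list and that allocation takes indices from the front.

```python
class ToyHostPool:
    """Hypothetical free-list allocator (simplified, illustration only)."""

    def __init__(self, size):
        self.free_slots = list(range(size))

    def alloc(self, n):
        if n > len(self.free_slots):
            return None
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return taken

    def free(self, indices):
        # No validation: a repeated free appends the same indices again.
        self.free_slots.extend(indices)
        return len(indices)


pool = ToyHostPool(size=4)
held = pool.alloc(4)       # the pool is now empty
pool.free(held[2:])        # first release of slots 2 and 3
pool.free(held[2:])        # buggy second release of the same slots

request_a = pool.alloc(2)  # receives [2, 3]
request_b = pool.alloc(2)  # also receives [2, 3]
overlap = sorted(set(request_a) & set(request_b))
print(overlap)             # [2, 3]
```

Two independent requests now write to the same host slots, so whichever finishes last overwrites the other's KV data.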

Additional Recommendations

Add double-free detection in mem_pool_host.free() to catch similar bugs:

def free(self, indices: torch.Tensor) -> int:
    # Reject any index that is already on the free list.
    if self.free_slots.numel() > 0:
        dup_mask = torch.isin(indices, self.free_slots)
        if dup_mask.any():
            raise ValueError(f"Double free detected: {indices[dup_mask].tolist()}")
    self.free_slots = torch.cat([self.free_slots, indices])
    return len(indices)
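
One gap worth noting: a check against the existing free list alone would not flag an index that appears twice within a single `free()` call. The sketch below shows a variant that covers both cases. `CheckedHostPool` is a hypothetical toy using plain Python lists and a set rather than tensors, so it illustrates the idea only and is not a drop-in patch.

```python
class CheckedHostPool:
    """Hypothetical pool with double-free detection (illustration only)."""

    def __init__(self, size):
        self.free_slots = list(range(size))
        self._free_set = set(self.free_slots)  # O(1) membership checks

    def alloc(self, n):
        if n > len(self.free_slots):
            return None
        taken, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        self._free_set.difference_update(taken)
        return taken

    def free(self, indices):
        indices = list(indices)
        seen = set()
        dups = []
        for i in indices:
            # Already free, or repeated within this same call.
            if i in self._free_set or i in seen:
                dups.append(i)
            seen.add(i)
        if dups:
            raise ValueError(f"Double free detected: {dups}")
        self.free_slots.extend(indices)
        self._free_set.update(indices)
        return len(indices)


pool = CheckedHostPool(4)
held = pool.alloc(4)
pool.free(held[2:])          # legitimate release
try:
    pool.free(held[2:])      # second release of the same slots
    caught = None
except ValueError as exc:
    caught = str(exc)
print(caught)                # Double free detected: [2, 3]
```

Because the check runs before the free list is modified, a rejected call leaves the pool state unchanged.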

Reproduction

Add the double-free detection above to mem_pool_host.free(), then run any workload with a HiCache storage backend enabled. The ValueError is raised immediately.

Environment

HiCache with backend storage enabled (any storage backend).
