
Add /dev/dax (Device-DAX) backend for KV cache storage #2714

Closed
jayhpark530 wants to merge 4 commits into LMCache:dev from ComputeOffload:feat/dax-backend

Conversation

@jayhpark530
Contributor

Fixes #2572

What this PR does / why we need it

This PR implements the /dev/dax (Device-DAX) backend discussed in RFC issue #2572.

The goal is to enable KV cache storage on byte-addressable memory devices exposed through /dev/dax, such as:

  • Persistent memory
  • CXL-based memory expanders
  • Other byte-addressable memory tiers

Since LMCache already supports multiple storage backends, adding a DAX backend enables experimentation with KV cache tiering on emerging heterogeneous memory systems.

The backend maps a /dev/dax device into userspace and uses the mapped region as a KV cache arena managed by LMCache.
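The mapping step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `map_arena` is a hypothetical helper, and a temporary file stands in for the device so the snippet runs without DAX hardware. A real /dev/dax device typically also imposes alignment requirements on the mapping length, which this sketch does not check.

```python
import mmap
import os
import tempfile


def map_arena(path: str, size: int) -> mmap.mmap:
    """Map `size` bytes of `path` read/write and return the shared mapping."""
    fd = os.open(path, os.O_RDWR)
    try:
        # MAP_SHARED so stores reach the underlying device or file.
        return mmap.mmap(
            fd,
            size,
            flags=mmap.MAP_SHARED,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    finally:
        # The mapping stays valid after the descriptor is closed.
        os.close(fd)


# Stand-in for a /dev/dax device: a 1 MiB temporary file (assumption).
ARENA_SIZE = 1 << 20
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(ARENA_SIZE)
    arena_path = tmp.name

arena = map_arena(arena_path, ARENA_SIZE)
arena[0:5] = b"hello"  # byte-addressable store into the arena
first_bytes = bytes(arena[0:5])
arena.close()
os.unlink(arena_path)
```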

Two usage modes are supported:

1. Tiered backend mode

  • /dev/dax is used as a storage tier after LocalCPUBackend, similar to other LMCache storage backends
  • KV cache chunks are stored in the DAX arena when evicted from DRAM
  • Reads copy cached chunks back into CPU memory

This is the recommended configuration.
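To make the tiered flow concrete, here is a toy two-tier cache. It is only a sketch under assumptions: `TieredCache` is a hypothetical class, a plain dict stands in for the DAX arena, and whether the real backend keeps a lower-tier copy after a read is not stated in this PR description.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredCache:
    """Toy two-tier cache: a small DRAM tier spilling into a lower tier."""

    def __init__(self, dram_capacity: int) -> None:
        self.dram_capacity = dram_capacity
        self.dram: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order
        self.lower: Dict[str, bytes] = {}  # stands in for the DAX arena

    def put(self, key: str, chunk: bytes) -> None:
        self.dram[key] = chunk
        self.dram.move_to_end(key)
        while len(self.dram) > self.dram_capacity:
            # Evict the least recently used chunk from DRAM to the lower tier.
            old_key, old_chunk = self.dram.popitem(last=False)
            self.lower[old_key] = old_chunk

    def get(self, key: str) -> Optional[bytes]:
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.lower:
            # A lower-tier hit copies the chunk back into CPU memory.
            chunk = bytes(self.lower[key])
            self.put(key, chunk)
            return chunk
        return None


cache = TieredCache(dram_capacity=2)
for name in ("a", "b", "c"):
    cache.put(name, name.upper().encode())
spilled = "a" in cache.lower  # "a" was evicted when "c" arrived
restored = cache.get("a")     # read copies "a" back into the DRAM tier
```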

2. Primary backend mode

Unlike traditional storage backends, /dev/dax exposes byte-addressable memory that can be directly mapped into userspace.
Because of this property, the mapped DAX arena can also serve as an allocator backend in LMCache.

  • /dev/dax replaces the DRAM tier and acts as both the allocator and storage backend
  • KV cache chunks are stored directly inside the DAX arena
  • Returned MemoryObjs are backed by the mapped region

When supported by the platform, the backend registers the mapped address range using cudaHostRegister, allowing GPU ↔ DAX transfers to bypass an additional DRAM staging copy.
As discussed in the RFC thread, this mode is platform-dependent, so it is implemented as an opt-in fast path.
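One way to structure such an opt-in fast path is to attempt registration and fall back to the staged copy on any failure. The sketch below assumes a hypothetical `try_register_arena` helper and takes the registration call as a parameter (`register_fn`), standing in for whatever binding to cudaHostRegister the platform provides; it does not reproduce the PR's actual code.

```python
from typing import Callable


def try_register_arena(
    base_ptr: int,
    size: int,
    register_fn: Callable[[int, int], int],
    enabled: bool,
) -> bool:
    """Return True if the arena was registered, False to use DRAM staging."""
    if not enabled:
        # The fast path is opt-in; the default is the staged copy.
        return False
    try:
        status = register_fn(base_ptr, size)
    except Exception:
        # Platforms that cannot pin the mapping fall back quietly.
        return False
    # CUDA runtime calls report success as 0 (cudaSuccess).
    return status == 0


def _unsupported(ptr: int, size: int) -> int:
    raise OSError("registration not supported on this platform")


registered = try_register_arena(0x1000, 4096, lambda p, s: 0, enabled=True)
rejected = try_register_arena(0x1000, 4096, lambda p, s: 1, enabled=True)
unsupported = try_register_arena(0x1000, 4096, _unsupported, enabled=True)
disabled = try_register_arena(0x1000, 4096, lambda p, s: 0, enabled=False)
```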

Potential benefits include:

  • Extending KV cache capacity beyond DRAM using byte-addressable memory devices
  • Potentially reducing DRAM staging by using a /dev/dax mmap arena as KV cache storage
  • Enabling KV cache tiering experiments on emerging memory tiers such as PMem and CXL memory expanders

Key features

  • /dev/dax storage backend implemented as an LMCache storage plugin
  • Fixed-size slot allocator on top of a mapped DAX arena
  • LRU eviction with pin support
  • Safe lifetime management using arena leases and slot generations
  • Zero-copy reads in primary mode
  • Graceful shutdown with in-flight operation tracking
  • Integration with LMCache AllocatorBackendInterface
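As a rough illustration of how a fixed-size slot allocator, LRU eviction, pinning, and slot generations can fit together, consider the sketch below. `SlotArena` and all of its methods are hypothetical names chosen for this example; the PR's real data structures are not shown in this conversation and may differ.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class SlotArena:
    """Fixed-size slots over a flat arena, with LRU eviction and pinning."""

    def __init__(self, arena_size: int, slot_size: int) -> None:
        self.slot_size = slot_size
        self.free: List[int] = list(range(arena_size // slot_size))
        self.lru: "OrderedDict[str, int]" = OrderedDict()  # key -> slot
        self.pins: Dict[str, int] = {}                      # key -> pin count
        self.generation: Dict[int, int] = {}                # slot -> uses

    def allocate(self, key: str) -> Optional[Tuple[int, int]]:
        """Return (byte offset, generation), or None if nothing is evictable."""
        if key in self.lru:
            raise ValueError(f"key {key!r} already has a slot")
        if not self.free and not self._evict_one():
            return None
        slot = self.free.pop()
        self.generation[slot] = self.generation.get(slot, 0) + 1
        self.lru[key] = slot
        return slot * self.slot_size, self.generation[slot]

    def touch(self, key: str) -> None:
        """Mark a key as most recently used."""
        self.lru.move_to_end(key)

    def pin(self, key: str) -> None:
        self.pins[key] = self.pins.get(key, 0) + 1

    def unpin(self, key: str) -> None:
        count = self.pins.get(key, 0)
        if count <= 1:
            self.pins.pop(key, None)
        else:
            self.pins[key] = count - 1

    def is_current(self, offset: int, generation: int) -> bool:
        """A handle is stale once its slot has been handed out again."""
        return self.generation.get(offset // self.slot_size) == generation

    def _evict_one(self) -> bool:
        # Oldest unpinned key first; pinned keys are never evicted.
        victim = next((k for k in self.lru if self.pins.get(k, 0) == 0), None)
        if victim is None:
            return False
        self.free.append(self.lru.pop(victim))
        return True


arena = SlotArena(arena_size=8192, slot_size=4096)  # two slots
handle_a = arena.allocate("a")
handle_b = arena.allocate("b")
arena.pin("a")
handle_c = arena.allocate("c")  # evicts "b", the oldest unpinned key
```

The generation counter is what lets a reader holding `handle_b` detect that its slot now belongs to "c" instead of silently reading someone else's data.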

Documentation

Documentation for the DAX backend configuration and usage is included:

docs/source/kv_cache/storage_backends/dax.rst

If applicable

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LMCache's capabilities by integrating a new backend for /dev/dax devices. This allows the KV cache to leverage emerging byte-addressable memory technologies, offering flexible configurations for memory tiering and potentially improving performance by reducing data movement between CPU and GPU memory. The implementation includes robust memory management and error handling, ensuring reliable operation within the LMCache ecosystem.

Highlights

  • New /dev/dax Backend: Introduced a new storage plugin for LMCache that utilizes /dev/dax (Device-DAX) for KV cache storage, enabling the use of byte-addressable memory devices like persistent memory and CXL-based memory expanders.
  • Dual Operating Modes: The DAX backend supports two modes: 'tiered' mode, where it acts as a lower tier to LocalCPUBackend, and 'primary' mode, where it replaces the DRAM tier and acts as both allocator and storage, potentially allowing direct GPU mapping.
  • Memory Management Features: Implemented a fixed-size slot allocator, LRU eviction with pin support, and safe lifetime management using arena leases and slot generations for the DAX arena.
  • Direct GPU Mapping (Primary Mode): In primary mode, the backend registers the mapped DAX address range with cudaHostRegister to enable zero-copy GPU ↔ DAX transfers, bypassing additional DRAM staging.
  • Comprehensive Testing and Documentation: Added extensive unit tests to validate the functionality and robustness of the DaxBackend, along with detailed documentation covering configuration, usage, runtime requirements, and troubleshooting.
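The in-flight operation tracking mentioned for graceful shutdown can be pictured with a counter guarded by a condition variable. This is a generic sketch with a hypothetical `InflightTracker` class, assuming that shutdown should refuse new operations and wait for running ones; it is not taken from the PR's implementation.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class InflightTracker:
    """Counts running operations so shutdown can wait for them to drain."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._inflight = 0
        self._closing = False

    @contextmanager
    def operation(self) -> Iterator[None]:
        with self._cond:
            if self._closing:
                raise RuntimeError("backend is shutting down")
            self._inflight += 1
        try:
            yield
        finally:
            with self._cond:
                self._inflight -= 1
                self._cond.notify_all()

    def shutdown(self, timeout: float = 5.0) -> bool:
        """Refuse new operations; return True once in-flight ones finish."""
        with self._cond:
            self._closing = True
            return self._cond.wait_for(lambda: self._inflight == 0, timeout)


tracker = InflightTracker()
with tracker.operation():
    pass  # a read or write against the arena would happen here
drained = tracker.shutdown(timeout=1.0)
```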


Changelog
  • docs/source/kv_cache/storage_backends/dax.rst
    • Added new documentation for the Device-DAX (/dev/dax) storage backend, detailing its overview, configuration examples for tiered and primary modes, runtime requirements, validation, limits, and troubleshooting.
  • docs/source/kv_cache/storage_backends/index.rst
    • Updated the index of storage backends to include the newly added 'dax' documentation.
  • lmcache/v1/storage_backend/plugins/dax_backend.py
    • Added the core implementation of the DaxBackend class, providing functionality for mmap-backed KV cache storage on /dev/dax devices, including slot allocation, LRU eviction, and direct GPU mapping support.
    • Implemented internal classes for managing memory objects, arena state, and tracking in-flight operations and leases.
  • lmcache/v1/storage_backend/storage_manager.py
    • Modified the _get_allocator_backend method to conditionally select the DaxBackend as the primary allocator when configured in 'primary' mode.
  • tests/v1/storage_backend/test_dax_backend.py
    • Added a comprehensive suite of unit tests for the DaxBackend, covering tiered and primary mode functionality, error handling, eviction policies, multithreading, and resource cleanup.
Activity
  • The pull request introduces a new feature, so there is no prior activity to summarize.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new /dev/dax storage backend for the KV cache, supporting both a tiered mode (as a backing store for CPU memory) and a primary mode (as a zero-copy primary allocator for direct GPU access). While the implementation is generally well-structured with good resource management and testing, a security audit identified two medium-severity vulnerabilities. First, a resource leak may occur due to the use of WeakKeyDictionary in the cleanup path, potentially leading to resource exhaustion and Denial of Service. Second, the device_path configuration is used in a file open operation without validation, posing a risk of path traversal or arbitrary file access. It is recommended to use weakref.finalize for robust resource cleanup and to implement strict path validation for the device_path. Additionally, a critical correctness issue was found where ctypes.memmove is used for potential GPU-to-host transfers, which is unsafe and should be replaced with a device-aware copy mechanism like torch.Tensor.copy_. A minor style improvement also suggests moving a local import to the top level. Addressing these issues will enhance the backend's robustness, stability, and security, particularly in primary mode.

Comment on lines +906 to +918
    def _do_write(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
        ctypes.memmove(
            ctypes.c_void_p(self._base_ptr + offset),
            ctypes.c_void_p(memory_obj.data_ptr),
            ctypes.c_size_t(size),
        )

    def _do_read(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
        ctypes.memmove(
            ctypes.c_void_p(memory_obj.data_ptr),
            ctypes.c_void_p(self._base_ptr + offset),
            ctypes.c_size_t(size),
        )

critical

The _do_write and _do_read methods use ctypes.memmove for data transfers. This is only safe for host-to-host memory copies. If the source MemoryObj in _do_write or the destination MemoryObj in _do_read is backed by a GPU tensor, memmove will receive a GPU pointer and likely cause a segmentation fault, as it's not designed for device memory.

Given that this backend is intended to work with GPU-based models and the "primary" mode aims for direct GPU-DAX transfers, using a device-agnostic copy mechanism is crucial. I suggest using torch.Tensor.copy_ which correctly handles copies between different devices (CPU, CUDA). This would make the implementation more robust and prevent crashes when operating with GPU tensors.

    def _do_write(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
        dst_view = self._arena_tensor.narrow(0, offset, size)
        src_view = memory_obj.raw_tensor.narrow(0, 0, size)
        dst_view.copy_(src_view, non_blocking=True)

    def _do_read(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
        dst_view = memory_obj.raw_tensor.narrow(0, 0, size)
        src_view = self._arena_tensor.narrow(0, offset, size)
        dst_view.copy_(src_view, non_blocking=True)
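For contrast, the host-to-host case where `ctypes.memmove` is valid looks roughly like the sketch below. It uses an anonymous mapping as a stand-in arena and a hypothetical `host_write` helper; both pointers are CPU addresses here, which is exactly the precondition a GPU-backed `MemoryObj` would violate.

```python
import ctypes
import mmap

ARENA_SIZE = 4096
arena = mmap.mmap(-1, ARENA_SIZE)  # anonymous mapping as a stand-in arena

# Obtain the arena's base address without copying the mapping.
arena_buf = (ctypes.c_char * ARENA_SIZE).from_buffer(arena)
base_ptr = ctypes.addressof(arena_buf)


def host_write(offset: int, src: bytearray) -> None:
    """Copy a host buffer into the arena; both sides must be CPU memory."""
    if offset < 0 or offset + len(src) > ARENA_SIZE:
        raise ValueError("write outside the arena")
    src_buf = (ctypes.c_char * len(src)).from_buffer(src)
    ctypes.memmove(base_ptr + offset, ctypes.addressof(src_buf), len(src))


host_write(128, bytearray(b"kv-chunk"))
stored = bytes(arena[128:136])
```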

Comment on lines +641 to +647
    def _release_memory_obj(self, memory_obj: MemoryObj) -> None:
        with self._state_lock:
            state = self._mark_memory_obj_released_locked(memory_obj)
            if state is None:
                return
            self._release_arena_handle_locked(state.handle)
        state.lease.release()

security-medium medium

The DaxBackend uses a WeakKeyDictionary (self._memory_obj_states) to store the state (including the arena lease and slot handle) of MemoryObj instances. When a MemoryObj is garbage collected, its __del__ method calls the allocator's free method, which in turn calls _release_memory_obj. However, because the object is already being finalized, it may have been removed from the WeakKeyDictionary before the lookup occurs, causing _mark_memory_obj_released_locked to return None. This results in leaked DAX arena leases and un-reclaimable memory slots. Over time, this can lead to resource exhaustion (file descriptors and memory mappings) and a Denial of Service (DoS) as the backend runs out of available slots.

    def _release_memory_obj(self, memory_obj: MemoryObj) -> None: 
        # Note: Relying on WeakKeyDictionary during __del__ is unreliable.
        # Consider using weakref.finalize for robust cleanup or storing state directly on the object.
        with self._state_lock:
            state = self._mark_memory_obj_released_locked(memory_obj)
            if state is None:
                # If the object is already being finalized and removed from the weak dict,
                # we may need an alternative way to retrieve its handle and lease.
                return
            self._release_arena_handle_locked(state.handle)
        state.lease.release()
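A minimal sketch of the `weakref.finalize` approach follows. `Slot` and `Arena` are hypothetical stand-ins for `MemoryObj` and the backend; the point is only that the cleanup callback captures the slot index directly, so it never has to look the dying object up in a weak dictionary.

```python
import gc
import weakref
from typing import List


class Slot:
    """Hypothetical stand-in for a MemoryObj backed by an arena slot."""

    def __init__(self, index: int) -> None:
        self.index = index


class Arena:
    """Hands out slots and reclaims them when their objects are collected."""

    def __init__(self, num_slots: int) -> None:
        self.free: List[int] = list(range(num_slots))

    def allocate(self) -> Slot:
        index = self.free.pop()
        obj = Slot(index)
        # The callback receives only the index, never `obj` itself, so it
        # does not keep the object alive and needs no lookup by object.
        weakref.finalize(obj, self.free.append, index)
        return obj


arena = Arena(num_slots=2)
obj = arena.allocate()
free_while_alive = len(arena.free)
del obj
gc.collect()  # makes the example independent of refcounting details
free_after_release = len(arena.free)
```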

Comment on lines +368 to +370
    self.device_path = str(extra.get("dax.device_path", "")).strip()
    if not self.device_path:
        raise ValueError("extra_config['dax.device_path'] is required")

security-medium medium

The device_path parameter, obtained from the extra_config dictionary, is passed directly to os.open without any validation or sanitization. An attacker who can control the configuration (e.g., through environment variables or a configuration file in a multi-tenant environment) could point this to arbitrary files on the system. Since the file is opened with os.O_RDWR and subsequently mapped with PROT_WRITE, this could allow an attacker to read from or overwrite sensitive system files, leading to information disclosure or system compromise.

        self.device_path = str(extra.get("dax.device_path", "")).strip()
        if not self.device_path:
            raise ValueError("extra_config['dax.device_path'] is required")
        if not os.path.abspath(self.device_path).startswith("/dev/"):
            raise ValueError(f"Invalid device path: {self.device_path}. Path must be within /dev/")
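Note that `os.path.abspath` normalizes `..` components but does not follow symlinks, so a symlink under /dev/ pointing elsewhere would still pass the prefix check above. A somewhat stricter variant, offered only as a sketch, resolves the path first and also requires a character device; `validate_dax_path` and its `required_prefix` parameter are hypothetical names, and the `/dev/dax` prefix is an assumption about how the device nodes are named.

```python
import os
import stat


def validate_dax_path(device_path: str, required_prefix: str = "/dev/dax") -> str:
    """Return the resolved path if it is a character device under the prefix."""
    resolved = os.path.realpath(device_path)  # collapses ".." and symlinks
    if not resolved.startswith(required_prefix):
        raise ValueError(f"{device_path!r} resolves outside {required_prefix}*")
    mode = os.stat(resolved).st_mode  # raises OSError if the path is missing
    if not stat.S_ISCHR(mode):
        raise ValueError(f"{resolved!r} is not a character device")
    return resolved


def is_rejected(path: str) -> bool:
    """Helper for the examples: True if validation refuses the path."""
    try:
        validate_dax_path(path)
    except (ValueError, OSError):
        return True
    return False


traversal_rejected = is_rejected("/dev/../etc/passwd")
regular_file_rejected = is_rejected("/etc/passwd")
```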

        arena_tensor = torch.frombuffer(arena_view, dtype=torch.uint8)
    else:
        # Third Party
        import numpy as np

medium

According to PEP 8, imports should usually be at the top of the file. Please move this import to the top of the file under the # Third Party section to improve code clarity and consistency.

@DongDongJu DongDongJu self-requested a review March 9, 2026 14:20
Collaborator

@DongDongJu DongDongJu left a comment

Hello @jayhpark530,
Thanks for the terrific work!
IMO, this seems too big to merge at once.
Could you break it down into a smaller series of PRs?
Also, it is hard to see what advantage the primary allocator offers over the DRAM-based one.

@jayhpark530
Contributor Author

Thanks for the feedback, @DongDongJu!

I agree that the change set is quite large to merge at once.
I'll split this into a smaller series of PRs focusing on the basic DAX backend.

I believe the primary allocator could have potential advantages in some scenarios, but the benefits over the DRAM-based allocator are not yet clear and it significantly increases the PR size.
For now, I'll leave it out and focus on getting the basic DAX backend merged first through smaller PRs.

@jayhpark530
Contributor Author

A new PR for the basic tiered Device-DAX backend is available here: #2788

@DongDongJu
Collaborator

A new PR for the basic tiered Device-DAX backend is available here: #2788

Thanks. I will take a look. Could you close this one for now?

@jayhpark530
Contributor Author

Thanks. I will take a look. Could you close this one for now?

Sure, I’ll close this for now. Appreciate it!

sammshen pushed a commit that referenced this pull request Mar 23, 2026
…2714) (#2788)

* feat(kv_cache): add Device-DAX backend

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* cleanup: remove dead code

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* Fix issues raised in Gemini code review

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* fix(dax): use refcounted pinning and tighten DAX validation/docs

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* docs(dax): add missing docstrings and reorganize helper methods

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* fix: resolve pre-commit lint, format, and type errors in DAX backend

Fix ruff E501 line-length violations, ruff format inconsistencies,
isort import ordering, and mypy type errors across dax_backend.py
and test_dax_backend.py so that CI code quality checks pass cleanly.

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

* Add DAX arena field comments

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

---------

Signed-off-by: JaeHyeong Park <tino.park@samsung.com>
Signed-off-by: jayhpark530 <jayhpark530@gmail.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
…MCache#2714) (LMCache#2788)

deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
…MCache#2714) (LMCache#2788)

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…MCache#2714) (LMCache#2788)

jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…MCache#2714) (LMCache#2788)

Development

Successfully merging this pull request may close these issues.

[RFC] Add /dev/dax (Device-DAX) backend support for KV cache storage

2 participants