Add /dev/dax (Device-DAX) backend for KV cache storage by jayhpark530 · Pull Request #2714 · LMCache/LMCache

jayhpark530 · 2026-03-09T10:28:37Z

What this PR does / why we need it

This PR implements the /dev/dax (Device-DAX) backend discussed in RFC issue #2572.

The goal is to enable KV cache storage on byte-addressable memory devices exposed through /dev/dax, such as:

Persistent memory
CXL-based memory expanders
Other byte-addressable memory tiers

Since LMCache already supports multiple storage backends, adding a DAX backend enables experimentation with KV cache tiering on emerging heterogeneous memory systems.

The backend maps a /dev/dax device into userspace and uses the mapped region as a KV cache arena managed by LMCache.

Two usage modes are supported:

1. Tiered backend mode

/dev/dax is used as a storage tier after LocalCPUBackend, similar to other LMCache storage backends
KV cache chunks are stored in the DAX arena when evicted from DRAM
Reads copy cached chunks back into CPU memory

This is the recommended configuration.

2. Primary backend mode

Unlike traditional storage backends, /dev/dax exposes byte-addressable memory that can be directly mapped into userspace.
Because of this property, the mapped DAX arena can also serve as an allocator backend in LMCache.

/dev/dax replaces the DRAM tier and acts as both the allocator and storage backend
KV cache chunks are stored directly inside the DAX arena
Returned MemoryObjs are backed by the mapped region

When supported by the platform, the backend registers the mapped address range using cudaHostRegister, allowing GPU ↔ DAX transfers to bypass an additional DRAM staging copy.
As discussed in the RFC thread, this mode is platform-dependent, so it is implemented as an opt-in fast path.

Potential benefits include:

Extending KV cache capacity beyond DRAM using byte-addressable memory devices
Potentially reducing DRAM staging by using a /dev/dax mmap arena as KV cache storage
Enabling KV cache tiering experiments on emerging memory tiers such as PMem and CXL memory expanders

Key features

/dev/dax storage backend implemented as an LMCache storage plugin
Fixed-size slot allocator on top of a mapped DAX arena
LRU eviction with pin support
Safe lifetime management using arena leases and slot generations
Zero-copy reads in primary mode
Graceful shutdown with in-flight operation tracking
Integration with LMCache AllocatorBackendInterface

Documentation

Documentation for the DAX backend configuration and usage is included:

docs/source/kv_cache/storage_backends/dax.rst

If applicable

this PR contains user facing changes - docs added
this PR contains unit tests

…for KV cache

gemini-code-assist · 2026-03-09T10:28:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LMCache's capabilities by integrating a new backend for /dev/dax devices. This allows the KV cache to leverage emerging byte-addressable memory technologies, offering flexible configurations for memory tiering and potentially improving performance by reducing data movement between CPU and GPU memory. The implementation includes robust memory management and error handling, ensuring reliable operation within the LMCache ecosystem.

Highlights

New /dev/dax Backend: Introduced a new storage plugin for LMCache that utilizes /dev/dax (Device-DAX) for KV cache storage, enabling the use of byte-addressable memory devices like persistent memory and CXL-based memory expanders.
Dual Operating Modes: The DAX backend supports two modes: 'tiered' mode, where it acts as a lower tier to LocalCPUBackend, and 'primary' mode, where it replaces the DRAM tier and acts as both allocator and storage, potentially allowing direct GPU mapping.
Memory Management Features: Implemented a fixed-size slot allocator, LRU eviction with pin support, and safe lifetime management using arena leases and slot generations for the DAX arena.
Direct GPU Mapping (Primary Mode): In primary mode, the backend registers the mapped DAX address range with cudaHostRegister to enable zero-copy GPU ↔ DAX transfers, bypassing additional DRAM staging.
Comprehensive Testing and Documentation: Added extensive unit tests to validate the functionality and robustness of the DaxBackend, along with detailed documentation covering configuration, usage, runtime requirements, and troubleshooting.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

docs/source/kv_cache/storage_backends/dax.rst
- Added new documentation for the Device-DAX (/dev/dax) storage backend, detailing its overview, configuration examples for tiered and primary modes, runtime requirements, validation, limits, and troubleshooting.
docs/source/kv_cache/storage_backends/index.rst
- Updated the index of storage backends to include the newly added 'dax' documentation.
lmcache/v1/storage_backend/plugins/dax_backend.py
- Added the core implementation of the DaxBackend class, providing functionality for mmap-backed KV cache storage on /dev/dax devices, including slot allocation, LRU eviction, and direct GPU mapping support.
- Implemented internal classes for managing memory objects, arena state, and tracking in-flight operations and leases.
lmcache/v1/storage_backend/storage_manager.py
- Modified the _get_allocator_backend method to conditionally select the DaxBackend as the primary allocator when configured in 'primary' mode.
tests/v1/storage_backend/test_dax_backend.py
- Added a comprehensive suite of unit tests for the DaxBackend, covering tiered and primary mode functionality, error handling, eviction policies, multithreading, and resource cleanup.

Activity

The pull request introduces a new feature, so there is no prior activity to summarize.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a new /dev/dax storage backend for the KV cache, supporting both a tiered mode (as a backing store for CPU memory) and a primary mode (as a zero-copy primary allocator for direct GPU access). While the implementation is generally well-structured with good resource management and testing, a security audit identified two medium-severity vulnerabilities. First, a resource leak may occur due to the use of WeakKeyDictionary in the cleanup path, potentially leading to resource exhaustion and Denial of Service. Second, the device_path configuration is used in a file open operation without validation, posing a risk of path traversal or arbitrary file access. It is recommended to use weakref.finalize for robust resource cleanup and to implement strict path validation for the device_path. Additionally, a critical correctness issue was found where ctypes.memmove is used for potential GPU-to-host transfers, which is unsafe and should be replaced with a device-aware copy mechanism like torch.Tensor.copy_. A minor style improvement also suggests moving a local import to the top level. Addressing these issues will enhance the backend's robustness, stability, and security, particularly in primary mode.

gemini-code-assist · 2026-03-09T10:34:50Z

+    def _do_write(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
+        ctypes.memmove(
+            ctypes.c_void_p(self._base_ptr + offset),
+            ctypes.c_void_p(memory_obj.data_ptr),
+            ctypes.c_size_t(size),
+        )
+
+    def _do_read(self, offset: int, memory_obj: MemoryObj, size: int) -> None:
+        ctypes.memmove(
+            ctypes.c_void_p(memory_obj.data_ptr),
+            ctypes.c_void_p(self._base_ptr + offset),
+            ctypes.c_size_t(size),
+        )


The _do_write and _do_read methods use ctypes.memmove for data transfers. This is only safe for host-to-host memory copies. If the source MemoryObj in _do_write or the destination MemoryObj in _do_read is backed by a GPU tensor, memmove will receive a GPU pointer and likely cause a segmentation fault, as it's not designed for device memory.

Given that this backend is intended to work with GPU-based models and the "primary" mode aims for direct GPU-DAX transfers, using a device-agnostic copy mechanism is crucial. I suggest using torch.Tensor.copy_ which correctly handles copies between different devices (CPU, CUDA). This would make the implementation more robust and prevent crashes when operating with GPU tensors.

def _do_write(self, offset: int, memory_obj: MemoryObj, size: int) -> None: dst_view = self._arena_tensor.narrow(0, offset, size) src_view = memory_obj.raw_tensor.narrow(0, 0, size) dst_view.copy_(src_view, non_blocking=True) def _do_read(self, offset: int, memory_obj: MemoryObj, size: int) -> None: dst_view = memory_obj.raw_tensor.narrow(0, 0, size) src_view = self._arena_tensor.narrow(0, offset, size) dst_view.copy_(src_view, non_blocking=True)

gemini-code-assist · 2026-03-09T10:34:50Z

+    def _release_memory_obj(self, memory_obj: MemoryObj) -> None:
+        with self._state_lock:
+            state = self._mark_memory_obj_released_locked(memory_obj)
+            if state is None:
+                return
+            self._release_arena_handle_locked(state.handle)
+        state.lease.release()


The DaxBackend uses a WeakKeyDictionary (self._memory_obj_states) to store the state (including the arena lease and slot handle) of MemoryObj instances. When a MemoryObj is garbage collected, its __del__ method calls the allocator's free method, which in turn calls _release_memory_obj. However, because the object is already being finalized, it may have been removed from the WeakKeyDictionary before the lookup occurs, causing _mark_memory_obj_released_locked to return None. This results in leaked DAX arena leases and un-reclaimable memory slots. Over time, this can lead to resource exhaustion (file descriptors and memory mappings) and a Denial of Service (DoS) as the backend runs out of available slots.

def _release_memory_obj(self, memory_obj: MemoryObj) -> None: # Note: Relying on WeakKeyDictionary during __del__ is unreliable. # Consider using weakref.finalize for robust cleanup or storing state directly on the object. with self._state_lock: state = self._mark_memory_obj_released_locked(memory_obj) if state is None: # If the object is already being finalized and removed from the weak dict, # we may need an alternative way to retrieve its handle and lease. return self._release_arena_handle_locked(state.handle) state.lease.release()

gemini-code-assist · 2026-03-09T10:34:50Z

+        self.device_path = str(extra.get("dax.device_path", "")).strip()
+        if not self.device_path:
+            raise ValueError("extra_config['dax.device_path'] is required")


The device_path parameter, obtained from the extra_config dictionary, is passed directly to os.open without any validation or sanitization. An attacker who can control the configuration (e.g., through environment variables or a configuration file in a multi-tenant environment) could point this to arbitrary files on the system. Since the file is opened with os.O_RDWR and subsequently mapped with PROT_WRITE, this could allow an attacker to read from or overwrite sensitive system files, leading to information disclosure or system compromise.

self.device_path = str(extra.get("dax.device_path", "")).strip() if not self.device_path: raise ValueError("extra_config['dax.device_path'] is required") if not os.path.abspath(self.device_path).startswith("/dev/"): raise ValueError(f"Invalid device path: {self.device_path}. Path must be within /dev/")

gemini-code-assist · 2026-03-09T10:34:50Z

+                arena_tensor = torch.frombuffer(arena_view, dtype=torch.uint8)
+            else:
+                # Third Party
+                import numpy as np


According to PEP 8, imports should usually be at the top of the file. Please move this import to the top of the file under the # Third Party section to improve code clarity and consistency.

DongDongJu

Hello @jayhpark530,
Thanks for the terrific work!
IMO, It seems too big to merge it at once.
Could you break down this one into smaller series?
Also, primary allocator is hard to get it the advantage comparing with DRAM based.

jayhpark530 · 2026-03-10T03:08:21Z

Thanks for the feedback, @DongDongJu!

I agree that the change set is quite large to merge at once.
I'll split this into a smaller series of PRs focusing on the basic DAX backend.

I believe the primary allocator could have potential advantages in some scenarios, but the benefits over the DRAM-based allocator are not yet clear and it significantly increases the PR size.
For now, I'll leave it out and focus on getting the basic DAX backend merged first through smaller PRs.

jayhpark530 · 2026-03-16T11:30:46Z

A new PR for the basic tiered Device-DAX backend is available here: #2788

DongDongJu · 2026-03-16T14:06:26Z

A new PR for the basic tiered Device-DAX backend is available here: #2788

Thanks. I will take a look. Could you close this one for now?

jayhpark530 · 2026-03-17T00:56:32Z

Thanks. I will take a look. Could you close this one for now?

Sure, I’ll close this for now. Appreciate it!

…2714) (#2788) * feat(kv_cache): add Device-DAX backend Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * cleanup: remove dead code Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * Fix issues raised in Gemini code review Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * fix(dax): use refcounted pinning and tighten DAX validation/docs Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * docs(dax): add missing docstrings and reorganize helper methods Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * fix: resolve pre-commit lint, format, and type errors in DAX backend Fix ruff E501 line-length violations, ruff format inconsistencies, isort import ordering, and mypy type errors across dax_backend.py and test_dax_backend.py so that CI code quality checks pass cleanly. Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * Add DAX arena field comments Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> --------- Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

…MCache#2714) (LMCache#2788) * feat(kv_cache): add Device-DAX backend Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * cleanup: remove dead code Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * Fix issues raised in Gemini code review Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * fix(dax): use refcounted pinning and tighten DAX validation/docs Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * docs(dax): add missing docstrings and reorganize helper methods Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * fix: resolve pre-commit lint, format, and type errors in DAX backend Fix ruff E501 line-length violations, ruff format inconsistencies, isort import ordering, and mypy type errors across dax_backend.py and test_dax_backend.py so that CI code quality checks pass cleanly. Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> * Add DAX arena field comments Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com> --------- Signed-off-by: JaeHyeong Park <tino.park@samsung.com> Signed-off-by: jayhpark530 <jayhpark530@gmail.com>

jayhpark530 added 2 commits March 9, 2026 19:04

feat: add Device-DAX backend with tiered and primary allocator modes …

4f6fea1

…for KV cache

Merge branch 'LMCache:dev' into feat/dax-backend

5cfd9f7

gemini-code-assist Bot reviewed Mar 9, 2026

View reviewed changes

DongDongJu self-requested a review March 9, 2026 14:20

DongDongJu reviewed Mar 9, 2026

View reviewed changes

jayhpark530 added 2 commits March 13, 2026 14:38

Merge branch 'LMCache:dev' into feat/dax-backend

3d1e2a8

Merge branch 'LMCache:dev' into feat/dax-backend

8d373c2

jayhpark530 mentioned this pull request Mar 16, 2026

Add Device-DAX (/dev/dax) storage backend for KV cache (follow-up to #2714) #2788

Merged

2 tasks

jayhpark530 closed this Mar 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add /dev/dax (Device-DAX) backend for KV cache storage#2714

Add /dev/dax (Device-DAX) backend for KV cache storage#2714
jayhpark530 wants to merge 4 commits intoLMCache:devfrom
ComputeOffload:feat/dax-backend

jayhpark530 commented Mar 9, 2026

Uh oh!

gemini-code-assist Bot commented Mar 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Uh oh!

DongDongJu left a comment

Uh oh!

jayhpark530 commented Mar 10, 2026

Uh oh!

jayhpark530 commented Mar 16, 2026

Uh oh!

DongDongJu commented Mar 16, 2026

Uh oh!

jayhpark530 commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jayhpark530 commented Mar 9, 2026

What this PR does / why we need it

1. Tiered backend mode

2. Primary backend mode

Key features

Documentation

If applicable

Uh oh!

gemini-code-assist Bot commented Mar 9, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

DongDongJu left a comment

Choose a reason for hiding this comment

Uh oh!

jayhpark530 commented Mar 10, 2026

Uh oh!

jayhpark530 commented Mar 16, 2026

Uh oh!

DongDongJu commented Mar 16, 2026

Uh oh!

jayhpark530 commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants