Introduce native fs connector #2779

Merged

maobaolong merged 5 commits into LMCache:dev from maobaolong:mp_native_fs on Mar 24, 2026

Conversation

@maobaolong (Collaborator) commented Mar 14, 2026

What this PR does / why we need it:

This PR adds a C++ implementation of the FS connector.

  • Test script:
#!/bin/bash
set -e

LOG_DIR=/data1/LMCache/logs/plugin_test
mkdir -p $LOG_DIR /tmp/test_fs_native_l2
rm -rf /tmp/test_fs_native_l2/*

echo "=== Step 1: Start LMCache MP Server (fs_native with read_ahead_size=4096) ==="
L2_JSON='{"type": "fs_native", "base_path": "/tmp/test_fs_native_l2", "num_workers": 4, "read_ahead_size": 4096}'
export LMCACHE_LOG_LEVEL=DEBUG
/usr/bin/python3 -m lmcache.v1.multiprocess.server \
    --host localhost --port 15556 \
    --chunk-size 256 --l1-size-gb 0.03 \
    --eviction-policy LRU --max-workers 1 \
    --l2-adapter "$L2_JSON" \
    > $LOG_DIR/lmcache_server.log 2>&1 &
LMCACHE_PID=$!
echo "LMCache PID: $LMCACHE_PID"
sleep 5

if ! kill -0 $LMCACHE_PID 2>/dev/null; then
    echo "FAIL: LMCache server died. Log:"
    cat $LOG_DIR/lmcache_server.log
    exit 1
fi
echo "LMCache server is running."

echo "=== Step 2: Start vLLM ==="
MODEL_PATH="/data1/model/DeepSeek-V2-Lite-Chat/"
KV_CFG='{"kv_connector":"LMCacheMPConnector","kv_role":"kv_both","kv_connector_extra_config":{"lmcache.mp.port":15556}}'
export CUDA_VISIBLE_DEVICES=0,1

/usr/bin/python3 -m vllm.entrypoints.cli.main serve $MODEL_PATH \
    -tp 2 \
    --load-format dummy \
    --trust-remote-code \
    --served-model-name vllm_cpu_offload \
    --gpu_memory_utilization 0.85 \
    --max-num-seqs 64 \
    --no-enable-prefix-caching --enforce-eager --max-model-len 8192 \
    --port 8001 \
    --disable-log-requests \
    --kv-transfer-config "$KV_CFG" \
    > $LOG_DIR/vllm_server.log 2>&1 &
VLLM_PID=$!
echo "vLLM PID: $VLLM_PID"
echo "Waiting 60s for vLLM to load model..."
sleep 60

if ! kill -0 $VLLM_PID 2>/dev/null; then
    echo "FAIL: vLLM server died. Last 20 lines:"
    tail -20 $LOG_DIR/vllm_server.log
    exit 1
fi
echo "vLLM server is running."

echo "=== Step 3: Send test requests ==="
LONG_PROMPT="The history of artificial intelligence (AI) began in antiquity, with myths, stories and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The seeds of modern AI were planted by philosophers who attempted to describe the process of human thinking as the mechanical manipulation of symbols. This work culminated in the invention of the programmable digital computer in the 1940s, a machine based on the abstract essence of mathematical reasoning. This device and the ideas behind it inspired a handful of scientists to begin seriously discussing the possibility of building an electronic brain. The field of AI research was founded at a workshop held on the campus of Dartmouth College, USA during the summer of 1956. Those who attended would become the leaders of AI research for decades. Many of them predicted that a machine as intelligent as a human being would exist in no more than a generation, and they were given millions of dollars to make this vision come true. Eventually, it became obvious that commercial developers and researchers had grossly underestimated the difficulty of the project. In 1974, in response to the criticism from James Lighthill and ongoing pressure from congress, the U.S. and British Governments stopped funding undirected research into artificial intelligence, and the difficult years that followed became known as an AI winter. Seven years later, a visionary initiative by the Japanese Government inspired governments and industry to provide AI with billions of dollars, but by the late 1980s the investors became disillusioned and withdrew funding again. Investment and interest in AI boomed in the first decades of the 21st century when machine learning was successfully applied to many problems in academia and industry due to the availability of large amounts of data and fast computers. The achievements of deep learning in the 2010s, particularly breakthroughs in areas like image recognition, natural language processing, and game playing, led to a renewed surge of interest in AI technologies. Transformer architectures, introduced in 2017, revolutionized the field and enabled the development of large language models that could generate human-like text, answer questions, write code, and engage in sophisticated reasoning tasks that were previously thought to be exclusively human capabilities."

echo "--- Request 1 (store to L2) ---"
RESP1=$(curl -s http://localhost:8001/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"vllm_cpu_offload\", \"prompt\": \"$LONG_PROMPT\", \"max_tokens\": 10, \"temperature\": 0}")
echo "$RESP1" | python3 -m json.tool
sleep 3

echo ""
echo "--- L2 files after request 1 ---"
ls -la /tmp/test_fs_native_l2/ 2>/dev/null || echo "(empty)"
FILE_COUNT=$(ls /tmp/test_fs_native_l2/*.data 2>/dev/null | wc -l)
echo "Data files: $FILE_COUNT"

echo ""
echo "--- Request 2 (should load from L2/L1) ---"
RESP2=$(curl -s http://localhost:8001/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"vllm_cpu_offload\", \"prompt\": \"$LONG_PROMPT\", \"max_tokens\": 10, \"temperature\": 0}")
echo "$RESP2" | python3 -m json.tool
sleep 2

echo ""
echo "=== Step 4: Check LMCache logs ==="
echo "--- FS native adapter creation ---"
grep -i "Created FS native" $LOG_DIR/lmcache_server.log || echo "(not found)"
echo "--- read_ahead in logs ---"
grep -i "read_ahead" $LOG_DIR/lmcache_server.log || echo "(not found)"
echo "--- L2 store/load operations ---"
grep -iE "(stored|loaded|L2|l2_store|l2_load)" $LOG_DIR/lmcache_server.log | tail -20 || echo "(not found)"

echo ""
echo "=== Step 5: Cleanup ==="
kill $VLLM_PID 2>/dev/null || true
kill $LMCACHE_PID 2>/dev/null || true
sleep 2

echo ""
echo "=== TEST COMPLETE ==="
echo "Data files in /tmp/test_fs_native_l2:"
ls -lh /tmp/test_fs_native_l2/*.data 2>/dev/null || echo "(none)"

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes - docs added
  • this PR contains unit tests

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong maobaolong requested a review from sammshen March 14, 2026 15:46
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LMCache's storage capabilities by introducing a high-performance native C++ filesystem connector. This new connector allows for efficient direct file system access, supporting features like O_DIRECT and read-ahead for optimized I/O. It also improves the robustness of batch operations by enabling per-key error handling for data retrieval and standardizing the way results are reported for both existence checks and data loads.
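
To make the I/O story concrete, here is a minimal, hedged sketch of what O_DIRECT-with-fallback plus a kernel read-ahead hint can look like at the syscall level. The function name and parameters (open_for_read, use_odirect, read_ahead_size) are illustrative assumptions, not the PR's actual API:

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Open a cache file for reading, preferring O_DIRECT when requested.
// Falls back to buffered I/O if the filesystem rejects O_DIRECT, and
// asks the kernel to prefetch the first read_ahead_size bytes otherwise.
int open_for_read(const std::string& path, bool use_odirect,
                  size_t read_ahead_size) {
    int flags = O_RDONLY;
#ifdef O_DIRECT
    if (use_odirect) flags |= O_DIRECT;
#endif
    int fd = ::open(path.c_str(), flags);
    if (fd < 0 && use_odirect && errno == EINVAL) {
        // O_DIRECT unsupported here; retry with plain buffered I/O.
        fd = ::open(path.c_str(), O_RDONLY);
    }
    if (fd >= 0 && !use_odirect && read_ahead_size > 0) {
        // Hint the kernel to read ahead; harmless if it ignores us.
        posix_fadvise(fd, 0, static_cast<off_t>(read_ahead_size),
                      POSIX_FADV_WILLNEED);
    }
    return fd;
}

Note that real O_DIRECT reads additionally require the buffer, offset, and length to be aligned to the disk block size, which is presumably why the connector's WorkerFSConn tracks a disk block size alongside the O_DIRECT flag.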

Highlights

  • Native Filesystem Connector: Introduced a new native C++ filesystem connector (FSConnector) for LMCache, enabling direct file system interaction for KV cache storage.
  • Per-Key Error Tolerance: Implemented per-key error tolerance for batch GET operations, allowing individual key failures without halting the entire batch and recording success/failure for each key (see the sketch after this list).
  • Unified Batch Results: Refactored batch operation results to use a single per_key_results vector for both BATCH_TILE_EXISTS and BATCH_TILE_GET operations, simplifying result handling.
  • Python Integration: Added Python bindings and a new FSNativeL2AdapterConfig to integrate the native filesystem connector seamlessly into the Python LMCache framework.
  • Build System Update: Updated the setup.py script to include the new filesystem connector in the build process for both CUDA and ROCm environments.
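
As a rough sketch of the per-key error tolerance pattern described above; the type and function shapes here (BatchState, TileReq, process_batch_get) are hypothetical stand-ins for the PR's real request structures:

#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical shapes standing in for the PR's real batch/request types.
struct BatchState {
    std::vector<uint8_t> per_key_results;  // pre-sized, shared across requests
};
struct TileReq {
    BatchState* batch;
    size_t start_idx;  // this request's offset into per_key_results
    std::vector<std::string> keys;
};

// Stub for the PR's real single-key read; throws to simulate a failure.
void do_single_get(const std::string& key) {
    if (key.empty()) throw std::runtime_error("empty key");
}

void process_batch_get(TileReq& req) {
    for (size_t i = 0; i < req.keys.size(); ++i) {
        try {
            do_single_get(req.keys[i]);
            req.batch->per_key_results[req.start_idx + i] = 1;  // success
        } catch (const std::exception& e) {
            // Record the failure, then keep going with the remaining keys
            // instead of failing the whole batch.
            req.batch->per_key_results[req.start_idx + i] = 0;
            std::fprintf(stderr, "GET key %s failed: %s\n",
                         req.keys[i].c_str(), e.what());
        }
    }
}

Per the changelog below, the same per_key_results vector also carries the answers for BATCH_TILE_EXISTS, which is what unifies result handling across the two batch operations.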


Changelog
  • csrc/storage_backends/connector_base.h
    • Pre-allocated per_key_results for BATCH_TILE_GET operations.
    • Replaced exists_results with per_key_results for BATCH_TILE_EXISTS.
    • Added start_idx to tile_req for accurate result indexing in batch operations.
    • Implemented try-catch block within do_single_get to provide per-key error tolerance and record success/failure.
    • Updated completion handling to move per_key_results for both BATCH_TILE_EXISTS and BATCH_TILE_GET.
  • csrc/storage_backends/connector_types.h
    • Renamed exists_results to per_key_results and updated its description to clarify its use for both EXISTS and GET operations, supporting per-key success/failure.
  • csrc/storage_backends/fs/connector.cpp
    • Added implementation for the FSConnector class, including file I/O operations (do_single_get, do_single_set, do_single_exists).
    • Implemented helper functions replace_all and key_to_filename for path manipulation (a sketch of the technique follows the changelog).
    • Included write_all and read_all static helpers for robust file read/write operations.
    • Integrated O_DIRECT support and read-ahead functionality for optimized disk access.
    • Managed directory creation for base and temporary paths during initialization.
  • csrc/storage_backends/fs/connector.h
    • Defined the FSConnector class interface, inheriting from ConnectorBase.
    • Introduced WorkerFSConn struct to hold per-worker connection state, including base path, temporary directory, O_DIRECT flag, disk block size, and read-ahead size.
    • Declared key encoding constants (KEY_SEP, PATH_SLASH_REPLACEMENT, FILE_EXT, TMP_EXT) for consistent file naming.
  • csrc/storage_backends/fs/pybind.cpp
    • Added Python bindings for the FSConnector class using pybind11, exposing its constructor and methods to Python.
  • lmcache/v1/distributed/l2_adapters/native_connector_l2_adapter.py
    • Modified _demux_loop to process result_bools for _OP_LOAD, enabling per-key load status, with a fallback for connectors that do not provide this detail.
    • Introduced FSNativeL2AdapterConfig for configuring the new filesystem connector from Python, including parameters like base_path, num_workers, use_odirect, and read_ahead_size.
    • Added _create_fs_native_l2_adapter function to instantiate the native C++ FSConnector client.
    • Registered the new fs_native L2 adapter type and factory for dynamic loading.
  • setup.py
    • Added fs_sources list to include the new C++ source files for the filesystem connector.
    • Configured lmcache.lmcache_fs as a new CppExtension in ext_modules for both CUDA and ROCm build paths, ensuring the native connector is compiled and linked.
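
For the key-encoding helpers mentioned in the fs/connector.cpp entry, a self-contained sketch of the general technique follows. The constant values are assumptions for illustration (the .data extension matches the files the test script above inspects; the slash replacement character is purely assumed) — the real KEY_SEP / PATH_SLASH_REPLACEMENT / FILE_EXT / TMP_EXT constants live in csrc/storage_backends/fs/connector.h:

#include <string>

// Assumed values for illustration only.
static const std::string PATH_SLASH_REPLACEMENT = "_";
static const std::string FILE_EXT = ".data";

// Replace every occurrence of `from` in `s` with `to`.
static void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
    for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos;
         pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
}

// Flatten a cache key into a filesystem-safe filename under base_path.
static std::string key_to_filename(const std::string& base_path,
                                   std::string key) {
    replace_all(key, "/", PATH_SLASH_REPLACEMENT);
    return base_path + "/" + key + FILE_EXT;
}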

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces a native C++ filesystem connector, which is a significant and well-implemented feature. The C++ implementation is robust, using modern features like std::filesystem and employing best practices such as atomic writes via temporary files and rename. The error handling, especially the per-key error tolerance for batch get operations, is a great addition. The Python bindings and integration into the L2 adapter framework are also clean and follow existing patterns. I have one minor suggestion to remove some unused code. Overall, this is an excellent contribution.
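
For readers unfamiliar with the write pattern praised here, this is a minimal sketch of atomic publication via a temporary file plus rename (hypothetical names and a hypothetical .tmp suffix, not the PR's code): the data only becomes visible under its final name once the rename lands, so concurrent readers never observe a partially written file:

#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Write `data` to `final_path` atomically: stage it in a sibling .tmp
// file, then rename() into place. POSIX rename is atomic when source
// and destination live on the same filesystem.
void atomic_write(const std::string& final_path, const std::string& data) {
    const std::string tmp_path = final_path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        if (!out) throw std::runtime_error("cannot open " + tmp_path);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        if (!out) throw std::runtime_error("short write to " + tmp_path);
    }  // stream flushed and closed here
    std::filesystem::rename(tmp_path, final_path);  // atomic publish
}

Combined with the per-key tolerance sketched earlier, a failed write under this pattern leaves at most a stray temporary file rather than a corrupt cache entry.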

Comment thread: csrc/storage_backends/fs/connector.h (Outdated)
Comment thread: csrc/storage_backends/connector_base.h (Outdated)
// Per-key error tolerance: record failure
// but continue processing remaining keys
req.batch->per_key_results[req.start_idx + i] = 0;
fprintf(stderr, "[LMCache FS GET] key %s failed: %s\n",
Contributor:

There should not be FS-specific stuff in the connector base.

Contributor:

@maobaolong could you address this?

Collaborator (Author):

Do you mean to remove the FS here?
I have removed it in maobaolong@95e1ce4.

Contributor:

Thanks for addressing.

try {
    do_single_get(conn, req.keys[i], req.buf_ptrs[i],
                  req.buf_lens[i], req.batch_chunk_num_bytes);
    req.batch->per_key_results[req.start_idx + i] = 1;  // success
Contributor:

Does 1 mean true or false? Please add a comment here.

Collaborator (Author):

Done

@maobaolong maobaolong requested a review from sammshen March 18, 2026 11:55

@sammshen (Contributor) left a comment


LGTM! @maobaolong Please address the comment in the connector base.


@sammshen (Contributor) left a comment


LGTM!

@sammshen sammshen requested a review from deng451e March 20, 2026 01:03

@deng451e (Collaborator) left a comment


LGTM

@maobaolong maobaolong enabled auto-merge (squash) March 24, 2026 08:00
@github-actions (Bot) added the "full" label (Run comprehensive tests on this PR) on Mar 24, 2026
@maobaolong maobaolong merged commit 729ff73 into LMCache:dev Mar 24, 2026
27 checks passed
maobaolong added a commit to maobaolong/LMCache that referenced this pull request Mar 25, 2026
* Add native FS L2 connector for MP mode

* Introduce native fs connector

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Remove the unused path_buf

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Add comment and fix output

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Co-authored-by: Samuel Shen <slshen@tensormesh.ai>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026