Introduce native fs connector #2779

Merged

maobaolong merged 5 commits into LMCache:dev from maobaolong:mp_native_fs on Mar 24, 2026

Conversation

@maobaolong (Collaborator) commented Mar 14, 2026

What this PR does / why we need it:

This PR adds a C++ implementation of the FS connector.

  • Test script:
#!/bin/bash
set -e

LOG_DIR=/data1/LMCache/logs/plugin_test
mkdir -p $LOG_DIR /tmp/test_fs_native_l2
rm -rf /tmp/test_fs_native_l2/*

echo "=== Step 1: Start LMCache MP Server (fs_native with read_ahead_size=4096) ==="
L2_JSON='{"type": "fs_native", "base_path": "/tmp/test_fs_native_l2", "num_workers": 4, "read_ahead_size": 4096}'
export LMCACHE_LOG_LEVEL=DEBUG
/usr/bin/python3 -m lmcache.v1.multiprocess.server \
    --host localhost --port 15556 \
    --chunk-size 256 --l1-size-gb 0.03 \
    --eviction-policy LRU --max-workers 1 \
    --l2-adapter "$L2_JSON" \
    > $LOG_DIR/lmcache_server.log 2>&1 &
LMCACHE_PID=$!
echo "LMCache PID: $LMCACHE_PID"
sleep 5

if ! kill -0 $LMCACHE_PID 2>/dev/null; then
    echo "FAIL: LMCache server died. Log:"
    cat $LOG_DIR/lmcache_server.log
    exit 1
fi
echo "LMCache server is running."

echo "=== Step 2: Start vLLM ==="
MODEL_PATH="/data1/model/DeepSeek-V2-Lite-Chat/"
KV_CFG='{"kv_connector":"LMCacheMPConnector","kv_role":"kv_both","kv_connector_extra_config":{"lmcache.mp.port":15556}}'
export CUDA_VISIBLE_DEVICES=0,1

/usr/bin/python3 -m vllm.entrypoints.cli.main serve $MODEL_PATH \
    -tp 2 \
    --load-format dummy \
    --trust-remote-code \
    --served-model-name vllm_cpu_offload \
    --gpu_memory_utilization 0.85 \
    --max-num-seqs 64 \
    --no-enable-prefix-caching --enforce-eager --max-model-len 8192 \
    --port 8001 \
    --disable-log-requests \
    --kv-transfer-config "$KV_CFG" \
    > $LOG_DIR/vllm_server.log 2>&1 &
VLLM_PID=$!
echo "vLLM PID: $VLLM_PID"
echo "Waiting 60s for vLLM to load model..."
sleep 60

if ! kill -0 $VLLM_PID 2>/dev/null; then
    echo "FAIL: vLLM server died. Last 20 lines:"
    tail -20 $LOG_DIR/vllm_server.log
    exit 1
fi
echo "vLLM server is running."

echo "=== Step 3: Send test requests ==="
LONG_PROMPT="The history of artificial intelligence (AI) began in antiquity, with myths, stories and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The seeds of modern AI were planted by philosophers who attempted to describe the process of human thinking as the mechanical manipulation of symbols. This work culminated in the invention of the programmable digital computer in the 1940s, a machine based on the abstract essence of mathematical reasoning. This device and the ideas behind it inspired a handful of scientists to begin seriously discussing the possibility of building an electronic brain. The field of AI research was founded at a workshop held on the campus of Dartmouth College, USA during the summer of 1956. Those who attended would become the leaders of AI research for decades. Many of them predicted that a machine as intelligent as a human being would exist in no more than a generation, and they were given millions of dollars to make this vision come true. Eventually, it became obvious that commercial developers and researchers had grossly underestimated the difficulty of the project. In 1974, in response to the criticism from James Lighthill and ongoing pressure from congress, the U.S. and British Governments stopped funding undirected research into artificial intelligence, and the difficult years that followed became known as an AI winter. Seven years later, a visionary initiative by the Japanese Government inspired governments and industry to provide AI with billions of dollars, but by the late 1980s the investors became disillusioned and withdrew funding again. Investment and interest in AI boomed in the first decades of the 21st century when machine learning was successfully applied to many problems in academia and industry due to the availability of large amounts of data and fast computers. The achievements of deep learning in the 2010s, particularly breakthroughs in areas like image recognition, natural language processing, and game playing, led to a renewed surge of interest in AI technologies. Transformer architectures, introduced in 2017, revolutionized the field and enabled the development of large language models that could generate human-like text, answer questions, write code, and engage in sophisticated reasoning tasks that were previously thought to be exclusively human capabilities."

echo "--- Request 1 (store to L2) ---"
RESP1=$(curl -s http://localhost:8001/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"vllm_cpu_offload\", \"prompt\": \"$LONG_PROMPT\", \"max_tokens\": 10, \"temperature\": 0}")
echo "$RESP1" | python3 -m json.tool
sleep 3

echo ""
echo "--- L2 files after request 1 ---"
ls -la /tmp/test_fs_native_l2/ 2>/dev/null || echo "(empty)"
FILE_COUNT=$(ls /tmp/test_fs_native_l2/*.data 2>/dev/null | wc -l)
echo "Data files: $FILE_COUNT"

echo ""
echo "--- Request 2 (should load from L2/L1) ---"
RESP2=$(curl -s http://localhost:8001/v1/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"vllm_cpu_offload\", \"prompt\": \"$LONG_PROMPT\", \"max_tokens\": 10, \"temperature\": 0}")
echo "$RESP2" | python3 -m json.tool
sleep 2

echo ""
echo "=== Step 4: Check LMCache logs ==="
echo "--- FS native adapter creation ---"
grep -i "Created FS native" $LOG_DIR/lmcache_server.log || echo "(not found)"
echo "--- read_ahead in logs ---"
grep -i "read_ahead" $LOG_DIR/lmcache_server.log || echo "(not found)"
echo "--- L2 store/load operations ---"
grep -iE "(stored|loaded|L2|l2_store|l2_load)" $LOG_DIR/lmcache_server.log | tail -20 || echo "(not found)"

echo ""
echo "=== Step 5: Cleanup ==="
kill $VLLM_PID 2>/dev/null || true
kill $LMCACHE_PID 2>/dev/null || true
sleep 2

echo ""
echo "=== TEST COMPLETE ==="
echo "Data files in /tmp/test_fs_native_l2:"
ls -lh /tmp/test_fs_native_l2/*.data 2>/dev/null || echo "(none)"

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes - docs added
  • this PR contains unit tests

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong maobaolong requested a review from sammshen March 14, 2026 15:46
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances LMCache's storage capabilities by introducing a high-performance native C++ filesystem connector. This new connector allows for efficient direct file system access, supporting features like O_DIRECT and read-ahead for optimized I/O. It also improves the robustness of batch operations by enabling per-key error handling for data retrieval and standardizing the way results are reported for both existence checks and data loads.
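
To make the I/O story concrete, here is a minimal, hedged sketch of what O_DIRECT-with-fallback plus a kernel read-ahead hint can look like at the syscall level. The function name and parameters (open_for_read, use_odirect, read_ahead_size) are illustrative assumptions, not the PR's actual API:

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Open a cache file for reading, preferring O_DIRECT when requested.
// Falls back to buffered I/O if the filesystem rejects O_DIRECT, and
// asks the kernel to prefetch the first read_ahead_size bytes otherwise.
int open_for_read(const std::string& path, bool use_odirect,
                  size_t read_ahead_size) {
    int flags = O_RDONLY;
#ifdef O_DIRECT
    if (use_odirect) flags |= O_DIRECT;
#endif
    int fd = ::open(path.c_str(), flags);
    if (fd < 0 && use_odirect && errno == EINVAL) {
        // O_DIRECT unsupported here; retry with plain buffered I/O.
        fd = ::open(path.c_str(), O_RDONLY);
    }
    if (fd >= 0 && !use_odirect && read_ahead_size > 0) {
        // Hint the kernel to read ahead; harmless if it ignores us.
        posix_fadvise(fd, 0, static_cast<off_t>(read_ahead_size),
                      POSIX_FADV_WILLNEED);
    }
    return fd;
}

Note that real O_DIRECT reads additionally require the buffer, offset, and length to be aligned to the disk block size, which is presumably why the connector's WorkerFSConn tracks a disk block size alongside the O_DIRECT flag.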

Highlights

  • Native Filesystem Connector: Introduced a new native C++ filesystem connector (FSConnector) for LMCache, enabling direct file system interaction for KV cache storage.
  • Per-Key Error Tolerance: Implemented per-key error tolerance for batch GET operations, allowing individual key failures without halting the entire batch and recording success/failure for each key (see the sketch after this list).
  • Unified Batch Results: Refactored batch operation results to use a single per_key_results vector for both BATCH_TILE_EXISTS and BATCH_TILE_GET operations, simplifying result handling.
  • Python Integration: Added Python bindings and a new FSNativeL2AdapterConfig to integrate the native filesystem connector seamlessly into the Python LMCache framework.
  • Build System Update: Updated the setup.py script to include the new filesystem connector in the build process for both CUDA and ROCm environments.
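
As a rough sketch of the per-key error tolerance pattern described above; the type and function shapes here (BatchState, TileReq, process_batch_get) are hypothetical stand-ins for the PR's real request structures:

#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical shapes standing in for the PR's real batch/request types.
struct BatchState {
    std::vector<uint8_t> per_key_results;  // pre-sized, shared across requests
};
struct TileReq {
    BatchState* batch;
    size_t start_idx;  // this request's offset into per_key_results
    std::vector<std::string> keys;
};

// Stub for the PR's real single-key read; throws to simulate a failure.
void do_single_get(const std::string& key) {
    if (key.empty()) throw std::runtime_error("empty key");
}

void process_batch_get(TileReq& req) {
    for (size_t i = 0; i < req.keys.size(); ++i) {
        try {
            do_single_get(req.keys[i]);
            req.batch->per_key_results[req.start_idx + i] = 1;  // success
        } catch (const std::exception& e) {
            // Record the failure, then keep going with the remaining keys
            // instead of failing the whole batch.
            req.batch->per_key_results[req.start_idx + i] = 0;
            std::fprintf(stderr, "GET key %s failed: %s\n",
                         req.keys[i].c_str(), e.what());
        }
    }
}

Per the changelog below, the same per_key_results vector also carries the answers for BATCH_TILE_EXISTS, which is what unifies result handling across the two batch operations.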


Changelog
  • csrc/storage_backends/connector_base.h
    • Pre-allocated per_key_results for BATCH_TILE_GET operations.
    • Replaced exists_results with per_key_results for BATCH_TILE_EXISTS.
    • Added start_idx to tile_req for accurate result indexing in batch operations.
    • Implemented try-catch block within do_single_get to provide per-key error tolerance and record success/failure.
    • Updated completion handling to move per_key_results for both BATCH_TILE_EXISTS and BATCH_TILE_GET.
  • csrc/storage_backends/connector_types.h
    • Renamed exists_results to per_key_results and updated its description to clarify its use for both EXISTS and GET operations, supporting per-key success/failure.
  • csrc/storage_backends/fs/connector.cpp
    • Added implementation for the FSConnector class, including file I/O operations (do_single_get, do_single_set, do_single_exists).
    • Implemented helper functions replace_all and key_to_filename for path manipulation (a sketch of the technique follows the changelog).
    • Included write_all and read_all static helpers for robust file read/write operations.
    • Integrated O_DIRECT support and read-ahead functionality for optimized disk access.
    • Managed directory creation for base and temporary paths during initialization.
  • csrc/storage_backends/fs/connector.h
    • Defined the FSConnector class interface, inheriting from ConnectorBase.
    • Introduced WorkerFSConn struct to hold per-worker connection state, including base path, temporary directory, O_DIRECT flag, disk block size, and read-ahead size.
    • Declared key encoding constants (KEY_SEP, PATH_SLASH_REPLACEMENT, FILE_EXT, TMP_EXT) for consistent file naming.
  • csrc/storage_backends/fs/pybind.cpp
    • Added Python bindings for the FSConnector class using pybind11, exposing its constructor and methods to Python.
  • lmcache/v1/distributed/l2_adapters/native_connector_l2_adapter.py
    • Modified _demux_loop to process result_bools for _OP_LOAD, enabling per-key load status, with a fallback for connectors that do not provide this detail.
    • Introduced FSNativeL2AdapterConfig for configuring the new filesystem connector from Python, including parameters like base_path, num_workers, use_odirect, and read_ahead_size.
    • Added _create_fs_native_l2_adapter function to instantiate the native C++ FSConnector client.
    • Registered the new fs_native L2 adapter type and factory for dynamic loading.
  • setup.py
    • Added fs_sources list to include the new C++ source files for the filesystem connector.
    • Configured lmcache.lmcache_fs as a new CppExtension in ext_modules for both CUDA and ROCm build paths, ensuring the native connector is compiled and linked.
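
For the key-encoding helpers mentioned in the fs/connector.cpp entry, a self-contained sketch of the general technique follows. The constant values are assumptions for illustration (the .data extension matches the files the test script above inspects; the slash replacement character is purely assumed) — the real KEY_SEP / PATH_SLASH_REPLACEMENT / FILE_EXT / TMP_EXT constants live in csrc/storage_backends/fs/connector.h:

#include <string>

// Assumed values for illustration only.
static const std::string PATH_SLASH_REPLACEMENT = "_";
static const std::string FILE_EXT = ".data";

// Replace every occurrence of `from` in `s` with `to`.
static void replace_all(std::string& s, const std::string& from,
                        const std::string& to) {
    for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos;
         pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
}

// Flatten a cache key into a filesystem-safe filename under base_path.
static std::string key_to_filename(const std::string& base_path,
                                   std::string key) {
    replace_all(key, "/", PATH_SLASH_REPLACEMENT);
    return base_path + "/" + key + FILE_EXT;
}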

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces a native C++ filesystem connector, which is a significant and well-implemented feature. The C++ implementation is robust, using modern features like std::filesystem and employing best practices such as atomic writes via temporary files and rename. The error handling, especially the per-key error tolerance for batch get operations, is a great addition. The Python bindings and integration into the L2 adapter framework are also clean and follow existing patterns. I have one minor suggestion to remove some unused code. Overall, this is an excellent contribution.
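
For readers unfamiliar with the write pattern praised here, this is a minimal sketch of atomic publication via a temporary file plus rename (hypothetical names and a hypothetical .tmp suffix, not the PR's code): the data only becomes visible under its final name once the rename lands, so concurrent readers never observe a partially written file:

#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Write `data` to `final_path` atomically: stage it in a sibling .tmp
// file, then rename() into place. POSIX rename is atomic when source
// and destination live on the same filesystem.
void atomic_write(const std::string& final_path, const std::string& data) {
    const std::string tmp_path = final_path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        if (!out) throw std::runtime_error("cannot open " + tmp_path);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        if (!out) throw std::runtime_error("short write to " + tmp_path);
    }  // stream flushed and closed here
    std::filesystem::rename(tmp_path, final_path);  // atomic publish
}

Combined with the per-key tolerance sketched earlier, a failed write under this pattern leaves at most a stray temporary file rather than a corrupt cache entry.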

Comment thread: csrc/storage_backends/fs/connector.h (Outdated)
Comment thread: csrc/storage_backends/connector_base.h (Outdated)
// Per-key error tolerance: record failure
// but continue processing remaining keys
req.batch->per_key_results[req.start_idx + i] = 0;
fprintf(stderr, "[LMCache FS GET] key %s failed: %s\n",
Contributor:

There should not be FS-specific stuff in the connector base.

Contributor:

@maobaolong could you address this?

Collaborator (Author):

Do you mean to remove the FS here?
I have removed it in maobaolong@95e1ce4.

Contributor:

Thanks for addressing.

try {
    do_single_get(conn, req.keys[i], req.buf_ptrs[i],
                  req.buf_lens[i], req.batch_chunk_num_bytes);
    req.batch->per_key_results[req.start_idx + i] = 1;  // success
Contributor:

Does 1 mean true or false? Please add a comment here.

Collaborator (Author):

Done

@maobaolong maobaolong requested a review from sammshen March 18, 2026 11:55

@sammshen (Contributor) left a comment


LGTM! @maobaolong Please address the comment in the connector base.


@sammshen (Contributor) left a comment


LGTM!

@sammshen sammshen requested a review from deng451e March 20, 2026 01:03

@deng451e (Collaborator) left a comment


LGTM

@maobaolong maobaolong enabled auto-merge (squash) March 24, 2026 08:00
@github-actions (Bot) added the "full" label (Run comprehensive tests on this PR) on Mar 24, 2026
@maobaolong maobaolong merged commit 729ff73 into LMCache:dev Mar 24, 2026
27 checks passed
maobaolong added a commit to maobaolong/LMCache that referenced this pull request Mar 25, 2026
* Add native FS L2 connector for MP mode

* Introduce native fs connector

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Remove the unused path_buf

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Add comment and fix output

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Co-authored-by: Samuel Shen <slshen@tensormesh.ai>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026