[Remote Connector]: cpp multi-threaded RESP#2541

Merged
maobaolong merged 32 commits into LMCache:dev from sammshen:redis-cpp-client
Feb 24, 2026
Conversation

@sammshen
Contributor

@sammshen sammshen commented Feb 4, 2026

@ApostaC as discussed, the interface is agnostic to asyncio.Future versus concurrent.futures.Future, so it can be folded in as a Controller inside MP Mode.

This cpp integration has brought raw client throughput to 10 GB/s (see the examples/ folder addition), up from the previous 4-5 GB/s limit with Python (the optimizations are thoroughly documented in /csrc/redis/resp.cpp). The LMCache integration now stands at ~7 GB/s (compared to the previous 0.5 GB/s).

Samuel Shen added 10 commits February 3, 2026 19:05
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen changed the title [Remote Connector]: multi-threaded RESP remote connector [Remote Connector]: cpp multi-threaded RESP Feb 4, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @sammshen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a substantial performance upgrade for LMCache's remote key-value store capabilities by implementing a new multi-threaded C++ client for the Redis RESP protocol. This client is exposed to Python via efficient bindings that minimize overhead, allowing for non-blocking operations and significantly boosting data transfer rates. The integration provides LMCache with a high-throughput remote connector, although further work on multiprocessing mode is identified to fully leverage the new client's potential within LMCache.

Highlights

  • New Multi-threaded C++ Redis RESP Client: Introduced a high-performance C++ client for Redis using the RESP protocol, designed for multi-threaded operations to maximize throughput.
  • Efficient Python Bindings with eventfd: Provided Python bindings for the C++ client, leveraging eventfd for non-blocking I/O and efficient asynchronous communication between C++ worker threads and the Python event loop, minimizing overhead.
  • Enhanced Data Transfer Throughput: Achieved a significant increase in raw client throughput to 10 GB/s, doubling previous Python-limited performance, through C++ optimizations like scatter/gather I/O and zero-copy.
  • LMCache Integration: Integrated the new RESP client as a dedicated RESPConnector within the LMCache system, enabling its use as a high-throughput remote KV cache backend.
  • Codebase Clarity and Consistency: Refactored naming conventions from full_chunk_size to full_chunk_bytes across several Python files for improved clarity and consistency in byte-related measurements.


Changelog
  • csrc/pybind.cpp
    • Removed a trailing newline character.
  • csrc/redis/pybind.cpp
    • Added new file to define Python bindings for the MultiRESPClient class, exposing its methods to Python.
  • csrc/redis/resp.cpp
    • Added new file implementing the core MultiRESPClient logic, including multi-threaded request processing, socket communication, and RESP protocol handling with performance optimizations.
  • csrc/redis/resp.h
    • Added new header file declaring the MultiRESPClient class and related structures.
  • examples/kv_cache_reuse/remote_backends/resp/README.md
    • Added new documentation detailing the LMCacheRedis KV Connector, its optimizations, quickstart guide, and usage examples.
  • examples/kv_cache_reuse/remote_backends/resp/example_resp_client.py
    • Added new Python example and benchmark script demonstrating the usage and performance of the RESPClient wrapper for the C++ backend.
  • examples/kv_cache_reuse/remote_backends/resp/resp.yaml
    • Added new YAML configuration file for the RESP connector, specifying parameters like chunk_size and resp_num_threads.
  • lmcache/v1/cache_engine.py
    • Updated store and retrieve methods to record end_time for more accurate performance metrics.
  • lmcache/v1/storage_backend/connector/init.py
    • Introduced RESPConnectorAdapter to enable dynamic loading and creation of the new RESPConnector based on the URL schema.
  • lmcache/v1/storage_backend/connector/base_connector.py
    • Renamed full_chunk_size to full_chunk_bytes in BaseConnector and its methods for better semantic clarity.
  • lmcache/v1/storage_backend/connector/redis_adapter.py
    • Added RESPConnectorAdapter class to handle the resp:// URL scheme, validating configuration and instantiating RESPConnector.
  • lmcache/v1/storage_backend/connector/redis_connector.py
    • Implemented RESPClient (Python wrapper for LMCacheRedisClient) and RESPConnector to integrate the C++ RESP client into LMCache's remote storage backend, supporting async/sync batch operations.
  • lmcache/v1/storage_backend/connector/s3_connector.py
    • Updated s3_part_size to use the new full_chunk_bytes attribute.
  • lmcache/v1/storage_backend/local_cpu_backend.py
    • Renamed get_full_chunk_size to get_full_chunk_bytes and updated its internal usage.
  • lmcache/v1/storage_backend/pd_backend.py
    • Renamed full_chunk_size to full_chunk_bytes in PDBackend and its related methods.
  • setup.py
    • Modified to include the new lmcache.lmcache_redis C++ extension in the build process for both CUDA and ROCm environments.
Activity
  • The author, sammshen, has implemented a multi-threaded C++ RESP client and integrated it into the LMCache system.
  • Performance benchmarks show a significant increase in raw client throughput (10 GB/s) compared to previous Python-only implementations (4-5 GB/s).
  • The author notes that LMCache integration currently caps at ~2GB/s, suggesting that further work on multiprocessing (MP) mode is required to fully utilize the C++ client's speed.
  • The PR description indicates communication with @ApostaC regarding the interface's flexibility for asyncio.Future versus concurrent.future.Future and its potential integration into MP Mode.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a high-performance, multi-threaded C++ RESP client to improve throughput for remote KV cache operations, utilizing modern C++ features for concurrency and an eventfd-based mechanism for asyncio integration. It also includes Python bindings, a new RESPConnector for the LMCache framework, and related refactoring. While the overall design is solid, a security audit identified several critical and medium severity issues, including a potential division-by-zero crash in batch operations, a Python GIL violation during object destruction in background threads, a race condition in the asynchronous future registration logic, and the lack of socket timeouts in worker threads which could lead to indefinite hangs. Additionally, a critical issue was found in the RESP protocol parsing that could lead to connection failures (e.g., key not found). Further improvements are needed in error handling, performance, and code clarity.

Comment thread csrc/redis/resp.cpp
Comment on lines +315 to +349
static void do_get_into(WorkerConn& conn, const std::string& key, void* buf,
size_t len) {
// we only read exactly chunk_bytes bytes (save_unfull_chunk must be off)
if (len != conn.chunk_bytes) {
throw std::runtime_error("buffer size mismatch");
}

// build key header (can't pre-allocate)
std::string key_header = conn.make_key_header(key);

// send GET cmd
// iovec lets us combine pre-built parts and dynamic strings
conn.send_multipart({{conn.get_prefix.data(), conn.get_prefix.size()},
{key_header.data(), key_header.size()}});

// parse in 3 steps

// 1. parse the size header and validate
std::vector<char> size_header_buf(conn.size_header.size());
conn.recv_exactly(size_header_buf.data(), size_header_buf.size());
if (std::memcmp(size_header_buf.data(), conn.size_header.data(),
conn.size_header.size()) != 0) {
throw std::runtime_error("GET: size header mismatch");
}

// 2. parse the payload (KV Cache)
conn.recv_exactly(buf, len);

// 3. parse the trailer and validate
char trailer[WorkerConn::crlf_len];
conn.recv_exactly(trailer, WorkerConn::crlf_len);
if (std::memcmp(trailer, conn.crlf, WorkerConn::crlf_len) != 0) {
throw std::runtime_error("GET: trailer mismatch");
}
}
Contributor


critical

The current RESP parsing logic is very fragile as it assumes a fixed response structure for GET operations. It doesn't handle Redis error responses (e.g., -ERR ...) or nil responses for non-existent keys ($-1\r\n). If Redis returns an unexpected response, the client will likely desynchronize from the TCP stream, leading to cascading failures for all subsequent operations on that connection.

To make this more robust, the parsing logic should inspect the first byte of the response to determine its type (e.g., + for simple string, $ for bulk string, - for error, : for integer) and parse the rest of the response accordingly. This is crucial for handling cases like a key not being found during a GET operation.

A similar issue exists in do_set_from and do_exists.
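A minimal sketch of the suggested first-byte dispatch, with an in-memory reader standing in for recv_exactly (the Reader, RespType, and parse_header names are illustrative, not part of the PR):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in for a socket: reads from an in-memory buffer.
struct Reader {
  std::string data;
  std::size_t pos = 0;
  char get() {
    if (pos >= data.size()) throw std::runtime_error("unexpected EOF");
    return data[pos++];
  }
  std::string line() {  // read up to (and consume) the CRLF terminator
    std::string out;
    for (char c = get(); c != '\r'; c = get()) out += c;
    get();  // consume '\n'
    return out;
  }
};

enum class RespType { Simple, Error, Integer, Bulk, Nil };

struct RespHeader {
  RespType type;
  long long value = 0;  // bulk length or integer value
  std::string text;     // simple-string / error payload
};

// Dispatch on the RESP type byte instead of assuming a fixed "$<len>\r\n".
RespHeader parse_header(Reader& r) {
  char t = r.get();
  std::string rest = r.line();
  switch (t) {
    case '+': return {RespType::Simple, 0, rest};
    case '-': return {RespType::Error, 0, rest};  // e.g. "-ERR ..."
    case ':': return {RespType::Integer, std::stoll(rest), {}};
    case '$': {
      long long n = std::stoll(rest);              // "$-1" => nil reply
      return {n < 0 ? RespType::Nil : RespType::Bulk, n, {}};
    }
    default: throw std::runtime_error("unknown RESP type byte");
  }
}
```

With something like this in place, do_get_into could treat a Nil header as "key not found" instead of desynchronizing the TCP stream.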

Contributor Author


I think this suggestion is reasonable. The no-parsing optimization might be a little aggressive. Parsing only the first byte avoids the O(n) equality check.

Comment thread csrc/redis/resp.cpp
Comment thread csrc/redis/resp.cpp
Comment thread csrc/redis/resp.cpp
Comment thread csrc/redis/resp.cpp
Comment on lines +112 to +113
fut.set_exception(exc)
self._pending.clear()
Contributor


security-medium

In RESPClient, the submit_* methods call the native client and then register a future using the returned future_id. There is a race condition where the native worker thread could complete the operation and signal the event_fd before the future is registered in self._pending. If this happens, _on_ready will ignore the completion because the future_id is not yet in self._pending, and the future will never be resolved, leading to a hang.

Recommendation: The future_id should be known or reserved before the task is submitted to the native worker, allowing the future to be registered before the worker can possibly complete the task.

Comment thread csrc/redis/resp.cpp Outdated
Comment thread csrc/redis/resp.cpp Outdated
Comment on lines +301 to +305
std::string make_key_header(const std::string& key) {
std::ostringstream oss;
oss << "$" << key.size() << "\r\n" << key << "\r\n";
return oss.str();
}
Contributor


medium

The make_key_header function uses std::ostringstream to construct the key header string for each request. While convenient, this can introduce performance overhead due to repeated dynamic memory allocations in a performance-critical path.

For a high-throughput client, consider optimizing this by using a pre-allocated buffer (e.g., thread-local) with snprintf, or a more efficient string building mechanism to avoid allocations on each call.

Contributor


The make_key_header function uses std::ostringstream to construct the key header string for each request. While convenient, this can introduce performance overhead due to repeated dynamic memory allocations in a performance-critical path.

For a high-throughput client, consider optimizing this by using a pre-allocated buffer (e.g., thread-local) with snprintf, or a more efficient string building mechanism to avoid allocations on each call.

I agree with Gemini's opinion here. (But only when we can have an upper-bound on the key's size)
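A sketch of the suggested allocation-free variant, under the stated assumption that key size is bounded (the buffer sizes and function shape here are illustrative, not the PR's code):

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <string_view>

// Assumed upper bound on key size, so callers can supply a fixed buffer.
inline constexpr std::size_t kMaxKeyLen = 512;
inline constexpr std::size_t kHeaderCap = kMaxKeyLen + 32;

// Writes "$<len>\r\n<key>\r\n" into `out` with no heap allocation;
// returns the number of bytes written.
std::size_t make_key_header(std::string_view key, char* out, std::size_t cap) {
  int n = std::snprintf(out, cap, "$%zu\r\n", key.size());
  std::memcpy(out + n, key.data(), key.size());
  std::memcpy(out + n + key.size(), "\r\n", 2);
  return static_cast<std::size_t>(n) + key.size() + 2;
}
```

A `thread_local char buf[kHeaderCap]` in each worker would make this reusable per connection without synchronization.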


# sync and asyncio wrapper around the LMCacheRedisClient
# the pybinding interface allows us to work with both sync and async code
class RESPClient:
Contributor


medium

The RESPClient class in this example file is a duplicate of the one defined in lmcache/v1/storage_backend/connector/redis_connector.py. This code duplication makes maintenance harder. The example should import the RESPClient from the library itself rather than redefining it.

Contributor Author


done

Comment thread lmcache/v1/storage_backend/connector/redis_adapter.py
@sammshen
Contributor Author

sammshen commented Feb 4, 2026

@hickeyma

do you have any idea why the GitHub runners cannot run the free-disk-space action?

Prepare all required actions
Run ./.github/actions/free-disk-space
Run df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       145G   53G   92G  37% /
tmpfs           7.9G   84K  7.9G   1% /dev/shm
tmpfs           3.2G 1008K  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
efivarfs        128M   32K  128M   1% /sys/firmware/efi/efivars
/dev/sda16      881M   62M  758M   8% /boot
/dev/sda15      105M  6.2M   99M   6% /boot/efi
tmpfs           1.6G   12K  1.6G   1% /run/user/1001
Run sudo docker rmi "$(docker image ls -aq)" >/dev/null 2>&1 || true
Error: Process completed with exit code 100.

Update: should be fixed by #2543

Comment thread csrc/redis/pybind.cpp
Contributor Author


follow up: whether to have "pin"/"lock" semantics

Contributor Author


doesn't seem to be in RESP right now

Comment thread csrc/redis/resp.cpp
tile_req.buf_owners.push_back(mv);
}

enqueue_request(std::move(tile_req));
Contributor Author


double check whether std::move and the auto-generated move constructor work on all the fields inside the Request struct (especially py::object)

ApostaC
ApostaC previously requested changes Feb 8, 2026
Contributor

@ApostaC ApostaC left a comment


Did a pass. The major things:

  1. Try to avoid having Python-related stuff in resp.h and resp.cpp
  2. Only keep the "batched" primitives

Comment thread lmcache/v1/cache_engine.py Outdated
self.storage_manager.remove(key)
memory_obj.ref_count_down()

retrieve_stats.end_time = time.perf_counter()
Contributor


Why do we have this in this PR?

Contributor Author


needed to check the throughput; this has been broken for a while now

Contributor Author


I can separate a new PR for this but this PR needs it as well to test so it will create redundant changes

Contributor


Can we rename it as "full_chunk_size_bytes"? It's clearer because full_chunk_bytes sounds like a data chunk of bytes.

Btw, this name change can be separated to another PR to keep this PR clean.

Comment thread csrc/redis/resp.h
Contributor


Is it possible to avoid having py-related stuff in resp.h and resp.cpp? Let's try to put all the python-related types and operations into pybind.cpp.

FYI, py::list can be directly (and automatically) converted to std::vector if we include pybind11/stl.h (https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html)

We can also manually convert py::memoryview to std::vector as well (or maybe there are better practices).

Contributor


By doing this, we can also have more freedom to release GIL. See: https://pybind11.readthedocs.io/en/stable/advanced/misc.html#global-interpreter-lock-gil

The Python C API dictates that the Global Interpreter Lock (GIL) must always be held by the current thread to safely access Python objects. As a result, when Python calls into C++ via pybind11 the GIL must be held, and pybind11 will never implicitly release the GIL.

Contributor Author


Yes, the GIL is only needed for py::object and py::handle, so this will clarify where the GIL can be released (do a scoped release ASAP in every cpp function)
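A sketch of where that could land, assuming the refactor moves all conversions into pybind.cpp (this fragment will not build standalone; it needs pybind11 and the PR's resp.h, and the exact signatures are assumptions):

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // automatic py::list <-> std::vector conversion

#include "resp.h"  // MultiRESPClient, now free of Python types (assumed)

namespace py = pybind11;

PYBIND11_MODULE(lmcache_redis, m) {
  py::class_<MultiRESPClient>(m, "MultiRESPClient")
      .def("submit_batch_exists",
           // keys arrive as a py::list and are converted (under the GIL)
           // into std::vector<std::string> by pybind11/stl.h; the pure-C++
           // call can then run with the GIL released.
           [](MultiRESPClient& self, std::vector<std::string> keys) {
             py::gil_scoped_release release;
             return self.submit_batch_exists(std::move(keys));
           },
           py::arg("keys"));
}
```

The gil_scoped_release destructor reacquires the GIL when the lambda returns, before pybind11 converts the return value back to a Python object.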

Comment thread csrc/redis/pybind.cpp Outdated
Comment on lines +11 to +26
.def(py::init<std::string, int, size_t, int>(), py::arg("host"),
py::arg("port"), py::arg("chunk_bytes"), py::arg("num_workers"))
.def("event_fd", &MultiRESPClient::event_fd)
.def("submit_get", &MultiRESPClient::submit_get, py::arg("key"),
py::arg("memoryview"))
.def("submit_set", &MultiRESPClient::submit_set, py::arg("key"),
py::arg("memoryview"))
.def("submit_exists", &MultiRESPClient::submit_exists, py::arg("key"))
.def("submit_batch_get", &MultiRESPClient::submit_batch_get,
py::arg("keys"), py::arg("memoryviews"))
.def("submit_batch_set", &MultiRESPClient::submit_batch_set,
py::arg("keys"), py::arg("memoryviews"))
.def("submit_batch_exists", &MultiRESPClient::submit_batch_exists,
py::arg("keys"))
.def("drain_completions", &MultiRESPClient::drain_completions)
.def("close", &MultiRESPClient::close);
Contributor


If we removed all the py-related stuff from resp.h and resp.cpp, we could have a few very simple lambda expressions to do the type conversion here.

Contributor Author


done

Comment thread csrc/redis/resp.h Outdated
Comment on lines +34 to +36
uint64_t submit_get(const std::string& key, py::memoryview mv);
uint64_t submit_set(const std::string& key, py::memoryview mv);
uint64_t submit_exists(const std::string& key);
Contributor


We can force all the operations to be "batched" and remove these three functions. It can simplify the cpp implementation.

Contributor Author


done

Comment thread csrc/redis/resp.cpp Outdated
Comment on lines +160 to +174
this->host = host;
this->port = port;
this->chunk_bytes = chunk_bytes;

// reusable headers (for scatter/gather)

{
std::ostringstream oss;
oss << "$" << chunk_bytes << "\r\n";
size_header = oss.str();
}

get_prefix = "*2\r\n$3\r\nGET\r\n";
set_prefix = "*3\r\n$3\r\nSET\r\n";
exists_prefix = "*2\r\n$6\r\nEXISTS\r\n";
Contributor


Generally, the code here can be put into the constructor.

Also, the get_prefix, set_prefix, and exists_prefix can be constexpr std::string_view IIUC?
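The constexpr suggestion can be sketched directly; the prefix strings below are copied from the diff, and a string_view works with iovec via .data()/.size() without any per-instance std::string copies:

```cpp
#include <string_view>

// Compile-time RESP command prefixes: no per-connection allocation.
inline constexpr std::string_view get_prefix    = "*2\r\n$3\r\nGET\r\n";
inline constexpr std::string_view set_prefix    = "*3\r\n$3\r\nSET\r\n";
inline constexpr std::string_view exists_prefix = "*2\r\n$6\r\nEXISTS\r\n";
```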

Contributor Author


done

Comment thread csrc/redis/resp.cpp Outdated

Comment thread csrc/redis/resp.cpp Outdated
Comment on lines +408 to +410
MultiRESPClient::MultiRESPClient(std::string host, int port, size_t chunk_bytes,
int num_workers)
: host_(std::move(host)),
Contributor


What's the reason to have std::string host and host_(std::move(host)) instead of using const std::string& host with host_(host)?

Contributor Author


Since host will always be stored in the object (it's a "sink"), this pattern lets the compiler choose the optimal path based on what the caller provides, without needing overloads.
https://stackoverflow.com/questions/21035417/is-the-pass-by-value-and-then-move-construct-a-bad-idiom
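The sink idiom being discussed, sketched with a copy/move-counting probe so the costs are observable (Probe and Holder are illustrative types, not the PR's code):

```cpp
#include <utility>

// Counts how it was constructed, so copies vs. moves are visible.
struct Probe {
  int copies = 0, moves = 0;
  Probe() = default;
  Probe(const Probe& o) : copies(o.copies + 1), moves(o.moves) {}
  Probe(Probe&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Sink parameter: pass by value, then move into the member. An rvalue
// argument is moved all the way in (zero copies, thanks to guaranteed
// copy elision in C++17); an lvalue is copied exactly once, the same
// cost as a const& parameter plus a member copy.
struct Holder {
  Probe p_;
  explicit Holder(Probe p) : p_(std::move(p)) {}
};
```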

Comment thread csrc/redis/resp.cpp Outdated
// these will be the keys and buffers that this Request is responsible for
std::vector<std::string> keys;
std::vector<void*> buf_ptrs;
std::vector<py::object> buf_owners;
Contributor


I'm not an expert here, but I do want to get rid of buf_owners (and buf_owner) here.

From pybind11's doc, I do see we can do inc_ref and dec_ref on the python objects, including py::memoryview (see https://pybind11.readthedocs.io/en/stable/reference.html#without-reference-counting)

Not sure what the best practice is, but let's investigate and learn.

Contributor


One thing we need to be careful about is that we need to do dec_ref when the failure happens. Since the tasks are executed asynchronously, I'm not sure what's the best place to do the dec_ref.

Collaborator


IMO, the best place is the destructor.

Contributor Author


I think we can get rid of buf_owners since we can make the assumption in LMCache use case that the memory objects will not be garbage collected

Comment thread csrc/redis/resp.cpp
void MultiRESPClient::enqueue_request(Request&& req) {
{
std::lock_guard<std::mutex> lk(req_mu_);
requests_.push(std::move(req));
Contributor


we can probably do requests_.emplace(req) here (see https://en.cppreference.com/w/cpp/container/queue/emplace.html)

Contributor Author


the req struct already exists (and is populated dynamically), so this does nothing? Unless the cpp compiler is a super genius and can somehow figure this out.
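The author's point can be checked directly: for an already-constructed lvalue, emplace degenerates to a copy construction, while push(std::move(...)) move-constructs; emplace only wins when the element is built in place from constructor arguments. A sketch with an illustrative counting type:

```cpp
#include <queue>
#include <utility>

// Counts copy/move constructions so the difference is observable.
struct Probe {
  int copies = 0, moves = 0;
  Probe() = default;
  Probe(const Probe& o) : copies(o.copies + 1), moves(o.moves) {}
  Probe(Probe&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Returns (copies, moves) recorded by the element that lands in the queue.
std::pair<int, int> enqueue(bool use_move) {
  std::queue<Probe> q;
  Probe req;                              // already fully populated, as in the PR
  if (use_move) q.push(std::move(req));   // move-constructs the element
  else q.emplace(req);                    // lvalue argument: copy-constructs
  return {q.back().copies, q.back().moves};
}
```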


@DongDongJu DongDongJu left a comment


Hello @sammshen, thanks for the hard work.
This seems very similar to what I'm currently working on.
Left a few comments and will revisit tomorrow afternoon.

Comment thread csrc/redis/resp.cpp Outdated
std::string first_error;

// for batch exists, store the boolean results (0/1)
// IMPORTANT: not vector<bool> due to concurrent write data race
Collaborator


@ApostaC From my understanding,
vector<bool> packs [b0,b1,b2,...,b7] into 1 byte,
while a vector of bytes stores each value separately ([b0],[b1],[b2],...).
So we cannot guarantee atomic byte access for 8 contiguous bool values.

Is this what you mean? @sammshen
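The distinction can be made concrete: vector<bool> hands out proxy references into shared bytes, while vector<uint8_t> gives every flag its own memory location, which the C++11 memory model treats as independently writable. A small illustrative check:

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// vector<bool> is bit-packed: operator[] yields a proxy object, not bool&,
// and up to 8 neighboring flags share one byte, so unsynchronized writes to
// "different" elements can touch the same memory location (a data race).
static_assert(!std::is_same_v<std::vector<bool>::reference, bool&>);

// vector<uint8_t> stores one byte per flag; each index is its own memory
// location, so per-index writes from different worker threads are race-free.
static_assert(std::is_same_v<std::vector<std::uint8_t>::reference,
                             std::uint8_t&>);
```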

Comment thread csrc/redis/resp.cpp
Comment thread csrc/redis/resp.cpp Outdated

Samuel Shen added 9 commits February 10, 2026 02:11
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
@sammshen sammshen added the full Run comprehensive tests on this PR label Feb 10, 2026
Samuel Shen added 4 commits February 10, 2026 07:00
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Collaborator

@DongDongJu DongDongJu left a comment


Overall looks good to me.
Could you help me understand the info.size question I left?
Let me test it tomorrow.

Comment thread csrc/redis/pybind.cpp Outdated
Comment thread examples/kv_cache_reuse/remote_backends/resp/resp.yaml Outdated

# future_id -> (Future, op_name)
# we support both types of futures since we only use their basic interface
self._pending: Dict[
Collaborator


Is this for preventing Python GC?

Contributor Author


this is to find the Python future to which a completion belongs, from the cpp side.

cpp wakes up python via eventfd and outputs (future_id, ok, result_bool, error, result_bools), and the python side needs to populate the future indexed by the future_id
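The wakeup path described here relies on eventfd's counter semantics: each completion adds 1 to a kernel-side 64-bit counter, and a single read drains the accumulated total, so one Python wakeup can cover many completions. A standalone Linux sketch (the function name is illustrative, not the PR's code):

```cpp
#include <sys/eventfd.h>
#include <unistd.h>

#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

// Each "worker" signals one completion by adding 1 to the eventfd counter;
// a single read on the consumer side returns the accumulated total.
std::uint64_t drained_after(int n_completions) {
  int efd = eventfd(0, 0);
  if (efd < 0) std::abort();
  std::vector<std::thread> workers;
  for (int i = 0; i < n_completions; ++i) {
    workers.emplace_back([efd] {
      std::uint64_t one = 1;
      if (write(efd, &one, sizeof one) != static_cast<ssize_t>(sizeof one))
        std::abort();
    });
  }
  for (auto& t : workers) t.join();
  std::uint64_t total = 0;
  if (read(efd, &total, sizeof total) != static_cast<ssize_t>(sizeof total))
    std::abort();
  close(efd);
  return total;
}
```

In the PR, the Python side presumably watches this fd from the event loop and calls drain_completions once per wakeup to collect every finished (future_id, ...) tuple.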

Collaborator


Okay, then it prevents GC of the Python future object, not the memoryview.

Collaborator


Is there any possibility that a completion is drained before the _pending insert?
If so, we would need a different structure, something like Tuple[pending_future structure, str, Tuple[memoryview, ...]], for safe resolving. But this case didn't happen in my tests,
so I'm just leaving it as a concern here.

Contributor Author


this will not happen because the callback is not preemptive at the python level and the submission is non-blocking!

Samuel Shen added 5 commits February 11, 2026 05:53
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Collaborator

@DongDongJu DongDongJu left a comment

Generally LGTM.
I checked that it is functionally working,
but I left one concern about the memoryview GC issue.
We can take care of it if it ever happens.



@sammshen
Contributor Author

Addressed the correctness concern, @DongDongJu. Thanks for your review!

@sammshen
Contributor Author

Potential future idea:
an autodiscovery tool for the ideal chunk size (based on the model, by running tests against a local redis)

@sammshen
Contributor Author

We can follow #2607 to extract out some interfaces (batching, GIL-release patterns, etc.)

@maobaolong
Collaborator

@sammshen Would you like to implement a fs connector also?

@maobaolong
Collaborator

maobaolong commented Feb 24, 2026

BTW, as discussed with @sammshen and @DongDongJu, #2607 may be a reference.

We can build an abstraction for a native remote backend.

(screenshot: proposed native remote backend abstraction)

@maobaolong maobaolong enabled auto-merge (squash) February 24, 2026 06:04
Collaborator

@maobaolong maobaolong left a comment

LGTM, thanks for this great work!

@sammshen sammshen requested a review from ApostaC February 24, 2026 06:52
@sammshen sammshen dismissed ApostaC’s stale review February 24, 2026 06:53

All requested changes addressed

@maobaolong maobaolong merged commit bab09df into LMCache:dev Feb 24, 2026
24 checks passed
sammshen added a commit to sammshen/LMCache that referenced this pull request Mar 1, 2026
* initial commit

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* initial pybinding

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix race condition

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* remove resp.h import in csrc/pybind.cpp

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fixing installation

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add batched exists

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add working example

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* batched_contains is sync

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix cache engine time logging

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* address cpp side comments

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix benchmarking test script

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* create single resp client file

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* dynamic chunk_bytes

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* nit renaming

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* readme nit

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix rust raw block backend

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix non CUDA UT

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* revert test.yml

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* delete **.cpp and **.h path for non cuda UT

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add RESP unit tests

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add __init__.py to tests/v1/storage_backend/

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add fixture for resp tests

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix UT

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add authentication

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* fix UT with authentication

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* found golden payload size

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

* add no build isolation

Signed-off-by: Samuel Shen <slshen@uchciago.edu>

---------

Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
hlin99 pushed a commit to hlin99/LMCache that referenced this pull request Mar 2, 2026
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
Signed-off-by: Aaron Wu <aaron.wu@dell.com>

Labels

full Run comprehensive tests on this PR


4 participants