
Conversation

@pizhenwei (Contributor) commented May 23, 2023

RDMA (remote direct memory access) is a technology that enables computers in a network to exchange main-memory data without involving the processor, cache, or operating system of either machine. As a result, RDMA performs better than TCP: our test results show Redis over RDMA achieves ~2.5x the QPS at lower latency.

In recent years, RDMA has become popular in data centers; in particular, the RoCE (RDMA over Converged Ethernet) architecture is now widely deployed.

This PR introduces the Redis over RDMA protocol as a new transport for Redis. For now, we define four commands:

  • GetServerFeature & SetClientFeature: these two commands negotiate features for future extension. No features are defined in this version; flow control and multi-buffer support may be added later, which will require feature negotiation.
  • Keepalive
  • RegisterXferMemory: the heart of the protocol, used to transfer the real payload.

The 'TX buffer' and 'RX buffer' are built on RDMA remote memory using RDMA write / write-with-immediate. The design is similar to (but not the same as) mechanisms introduced in several papers:

  • Socksdirect: datacenter sockets can be fast and compatible
    <https://dl.acm.org/doi/10.1145/3341302.3342071>
  • LITE Kernel RDMA Support for Datacenter Applications
    <https://dl.acm.org/doi/abs/10.1145/3132747.3132762>
  • FaRM: Fast Remote Memory
    <https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf>

With this version of the protocol, we achieve these goals:

  • a high-performance design for Redis
  • full support for current Redis operations/commands
  • good compatibility with future optimizations

Co-authored-by: Xinhao Kong <xinhao.kong@duke.edu>
Co-authored-by: Huaping Zhou <zhouhuaping.san@bytedance.com>
Co-authored-by: zhuo jiang <jiangzhuo.cs@bytedance.com>
Co-authored-by: Yiming Zhang <zhangyiming1201@bytedance.com>
Co-authored-by: Jianxi Ye <jianxi.ye@bytedance.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
@uvletter (Contributor)

Hello @pizhenwei, I'm very interested in your proposal, and the protocol seems quite novel compared to other RDMA implementations like brpc and NVMe-oF. I also have some questions about the proposal; I hope I haven't missed anything.

  1. What's the mapping between QPs and clients/servers? With many QPs per client it's somewhat wasteful, but with one QP per client some multiplexing mechanism may be needed, since some Redis commands are blocking and a request may block the following ones.
  2. Registering the memory region in batches benefits performance, but it also lowers memory utilization. Suppose a redis-server has 1000 clients, which is normal in a production environment, and reserves a 1 MB memory region for each client; then 1 GB is used, which is wasteful for an in-memory database.
  3. What about huge requests/responses, where the request/response is larger than the reserved memory region, e.g. a string larger than 1 MB? Will the protocol support interleaving write and register?

In general I think the protocol and implementation are neat and elegant; they deserve more attention, whether for research/study or production.

@pizhenwei (Contributor, Author) commented May 31, 2023

> Hello @pizhenwei, I'm very interested in your proposal, and the protocol seems quite novel compared to other RDMA implementations like brpc and NVMe-oF. I also have some questions about the proposal; I hope I haven't missed anything.

Hi, I tried to describe the differences compared to other protocols; please see link.

> 1. What's the mapping between QPs and clients/servers? With many QPs per client it's somewhat wasteful, but with one QP per client some multiplexing mechanism may be needed, since some Redis commands are blocking and a request may block the following ones.

Just imagine a QP (RC type) as a connection, like a TCP/TLS/Unix socket: if a client uses N sockets, it needs N QPs. (In fact, many sockets also waste resources in the kernel.)

> 2. Registering the memory region in batches benefits performance, but it also lowers memory utilization. Suppose a redis-server has 1000 clients, which is normal in a production environment, and reserves a 1 MB memory region for each client; then 1 GB is used, which is wasteful for an in-memory database.

Currently, only one memory region per QP is defined, and the protocol imposes no strict limit on memory region size. As far as I can see, for an engineering implementation:

  • the server side could use a configurable size for the 'RX' memory region (the more memory used, the higher the performance, so I expect a typical size will be found by testing real workloads). I have implemented a POC version; see PR.
  • the server side could use a small 'TX' memory region against a large client 'RX' memory region. (In my plan, but not implemented yet.)
> 3. What about huge requests/responses, where the request/response is larger than the reserved memory region, e.g. a string larger than 1 MB? Will the protocol support interleaving write and register?

For example, transferring a 10 MB string over a 1 MB memory region works like this:

Register 1 MB of memory, send bytes 0-1 MB; register 1 MB of memory, send bytes 1-2 MB; and so on.

> In general I think the protocol and implementation are neat and elegant; they deserve more attention, whether for research/study or production.

Thanks!
This can be tested using the repo.

Client (branch `feature-rdma-with-cli`):

```shell
make distclean; make BUILD_RDMA=yes -j
```

Server (branch `feature-rdma`):

```shell
make distclean; make BUILD_RDMA=module -j
```

@CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@pizhenwei (Contributor, Author)

So sad, years of waiting have made me lose patience and confidence.

@pizhenwei pizhenwei closed this Dec 18, 2024
@pizhenwei pizhenwei deleted the redis-over-rdma-protocol branch April 15, 2025 08:45