-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Fully abstract connection and make TLS dynamically loadable #9320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hello @pizhenwei, this looks like a big chunk of work - thank you! I only quickly looked at the code and plan to do so more in-depth in a few days, but I wanted to ping @redis/core-team about this. I think we need to agree on the supported use cases and scope here, and the kind of problems that can and cannot be solved by pluggable connection modules. There are many constraints that will make it difficult to support any arbitrary configuration:
Based on that, use cases that could reasonably be supported are:
I think we need to agree that supporting the "all channels" for arbitrary connection types is out of scope for now. Another issue I noticed with this PR is that it implements a dedicated mechanism for loading the module. I lean towards using the existing modules mechanism, even if this kind of third party modules have a surface area that's much bigger than the Redis Module API. Would be happy to hear other thoughts. |
|
@yossigo just to be clear, you mean that for client connections we can support multiple plugins, even working side by side, but these won't (at least in this stage) integrate with cluster bus and replication channels. Regarding the module loading interface, i actually believe this mechanism should be completely separate from the current Redis Modules one, i don't wanna see module capabilities or limitations get mixed with the capabilities and limitations of this mechanism. |
Sure, I'm looking forward to seeing it. |
yossigo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pizhenwei I've started reviewing the code and made a few comments. My main input is we need to break down this PR into several phases to make it easier to handle.
I believe the first phase should be about connection API change/cleanup only - fully encapsulate ConnectionType, only use generic connCreate, connAccept, etc. Leave everything in connection.c to avoid big diffs due to code reorganization at this point.
At this point I'd also want to be sure we don't introduce any unexpected performance regressions.
The next stages would be --
- Add
tls-module-pathand the ability to optionally build TLS support separately (i.e.BUILD_TLS=yes|no|ext). Maybe at this point already consider a more genericload-extensionmechanism (the term "module" is used for Redis Modules, we don't want to confuse the two). - Consider if/how additional arbitrary connection types can be supported.
- Finalize code reorganization.
@oranagra Do you have any other/additional thoughts?
|
@yossigo it makes sense to split it into several stages, and review each separately, but then i may be worried that we're not seeing the end goal properly when working on stage one, so i'm also willing to consider splitting to several commits, each with a clear separate purpose. not sure which approach is better in this case, i leave it to you to decide. |
71d7a47 to
d6e6b36
Compare
52fbccf to
1ecf098
Compare
Hi, @yossigo , I have fixed several problems as your comments. I'm a little confused about
Should I take the next step to separate connection.c into connection.c(abstract layer) and socket.c? |
1ecf098 to
3408b4b
Compare
3ce63db to
d4f3dab
Compare
|
@yossigo @oranagra Could you take a look at this PR? |
6f6e533 to
807744f
Compare
|
Hi, @yossigo @oranagra @yoav-steinberg I reworked this series, and separated a large patch into 9 small changes: I would appreciate it if you could review this series! |
|
|
@pizhenwei i'm really sorry for not responding sooner (wanted to wait till i have a chance to look at the code) |
Some guys contacted me to know more about the test result and the plan of Redis over RDMA, this makes me anxious to do this work. Sorry to let you feel pushy, that was not my intention. Thanks a lot! |
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pizhenwei thank you for your patience, sorry it takes so long.
i wasn't part of the loop earlier and my review now is just through the peephole of the github diff, so i might be off or missing things.
i did review the commits in the PR one by one, and the refactoring work you did seem pretty solid (gave a few minor comments).
I also looked at a few commits at the top of https://github.com/pizhenwei/redis/commits/feature-rdma-v2 to see how you took it further and how RDMA and possibly other mediums can be integrated with minimal or no changes to redis, hoping for redis to be flexible enough to support various nuances and oddities each one brings.
I do remember you mentioned something about RDMA not being able to deliver a writable event, and i assume we might need to pull portions of ae.c into this extensions mechanism.
A few random notes about changes i see in the commits of that other branch:
refactory related:
- i don't think it's appropriate clientAcceptHandler (protected mode, error replies, module hooks, freeing clients, server stats) is in connection.c (should somehow move back to networking IMHO)
- same for acceptCommonHandler and clusterAcceptHandler
- i'm unsure about connSocketListenToPort. it seems a good fit for connection.c, but maybe we don't have to move it (unnecessary blamelog and review overhead).
general:
- maybe instead of "burning" the medium specific configs into redis, we should either use the new module api config system (#10285) to configure these.
there are two ways we can do that:
one is that these extensions can be both an extension, but also a module, and use the module API to register configs.
the other is that maybe we'll extract the mechanism that allows modules to register configs (which are applied after config file parsing instead of during), in a way that the extensions will be able to use them too. - maybe instead of the several hard coded
ifs in redis that init and setup (listen) the extensions, we can have a loop that handles all the loaded extensions, in which case instead of hard coded indexes to the connection type array, they'll be dynamic. i.e. each entry in the array has a pointer to the callback struct, but we don't know which index has which type (Socket and TLS may be an exception since we due to compatibility of previous state, we may have to refer to them by index).
f5148f4 to
71e936a
Compare
* Support BUILD_TLS=module to be loaded as a module via config file or
command line. e.g. redis-server --loadmodule redis-tls.so
* Updates to redismodule.h to allow it to be used side by side with
server.h by defining REDISMODULE_CORE_MODULE
* Changes to server.h, redismodule.h and module.c to avoid repeated
type declarations (gcc 4.8 doesn't like these)
* Add a mechanism for non-ABI neutral modules (ones who include
server.h) to refuse loading if they detect not being built together with
redis (release.c)
* Fix wrong signature of RedisModuleDefragFunc, this could break
compilation of a module, but not the ABI
* Move initialization of listeners in server.c to be after loading
the modules
* Config TLS after initialization of listeners
* Init cluster after initialization of listeners
* Add TLS module to CI
* Fix a test suite race conditions:
Now that the listeners are initialized later, it's not sufficient to
wait for the PID message in the log, we need to wait for the "Server
Initialized" message.
* Fix issues with moduleconfigs test as a result from start_server
waiting for "Server Initialized"
* Fix issues with modules/infra test as a result of an additional module
present
Notes about Sentinel:
Sentinel can't really rely on the tls module, since it uses hiredis to
initiate connections and depends on OpenSSL (won't be able to use any
other connection modules for that), so it was decided that when TLS is
built as a module, sentinel does not support TLS at all.
This means that it keeps using redis_tls_ctx and redis_tls_client_ctx directly.
Example code of config in redis-tls.so(may be use in the future):
RedisModuleString *tls_cfg = NULL;
void tlsInfo(RedisModuleInfoCtx *ctx, int for_crash_report) {
UNUSED(for_crash_report);
RedisModule_InfoAddSection(ctx, "");
RedisModule_InfoAddFieldLongLong(ctx, "var", 42);
}
int tlsCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc)
{
if (argc != 2) return RedisModule_WrongArity(ctx);
return RedisModule_ReplyWithString(ctx, argv[1]);
}
RedisModuleString *getStringConfigCommand(const char *name, void *privdata) {
REDISMODULE_NOT_USED(name);
REDISMODULE_NOT_USED(privdata);
return tls_cfg;
}
int setStringConfigCommand(const char *name, RedisModuleString *new, void *privdata, RedisModuleString **err) {
REDISMODULE_NOT_USED(name);
REDISMODULE_NOT_USED(err);
REDISMODULE_NOT_USED(privdata);
if (tls_cfg) RedisModule_FreeString(NULL, tls_cfg);
RedisModule_RetainString(NULL, new);
tls_cfg = new;
return REDISMODULE_OK;
}
int RedisModule_OnLoad(void *ctx, RedisModuleString **argv, int argc)
{
....
if (RedisModule_CreateCommand(ctx,"tls",tlsCommand,"",0,0,0) == REDISMODULE_ERR)
return REDISMODULE_ERR;
if (RedisModule_RegisterStringConfig(ctx, "cfg", "", REDISMODULE_CONFIG_DEFAULT, getStringConfigCommand, setStringConfigCommand, NULL, NULL) == REDISMODULE_ERR)
return REDISMODULE_ERR;
if (RedisModule_LoadConfigs(ctx) == REDISMODULE_ERR) {
if (tls_cfg) {
RedisModule_FreeString(ctx, tls_cfg);
tls_cfg = NULL;
}
return REDISMODULE_ERR;
}
...
}
Co-authored-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
3c8c3de to
4faddf1
Compare
|
@pizhenwei please have a look at my last batch of changes other than fixing the test failures with old TCL and the test framework race, i also made sure that when redis is built with tls as a module, it will not link against the tls libraries (only redis-cli, redis-benchmark, and the redis-tls.so will, but not redis-server and thereby redis-sentinel, which is the same binary). new CI run: https://github.com/oranagra/redis/actions/runs/2904590091 [edit] some test fail on [edit] triggered another CI batch for the ones that failed: https://github.com/oranagra/redis/actions/runs/2911398026 |
This looks good to me! Thanks! |
|
WOW... just slightly over a year after the PR was opened, and nearly 1.5 years after the original RDMA PR was submitted, this journey is finally complete (or maybe it's just it's first step?). |
The following [PR](redis#9320) instoduces initialization order changes. Now cluster is initialized after modules. This changes causes a crash if the module uses RM_Call inside the load function on cluster mode (the code will try to access `server.cluster` which at this point is NULL). To solve it, seperate cluster initialization into 2 phases: 1. Structure initialization that happened before the modules initialization 2. Listener initialization that happened after. Test was added to verify the fix.
PR #9320 introduces initialization order changes. Now cluster is initialized after modules. This changes causes a crash if the module uses RM_Call inside the load function on cluster mode (the code will try to access `server.cluster` which at this point is NULL). To solve it, separate cluster initialization into 2 phases: 1. Structure initialization that happened before the modules initialization 2. Listener initialization that happened after. Test was added to verify the fix.
Apparently we set `loglevel debug` for tls in spawn_instance. At the same time, in order to better distinguish the tests, change the name of `test-centos7-tls` to `test-centos7-tls-module`, change the name of `test-centos7-tls-no-tls` to `test-centos7-tls-module-no-tls-module`. Note that in `test-centos7-tls-module`, we did not pass `--tls-module` in sentinel test because it is not supported, see 4faddf1, added in redis#9320. So only test-ubuntu-tls fails in daily CI. The test was introduced in redis#11214.
Apparently we used to set `loglevel debug` for tls in spawn_instance. I.e. cluster and sentinel tests used to run with debug logging, only when tls mode was enabled. this was probably a leftover from when creating the tls mode tests. it cause a new test created for #11214 to fail in tls mode. At the same time, in order to better distinguish the tests, change the name of `test-centos7-tls` to `test-centos7-tls-module`, change the name of `test-centos7-tls-no-tls` to `test-centos7-tls-module-no-tls`. Note that in `test-centos7-tls-module`, we did not pass `--tls-module` in sentinel test because it is not supported, see 4faddf1, added in #9320. So only `test-ubuntu-tls` fails in daily CI. Co-authored-by: Oran Agra <oran@redislabs.com>
PR redis#9320 introduces initialization order changes. Now cluster is initialized after modules. This changes causes a crash if the module uses RM_Call inside the load function on cluster mode (the code will try to access `server.cluster` which at this point is NULL). To solve it, separate cluster initialization into 2 phases: 1. Structure initialization that happened before the modules initialization 2. Listener initialization that happened after. Test was added to verify the fix.
Apparently we used to set `loglevel debug` for tls in spawn_instance. I.e. cluster and sentinel tests used to run with debug logging, only when tls mode was enabled. this was probably a leftover from when creating the tls mode tests. it cause a new test created for redis#11214 to fail in tls mode. At the same time, in order to better distinguish the tests, change the name of `test-centos7-tls` to `test-centos7-tls-module`, change the name of `test-centos7-tls-no-tls` to `test-centos7-tls-module-no-tls`. Note that in `test-centos7-tls-module`, we did not pass `--tls-module` in sentinel test because it is not supported, see 4faddf1, added in redis#9320. So only `test-ubuntu-tls` fails in daily CI. Co-authored-by: Oran Agra <oran@redislabs.com>
PR redis#9320 introduces initialization order changes. Now cluster is initialized after modules. This changes causes a crash if the module uses RM_Call inside the load function on cluster mode (the code will try to access `server.cluster` which at this point is NULL). To solve it, separate cluster initialization into 2 phases: 1. Structure initialization that happened before the modules initialization 2. Listener initialization that happened after. Test was added to verify the fix.
Apparently we used to set `loglevel debug` for tls in spawn_instance. I.e. cluster and sentinel tests used to run with debug logging, only when tls mode was enabled. this was probably a leftover from when creating the tls mode tests. it cause a new test created for redis#11214 to fail in tls mode. At the same time, in order to better distinguish the tests, change the name of `test-centos7-tls` to `test-centos7-tls-module`, change the name of `test-centos7-tls-no-tls` to `test-centos7-tls-module-no-tls`. Note that in `test-centos7-tls-module`, we did not pass `--tls-module` in sentinel test because it is not supported, see 4faddf1, added in redis#9320. So only `test-ubuntu-tls` fails in daily CI. Co-authored-by: Oran Agra <oran@redislabs.com>
Adds an option to build RDMA support as a module:
make BUILD_RDMA=module
To start valkey-server with RDMA, use a command line like the following:
./src/valkey-server --loadmodule src/valkey-rdma.so \
port=6379 bind=xx.xx.xx.xx
* Implement server side of connection module only, this means we can
*NOT*
compile RDMA support as built-in.
* Add necessary information in README.md
* Support 'CONFIG SET/GET', for example, 'CONFIG Set rdma.port 6380',
then
check this by 'rdma res show cm_id' and valkey-cli (with RDMA support,
but not implemented in this patch).
* The full listeners show like:
listener0:name=tcp,bind=*,bind=-::*,port=6379
listener1:name=unix,bind=/var/run/valkey.sock
listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379
listener3:name=tls,bind=*,bind=-::*,port=16379
Because the lack of RDMA support from TCL, use a simple C program to
test
Valkey Over RDMA (under tests/rdma/). This is a quite raw version with
basic
library dependence: libpthread, libibverbs, librdmacm. Run using the
script:
./runtest-rdma [ OPTIONS ]
To run RDMA in GitHub actions, a kernel module RXE for emulated soft
RDMA, needs
to be installed. The kernel module source code is fetched a repo
containing only
the RXE kernel driver from the Linux kernel, but stored in an separate
repo to
avoid cloning the whole Linux kernel repo.
----
Since 2021/06, I created a
[PR](redis/redis#9161) for *Redis Over RDMA*
proposal. Then I did some work to [fully abstract connection and make
TLS dynamically loadable](redis/redis#9320), a
new connection type could be built into Redis statically, or a separated
shared library(loaded by Redis on startup) since Redis 7.2.0.
Base on the new connection framework, I created a new
[PR](redis/redis#11182), some
guys(@xiezhq-hermann @zhangyiming1201 @JSpewock @uvletter @FujiZ)
noticed, played and tested this PR. However, because of the lack of time
and knowledge from the maintainers, this PR has been pending about 2
years.
Related doc: [Introduce *Valkey Over RDMA*
specification](valkey-io/valkey-doc#123). (same
as Redis, and this should be same)
Changes in this PR:
- implement *Valkey Over RDMA*. (compact the Valkey style)
Finally, if this feature is considered to merge, I volunteer to maintain
it.
---------
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
There are many commits in this PR, the detailed changes is described in each commit message.
Main changes in this PR
make BUILD_TLS=module). Loading the shared library(redis-tls.so) into Redis by Redis module subsystem, and Redis starts to listen TLS port. Ex:Support RDMA as tranport layer protocol #9161
Interface changes
Other notes
compilation of a module, but not the ABI