-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Sharded pubsub implementation #8621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't take a super deep look as this is still a draft, but looks consistent with what we talked about.
@redis/core-team Might be useful to take a look now while this is in the early draft stage for how this solves the cluster pubsub scaling problem. The interesting bit is that it's introduce 2 new commands publishlocal and subscribelocal, which exist in their own namespaces.
Full suggestion is here: #8029
src/pubsub.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is "local", should we be propagating it? Some thoughts here:
- Always require it on a primary and then replicate it normally.
- Don't replicate it and require consistent behavior from the clients about where you are connecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should replicate it within the (master, replica) set. Otherwise, I believe we are restricting the data to a single instance.
I don't see any advantage to always publish the data on master. We should allow the client to publish data irrespective of a master or a replica.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cluster mode, you can issue publish commands to replicas and they get propagated to primaries through the clusterbus. Pubsub local doesn't do that, it expects it to go to the primary to be replicated. This seems like weird behavior to break. This is especially odd since we don't make publish consistent (at least I think) and don't redirect it to primaries by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying this. I've fixed the implementation to support pubsub local commands to work on the master/replica and data broadcast to happen through clusterbus within a slot.
madolson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid progress, I did look at everything except the tests.
src/pubsub.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cluster mode, you can issue publish commands to replicas and they get propagated to primaries through the clusterbus. Pubsub local doesn't do that, it expects it to go to the primary to be replicated. This seems like weird behavior to break. This is especially odd since we don't make publish consistent (at least I think) and don't redirect it to primaries by default.
|
@redis/core-team Hey, wanted to get your guy's opinions about a couple of open questions. Skip to the third section if you are aware of the context. @zuiderkwast I know you're away, but take a look when you're back as well. High level changes
Assumptions that are staying the same for local channels
Open questionsHere are the open questions that have come up while reviewing that the requester wanted to close on before continuing. They're summarized as multiple options, I've included my recommendation.
|
|
not sure how i agree about these limitations:
regarding open questions:
i feel i'm not knowledgeable enough in these topics, so take my opinion with a grain of salt. |
|
…s during migration.
| if (type == CLUSTERMSG_TYPE_PUBLISHSHARD) { | ||
| pubsubPublishMessageShard(channel, message); | ||
| } else { | ||
| pubsubPublishMessage(channel,message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pubsubPublishMessageShard(channel, message) parameters have space but pubsubPublishMessage(channel,message); don't have space.
what about this? pubsubPublishMessage(channel, message);
|
can someone look into this CI failure: |
|
@oranagra Seems like that isn't related to this issue, seems related to the #9774 (comment) |
|
looks like the test hung here (saw it only once, so may be a false report) |
|
@hpatro FYI. I'm not aware of any place it's likely to get stuck, let's keep a look out if it happens again. |
|
Test hung on my fork : https://github.com/tezc/redis/runs/4834585930?check_suite_focus=true#step:9:630 |
|
Taking a look. |
@oranagra @hpatro Just realized in both failures, previous test case failed. Maybe, this is not related to this PR at all. Maybe it's about #9774 , something is leaking to pubsub test after the failure. |
|
Thanks for the work on this @hpatro! Sorry for commenting on an already closed PR but I just want to point out a small inconsistency between the behaviour of redis-cli for |
|
@elena-kolevska I get the below view after |
|
@hpatro I'm sorry, this was a silly (and embarrassing) mistake, I had ran redis-cli from my path by mistake, not from unstable. Works as expected from unstable. |
|
@elena-kolevska Ah, makes sense. Np. |
publishshard was added in redis#8621 (7.0 RC1), but the publishshard_sent stat is not shown in CLUSTER INFO command. Other changes: 1. Remove useless `needhelp` statements, it was removed in 3dad819. 2. Use `LL_WARNING` log level for some error logs (I/O error, Connection failed). 3. Fix typos that saw by the way.
publishshard was added in #8621 (7.0 RC1), but the publishshard_sent stat is not shown in CLUSTER INFO command. Other changes: 1. Remove useless `needhelp` statements, it was removed in 3dad819. 2. Use `LL_WARNING` log level for some error logs (I/O error, Connection failed). 3. Fix typos that saw by the way.
We already have `pubsub_channels` and `pubsub_patterns` in INFO stats, now add `pubsubshard_channels` (symmetry). Sharded pubsub was added in redis#8621
We already have `pubsub_channels` and `pubsub_patterns` in INFO stats, now add `pubsubshard_channels` (symmetry). Sharded pubsub was added in #8621
We already have `pubsub_channels` and `pubsub_patterns` in INFO stats, now add `pubsubshard_channels` (symmetry). Sharded pubsub was added in redis#8621
This is a PR to support a sharded pubsub implementation based on cluster hashing.
API Definitions can be found here
New commands introduced:
Prefix
SdenotesSHARD.New Sub commands
New Configs
Considerations made:
SETSLOT <slot> NODE <node-id>. Clients will need to handle this message, and choose to either reconnect to the new target OR throw an exception up to the client. (Alternative to this is to disconnect the client)SUBSCRIBELOCALcalls though.Additional considerations: