Skip to content

Conversation

@hwware
Copy link
Contributor

@hwware hwware commented Apr 30, 2021

Before this commit, in Sentinel mode, when the already known sentinel instance switch address, there is a chance making same sentinel instance being considered as two separate sentinel instances which may causing problem.
For example in the following scenario:

  1. we are deploying three sentinels (sentinel1: 127.0.0.1:26379, sentinel2: 127.0.0.1:26380,sentinel3:127.0.0.1:26381)monitoring same redis master/replica instance
  2. we shutdown sentinel 1 down for maintainance, and we start a new sentinel 4 (127.0.0.1:26379) from scrach, which will generate a new runid, in sentinel 2 and sentinel 3's view, when it get the hello package from sentinel 4, sentinel1 port number will be set to 0 temporatorily, but not been deleted.
  3. after sometime we shutdown sentinel 4 and make sentinel 1 back online, in that case, sentinel1 and sentinel 4 will both exist in sentinel 2/3's list, with same ip and port, and this causing the duplication issue. In sentinel 2/3's point of view the sentinel 4 is even ONLINE since the health check (ping/pong) package doesnt contain the runid.

The root cause of this issue is when removing sentinel with same runid, we still need to invalidate other known sentinels with same ip/port.
https://github.com/redis/redis/blob/unstable/src/sentinel.c#L2874

@hwware hwware changed the title set known sentinel instance port to 0 when original instance come online [Sentinel] set known sentinel instance port to 0 when original instance come online Apr 30, 2021
@yossigo
Copy link
Collaborator

yossigo commented May 3, 2021

@hwware I think it may be a good idea to add a test for this scenario, maybe even with some kind of fuzzing (i.e. take up & down multiple sentinels).

@hwware
Copy link
Contributor Author

hwware commented May 5, 2021

thank you @yossigo , i will think about how to add a test case for this

@hwware
Copy link
Contributor Author

hwware commented May 17, 2021

hi @yossigo , i added the test scenario for this fix, please take look, thanks! also there is a bug #8958 makes the test cases fail at first, please review that pr also, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwware Do we really need such a long delay here (and in the other locations)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo we don't have to, the purpose of wait here is to make sure other sentinel get updated information from the pub/sub hello message, the interval is 2s. i think 3s should be enough, or what we can do is continusly waiting for other sentinel and as soon as the pub/sub message get updated we are good to go.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwware I think it's worth taking the later approach, Sentinel tests are already pretty slow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwware As mentioned in #8710, maybe we need to make some #define's configurable at least for testing purposes because the Sentinel tests are slow as they are. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo I totally agree with you, and I will create a separated PR for this configuration. Thanks

@hwware hwware force-pushed the sentinel_dupl_instance_when_switch branch from 7786134 to 2d0048c Compare June 30, 2021 20:17
@yossigo yossigo added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 5, 2021
@yossigo
Copy link
Collaborator

yossigo commented Aug 18, 2021

This has been superseded by #9240, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants