Skip to content

Conversation

@yz1509
Copy link
Contributor

@yz1509 yz1509 commented Mar 27, 2021

I always use the following commands to initialize the sentinel configuration:

SENTINEL monitor test master_ip master_port 2
SENTINEL set test auth-pass passwd
SENTINEL set ....

It worked well before 8631e64, because the send INFO command blocks sending PING, the condition of this case (https://github.com/redis/redis/blob/6.2.1/src/sentinel.c#L4178) will be satisfied in 15 seconds, the unauthorized connection will be released and the authenticated connection will be reestablished between sentinel and the instance.

However, in the current implementation, because of the refresh of link->last_pong_time, the above case cannot be satisfied, unauthorized connection cannot be released, sentinel thinks the instance is sdown.

@yossigo
Copy link
Collaborator

yossigo commented Apr 13, 2021

Thank you @yz1509, do you think you can add a test for this case?

@yz1509
Copy link
Contributor Author

yz1509 commented Apr 14, 2021

@yossigo
Test Setting

instance version addr
redis-server 5.0.9 127.0.0.147:6309 (Master)
redis-server 5.0.9 127.0.0.50:6309 (Slave)
redis-server 5.0.9 127.0.0.73:6309 (Slave)
redis-sentinel 3.2.11 127.0.0.73:13211
redis-sentinel 6.2.1 127.0.0.73:16210
redis-sentinel current 127.0.0.73:23333

Script

def monitor_master(sentinel_host, sentinel_port, master_host, master_port):
    r = redis.Redis(sentinel_host, sentinel_port)
    r.execute_command('SENTINEL', 'monitor', 'test', master_host, str(master_port), '2')
    # Adding sleep makes it easier to reproduce
    # make sure https://github.com/redis/redis/blob/6.2.1/src/sentinel.c#L2413 is executed before `SENTINEL set test auth-pass kumiko`
    time.sleep(1)
    r.execute_command('SENTINEL', 'set', 'test', 'auth-pass', 'kumiko')
    # ...
    # ...

Sentinel log

// 3.2.11
50509:X 14 Apr 20:26:44.205 # +monitor master test 127.0.0.147 6309 quorum 2
50509:X 14 Apr 20:26:45.254 # +set master test 127.0.0.147 6309 auth-pass kumiko
50509:X 14 Apr 20:26:45.302 - Client closed connection
// 15 seconds later
50509:X 14 Apr 20:26:59.324 * +slave slave 127.0.0.73:6309 127.0.0.73 6309 @ test 127.0.0.147 6309
50509:X 14 Apr 20:26:59.324 * +slave slave 127.0.0.50:6309 127.0.0.50 6309 @ test 127.0.0.147 6309


// 6.2.1
49270:X 14 Apr 2021 20:27:20.955 # +monitor master test 127.0.0.147 6309 quorum 2
49270:X 14 Apr 2021 20:27:22.003 # +set master test 127.0.0.147 6309 auth-pass kumiko
49270:X 14 Apr 2021 20:27:22.004 . Rewritten config file (/home/yz1509/redis/6.2.1/sentinel.conf) successfully
49270:X 14 Apr 2021 20:27:22.052 - Client closed connection
49270:X 14 Apr 2021 20:27:24.167 * +sentinel sentinel 9be34ec15fdecfbe0aadee37d7985ac942a8d838 127.0.0.73 13211 @ test 127.0.0.147 6309
49270:X 14 Apr 2021 20:27:24.167 . Rewritten config file (/home/yz1509/redis/6.2.1/sentinel.conf) successfully
49270:X 14 Apr 2021 20:27:26.262 - Accepted 127.0.0.73:47265
// sdown
49270:X 14 Apr 2021 20:27:50.996 # +sdown master test 127.0.0.147 6309


// current
33521:X 15 Apr 2021 16:13:26.710 . Rewritten config file (/home/yz1509/redis/current/sentinel.conf) successfully
33521:X 15 Apr 2021 16:13:26.710 # +monitor master test 127.0.0.147 6309 quorum 2
33521:X 15 Apr 2021 16:13:27.763 # +set master test 127.0.0.147 6309 auth-pass kumiko
33521:X 15 Apr 2021 16:13:27.763 . Rewritten config file (/home/yz1509/redis/current/sentinel.conf) successfully
33521:X 15 Apr 2021 16:13:27.801 * +slave slave 127.0.0.73:6309 127.0.0.73 6309 @ test 127.0.0.147 6309
33521:X 15 Apr 2021 16:13:27.802 . Rewritten config file (/home/yz1509/redis/current/sentinel.conf) successfully
33521:X 15 Apr 2021 16:13:27.802 * +slave slave 127.0.0.50:6309 127.0.0.50 6309 @ test 127.0.0.147 6309
33521:X 15 Apr 2021 16:13:27.803 . Rewritten config file (/home/yz1509/redis/current/sentinel.conf) successfully

@itamarhaber itamarhaber requested a review from hwware April 21, 2021 10:49
Copy link
Contributor

@hwware hwware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yz1509 , I have three comments for this PR:

  1. Instead of only disconnecting with masters, I think you also need to disconnect with all slaves(but not sentinels) since the group of redis instances for a single master and a several replica are sharing with the same credenticals when sentinel trying to do authentication.You might need to do similar logic like this: https://github.com/redis/redis/blob/unstable/src/sentinel.c#L3194
  2. I think setting the auth-pass you also need to disconnect with the master/replicas.
  3. Can you provide some simple unit test for this? please check: https://github.com/redis/redis/tree/unstable/tests/sentinel for some existing examples.
    @yossigo do you also have other comments? thanks

@hwware hwware assigned hwware and unassigned hwware Apr 27, 2021
@yz1509
Copy link
Contributor Author

yz1509 commented Apr 30, 2021

@hwware Thanks for your comments.

  1. Sentinel cannot get the correct master's info response before authorization, master->slaves is empty.
  2. I'm not familiar with Tcl, it may take some time to add unit tests.

@yossigo
Copy link
Collaborator

yossigo commented May 2, 2021

@hwware @yz1509
Don't we also need to apply the same to auth-user?
I agree it makes sense to drop slave connections as well, but I'm not sure I follow the problem @yz1509 points to as the information about the slaves remains from the last successful INFO.

I think setting the auth-pass you also need to disconnect with the master/replicas.

What do you mean? Force replication to restart?

@hwware
Copy link
Contributor

hwware commented May 3, 2021

I think setting the auth-pass you also need to disconnect with the master/replicas.

@yossigo sorry i type it wrong, I mean auth-user :)

Sentinel cannot get the correct master's info response before authorization, master->slaves is empty.

@yz1509 @yossigo This is true when sentinel cannot get authenticated with master at first time, but I think we need to pay attention to some edge cases, for example, when master does not have a password, but replica has, sentinel can get replica info and connect with masters but not able connect with replicas, later when we set auth-pass to be replica password in sentinel at run time, in order to let sentinel connect with replica, we need to drop the previous replica connections.

@yz1509
Copy link
Contributor Author

yz1509 commented May 12, 2021

@hwware Thanks for your guidance, I made some changes and added unit tests based on it.

set ::user "testuser"
set ::password "secret"

proc setup_sever_auth {} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a small typo in the proc name

repl_ri = dictGetVal(de);
instanceLinkCloseConnection(repl_ri->link, repl_ri->link->cc);
instanceLinkCloseConnection(repl_ri->link, repl_ri->link->pc);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a potential memory leak, consider add dictReleaseIterator(di);

@hwware
Copy link
Contributor

hwware commented May 28, 2021

pinging @yossigo for a second review.

assert_equal {OK} [S $id SENTINEL SET mymaster auth-pass $::password]
}

after 20000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason that such a long delay is needed? Can the test be optimized instead to have a much shorter down-after-milliseconds and a smaller delay here, or even use wait_for_condition already in this level?

Sentinel tests are already very slow and we should avoid making them slower if it's not absolutely necessary.

Copy link
Contributor Author

@yz1509 yz1509 Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo When the value of down-after-milliseconds is greater than SENTINEL_MIN_LINK_RECONNECT_PERIOD, the sentinel with a version lower than 5.0 can pass "auth test".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yz1509 @hwware The problem is sentinel tests are already very slow, and we shouldn't slow them much further. Do you think it makes sense to make this define configurable, at least for testing purposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo I agree with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo Yes, I totally agree with you, and I will create a separated PR for this configuration.

@yossigo yossigo added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 5, 2021
@hwware
Copy link
Contributor

hwware commented Aug 25, 2021

Hey @yz1509, The debug command has been implemented which adds configurable parameters in #9291, you can use that to reduce test times as @yossigo asked.

@moticless moticless added state:to-be-closed requesting the core team to close the issue and removed state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten labels Mar 13, 2022
@yossigo
Copy link
Collaborator

yossigo commented Mar 13, 2022

Fixed by #10400

@yossigo yossigo closed this Mar 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sentinel state:to-be-closed requesting the core team to close the issue

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants