-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Make sure replicas don't write their own replies to the replication link #10020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure replicas don't write their own replies to the replication link #10020
Conversation
|
worth to mention that #8868 was a response for a complaint about DoS, user sending SYNC command then then being able to trigger an assertion. so we need to make sure the new assertion is unreachable via user workloads. |
|
The other concern is that a replica can disable replies with 'client reply off', which would bypass this check. If we're concerned about malicious users, I don't think this is sufficient. |
|
we already made sure they can't write to the keyspace. |
The following steps will crash redis-server: ``` [root]# cat crash PSYNC replicationid -1 SLOWLOG GET GET key [root]# nc 127.0.0.1 6379 < crash ``` This one following redis#10020 and the crash was reported in redis#10076. Other changes about the output info: 1. Cmd with a full name by using `getFullCommandName`, now it will print the right subcommand name like `slowlog|get`. 2. Print the full client info by using `catClientInfoString`, the info is also valuable.:
…ink (#10081) The following steps will crash redis-server: ``` [root]# cat crash PSYNC replicationid -1 SLOWLOG GET GET key [root]# nc 127.0.0.1 6379 < crash ``` This one following #10020 and the crash was reported in #10076. Other changes about the output info: 1. Cmd with a full name by using `getFullCommandName`, now it will print the right subcommand name like `slowlog|get`. 2. Print the full client info by using `catClientInfoString`, the info is also valuable.:
Added regression tests for redis#10020 / redis#10081 / redis#10243. The above PRs fixed some crashes due to an asserting, see function `clientHasPendingReplies` (introduced in redis#9166). This commit added some tests to cover the above scenario. These tests will all fail in redis#9166, althought fixed not, there is value in adding these tests to cover and verify the changes. And it also can cover redis#8868 (verify the logs). Other changes: reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests.
Added regression tests for #10020 / #10081 / #10243. The above PRs fixed some crashes due to an asserting, see function `clientHasPendingReplies` (introduced in #9166). This commit added some tests to cover the above scenario. These tests will all fail in #9166, althought fixed not, there is value in adding these tests to cover and verify the changes. And it also can cover #8868 (verify the logs). Other changes: 1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests. 2. Improve the test infra to print context when `assert_match` fails. 3. Improve the test infra to print `$error` when `assert_error` fails. ``` Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test) ```
…truct (redis#10697) Move the client flags to a more cache friendly position within the client struct we regain the lost 2% of CPU cycles since v6.2 ( from 630532.57 to 647449.80 ops/sec ). These are due to higher rate of calls to getClientType due to changes in redis#9166 and redis#10020
Since #9166 we have an assertion here to make sure replica clients don't write anything to their buffer.
But in reality a replica may attempt write data to it's buffer simply by sending a command on the replication link. This command in most cases will be rejected since #8868 but it'll still generate an error.
Actually the only valid command to send on a replication link is 'REPCONF ACK` which generates no response.
We want to keep the design so that replicas can send commands but we need to avoid any situation where we start putting data in their response buffers, especially since they aren't used anymore. This PR makes sure to disconnect a rogue client which generated a write on the replication link that cause something to be written to the response buffer.
To recreate the bug this fixes simply connect via telnet to a redis server and write
sync\r\nwait for the the payload to be written and then write any command (valid or invalid), such asping\r\non the telnet connection. It'll crash the server.