Make sure replicas don't write their own replies to the replication link #10020

yoav-steinberg · 2021-12-28T10:25:55Z

Since #9166 we have an assertion here to make sure replica clients don't write anything to their buffer.
But in reality a replica may attempt to write data to its buffer simply by sending a command on the replication link. In most cases this command will be rejected since #8868, but it'll still generate an error.
Actually the only valid command to send on a replication link is `REPLCONF ACK`, which generates no response.

We want to keep the design so that replicas can send commands, but we need to avoid any situation where we start putting data in their response buffers, especially since they aren't used anymore. This PR makes sure to disconnect a rogue client that sends a command on the replication link which causes something to be written to the response buffer.
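The behavior described above can be sketched with a toy model. This is not the code in `src/networking.c`: the `toy_client` struct, the `toy_add_reply` and `toy_has_pending_replies` functions, and the `close_asap` field are all hypothetical names chosen for illustration. The sketch only shows the idea that a reply generated for a replica is never buffered and the client is marked for disconnect instead, so the "replica buffer is empty" assertion stays unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the server's client state. */
typedef enum { TOY_CLIENT_NORMAL, TOY_CLIENT_REPLICA } toy_client_type;

typedef struct {
    toy_client_type type;
    bool close_asap;     /* marked for asynchronous disconnect */
    char reply_buf[64];  /* toy response buffer */
    size_t reply_len;    /* bytes currently buffered */
} toy_client;

/* Models the invariant from the description: a replica's response
 * buffer must stay empty, so this assertion must never fire. */
static bool toy_has_pending_replies(const toy_client *c) {
    if (c->type == TOY_CLIENT_REPLICA) assert(c->reply_len == 0);
    return c->reply_len > 0;
}

/* Models the fix: a reply generated for a replica is not buffered;
 * the client is flagged for disconnect instead. */
static void toy_add_reply(toy_client *c, const char *s) {
    if (c->close_asap) return; /* already on its way out */
    if (c->type == TOY_CLIENT_REPLICA) {
        fprintf(stderr, "replica generated a reply, disconnecting\n");
        c->close_asap = true;
        return;
    }
    size_t len = strlen(s);
    if (len > sizeof(c->reply_buf) - c->reply_len) return; /* toy: drop on overflow */
    memcpy(c->reply_buf + c->reply_len, s, len);
    c->reply_len += len;
}
```

In this model, calling `toy_add_reply` on a replica client leaves `reply_len` at 0 and sets `close_asap`, while a normal client gets the reply appended as usual.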

To recreate the bug this fixes, simply connect via telnet to a redis server and write `sync\r\n`, wait for the payload to be written, and then write any command (valid or invalid), such as `ping\r\n`, on the telnet connection. It'll crash the server.

src/networking.c

oranagra · 2021-12-29T08:34:05Z

worth mentioning that #8868 was a response to a complaint about DoS: a user sending a SYNC command and then being able to trigger an assertion. so we need to make sure the new assertion is unreachable via user workloads.

madolson · 2021-12-29T14:57:29Z

The other concern is that a replica can disable replies with 'client reply off', which would bypass this check. If we're concerned about malicious users, I don't think this is sufficient.

oranagra · 2021-12-29T15:05:17Z

we already made sure they can't write to the keyspace.
so the concern now is to make sure they can't crash the master by triggering that new assert.
i think we're ok with the current code.
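One reading of this exchange can be sketched as follows. The ordering of the checks is an assumption taken from the comment that `client reply off` "would bypass this check"; the `sketch_client` struct and both functions are hypothetical and do not mirror the real code. The point illustrated is that a replica with replies turned off is not disconnected, but nothing is buffered for it either, so the assertion on the replica's buffer still cannot fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified client state for illustration only. */
typedef struct {
    bool is_replica;
    bool reply_off;   /* models the effect of 'client reply off' */
    bool close_asap;  /* marked for asynchronous disconnect */
    size_t reply_len; /* bytes currently buffered */
} sketch_client;

/* Returns true only if bytes were actually buffered. */
static bool sketch_add_reply(sketch_client *c, size_t len) {
    if (c->close_asap) return false;
    /* Assumed to run before the replica check: the reply is dropped. */
    if (c->reply_off) return false;
    /* The replica check: disconnect instead of buffering. */
    if (c->is_replica) {
        c->close_asap = true;
        return false;
    }
    c->reply_len += len;
    return true;
}

/* Models the assertion that must stay unreachable via user workloads. */
static void sketch_assert_replica_buffer_empty(const sketch_client *c) {
    if (c->is_replica) assert(c->reply_len == 0);
}
```

Under these assumptions the reply-off path skips the disconnect but also skips the write, which matches the narrower goal stated in the thread: the master must not crash on the assertion.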

The following steps will crash redis-server:
```
[root]# cat crash
PSYNC replicationid -1
SLOWLOG GET
GET key
[root]# nc 127.0.0.1 6379 < crash
```
This one follows redis#10020, and the crash was reported in redis#10076.

Other changes about the output info:
1. Cmd with a full name by using `getFullCommandName`, now it will print the right subcommand name like `slowlog|get`.
2. Print the full client info by using `catClientInfoString`, the info is also valuable.



Added regression tests for #10020 / #10081 / #10243. The above PRs fixed some crashes due to an assertion, see function `clientHasPendingReplies` (introduced in #9166). This commit adds some tests to cover the above scenarios. These tests would all fail in #9166; although the crashes are fixed now, there is value in adding these tests to cover and verify the changes. They can also cover #8868 (verify the logs).

Other changes:
1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests.
2. Improve the test infra to print context when `assert_match` fails.
3. Improve the test infra to print `$error` when `assert_error` fails.
```
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)
```

…truct (#10697) By moving the client flags to a more cache friendly position within the client struct, we regain the lost 2% of CPU cycles since v6.2 (from 630532.57 to 647449.80 ops/sec). The loss was due to a higher rate of calls to getClientType caused by the changes in #9166 and #10020.
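To illustrate what a "more cache friendly position" can mean, here is a minimal sketch using two hypothetical struct layouts. Neither struct is the real client struct; the field names, the 200-byte block of colder fields, and the 64-byte cache line size are assumptions, and the sketch further assumes the struct itself starts on a cache line boundary and that the hot fields sit at its start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical layout with the flags placed after colder fields. */
struct sketch_client_before {
    uint64_t id;
    char cold_fields[200]; /* stand-in for rarely touched members */
    uint64_t flags;
};

/* Hypothetical layout with the flags moved next to the hot fields. */
struct sketch_client_after {
    uint64_t id;
    uint64_t flags;
    char cold_fields[200];
};

/* Index of the cache line that a given byte offset falls into. */
static size_t sketch_cache_line_of(size_t offset) {
    return offset / SKETCH_CACHE_LINE;
}

/* Returns 1 if the two offsets land on the same cache line. */
static int sketch_share_cache_line(size_t a, size_t b) {
    return sketch_cache_line_of(a) == sketch_cache_line_of(b);
}

/* Sanity check for the sketch: after the move, reading `id` and then
 * `flags` touches a single cache line. */
static void sketch_verify_layout(void) {
    assert(sketch_share_cache_line(
        offsetof(struct sketch_client_after, id),
        offsetof(struct sketch_client_after, flags)));
}
```

In the first layout a check of `flags` right after touching `id` needs a second cache line, while in the second layout both fields share one. A function that reads the flags on every reply, as `getClientType` is described to do here, would benefit from the second arrangement under these assumptions.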


yoav-steinberg added 2 commits December 28, 2021 12:17

Make sure replicas don't write their own replies to the replication link

5977589

typo

a21c7cb

yoav-steinberg requested review from ShooterIT and oranagra December 28, 2021 10:28

madolson reviewed Dec 28, 2021

src/networking.c

ShooterIT approved these changes Dec 29, 2021


oranagra added the 7.0-must-have label Dec 29, 2021

oranagra approved these changes Dec 29, 2021


cr comment

8f084b7

oranagra merged commit 2ff3fc1 into redis:unstable Jan 2, 2022

enjoy-binbin mentioned this pull request Jan 8, 2022

[CRASH] ASSERTION FAILED and stack-buffer-overflow in networking.c:1026 #10076

Closed

enjoy-binbin mentioned this pull request Jan 9, 2022

Make sure replicas don't write their own replies to the replication link #10081

Merged

enjoy-binbin mentioned this pull request Feb 6, 2022

[CRASH] Found using fuzzing single redis server instance #10242

Closed

enjoy-binbin mentioned this pull request Feb 12, 2022

Regression test for sync psync crash #10288

Merged

ShooterIT mentioned this pull request Apr 28, 2022

large ammount of getClientType() within the reply code is costing us around 3% of cpu cycles on LRANGE command (or any command that heavily relies on partial reply building) #10648

Closed
