Skip to content

Conversation

@enjoy-binbin
Copy link
Contributor

@enjoy-binbin enjoy-binbin commented Feb 6, 2022

PSYNC replicationid str_offset will crash the server.

The reason is in masterTryPartialResynchronization,
we will call getLongLongFromObjectOrReply check the
offset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.

So crash in c->bufpos == 0 && listLength(c->reply) == 0.
In this commit, we check the psync_offset before entering
function masterTryPartialResynchronization, and return.

Fix #10242

The server logs:

20332:M 06 Feb 2022 16:49:27.237 * Replica 127.0.0.1:<unknown-replica-port> asks for synchronization
20332:M 06 Feb 2022 16:49:27.237 # Replica 127.0.0.1:<unknown-replica-port> asks for synchronization but with a wrong offset

`PSYNC replicationid str_offset` will crash the server.

The reason is in `masterTryPartialResynchronization`,
we will call `getLongLongFromObjectOrReply` check the
offset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.

So crash in `c->bufpos == 0 && listLength(c->reply) == 0`.
In this commit, we check the psync_offset before entering
function `masterTryPartialResynchronization`, and return.

Fix redis#10242
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this was actually a bug in the PSYNC command, regardless of the assertion, right?
i.e it accepted the sync attempt, but also replied with an error.

In theory, an alternative fix could have been to replace getLongLongFromObjectOrReply with just getLongLongFromObject (remove the OrReply part)

@enjoy-binbin
Copy link
Contributor Author

Yes. And yes we can also remove the reply part. I thought about it too. which one do you prefer?

a smaller diff and more elegant one
a error one we can throw it to the client. illegal call

@oranagra
Copy link
Member

oranagra commented Feb 6, 2022

i prefer the one you implemented, just wanted to be sure i got the details right

@oranagra oranagra merged commit 344e41c into redis:unstable Feb 6, 2022
@enjoy-binbin enjoy-binbin deleted the fix_psync_crash branch February 6, 2022 11:21
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 11, 2022
Added regression tests for redis#10020 / redis#10081 / redis#10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in redis#9166).

This commit added some tests to cover the above scenario.
These tests will all fail in redis#9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover redis#8868 (verify the logs).

Other changes: reduces the wait time in `waitForBgsave` and
`waitForBgrewriteaof` from 1s to 50ms, which should reduce
the time for some tests.
oranagra pushed a commit that referenced this pull request Feb 13, 2022
Added regression tests for #10020 / #10081 / #10243.
The above PRs fixed some crashes due to an asserting,
see function `clientHasPendingReplies` (introduced in #9166).

This commit added some tests to cover the above scenario.
These tests will all fail in #9166, althought fixed not,
there is value in adding these tests to cover and verify
the changes. And it also can cover #8868 (verify the logs).

Other changes: 
1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof`
from 1s to 50ms, which should reduce the time for some tests.
2. Improve the test infra to print context when `assert_match` fails.
3. Improve the test infra to print `$error` when `assert_error` fails.
```
Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test)
```
oranagra pushed a commit that referenced this pull request Apr 27, 2022
`PSYNC replicationid str_offset` will crash the server.

The reason is in `masterTryPartialResynchronization`,
we will call `getLongLongFromObjectOrReply` check the
offset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.

So crash in `c->bufpos == 0 && listLength(c->reply) == 0`.
In this commit, we check the psync_offset before entering
function `masterTryPartialResynchronization`, and return.

Regardless of that crash, accepting the sync, but also replying
with an error would have corrupt the replication stream.

(cherry picked from commit 344e41c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[CRASH] Found using fuzzing single redis server instance

2 participants