-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Fix PSYNC crash with wrong offset #10243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`PSYNC replicationid str_offset` will crash the server. The reason is in `masterTryPartialResynchronization`, we will call `getLongLongFromObjectOrReply` check the offset. With a wrong offset, it will add a reply and then trigger a full SYNC and the client become a replica. So crash in `c->bufpos == 0 && listLength(c->reply) == 0`. In this commit, we check the psync_offset before entering function `masterTryPartialResynchronization`, and return. Fix redis#10242
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this was actually a bug in the PSYNC command, regardless of the assertion, right?
i.e it accepted the sync attempt, but also replied with an error.
In theory, an alternative fix could have been to replace getLongLongFromObjectOrReply with just getLongLongFromObject (remove the OrReply part)
|
Yes. And yes we can also remove the reply part. I thought about it too. which one do you prefer? a smaller diff and more elegant one |
|
i prefer the one you implemented, just wanted to be sure i got the details right |
Added regression tests for redis#10020 / redis#10081 / redis#10243. The above PRs fixed some crashes due to an asserting, see function `clientHasPendingReplies` (introduced in redis#9166). This commit added some tests to cover the above scenario. These tests will all fail in redis#9166, althought fixed not, there is value in adding these tests to cover and verify the changes. And it also can cover redis#8868 (verify the logs). Other changes: reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests.
Added regression tests for #10020 / #10081 / #10243. The above PRs fixed some crashes due to an asserting, see function `clientHasPendingReplies` (introduced in #9166). This commit added some tests to cover the above scenario. These tests will all fail in #9166, althought fixed not, there is value in adding these tests to cover and verify the changes. And it also can cover #8868 (verify the logs). Other changes: 1. Reduces the wait time in `waitForBgsave` and `waitForBgrewriteaof` from 1s to 50ms, which should reduce the time for some tests. 2. Improve the test infra to print context when `assert_match` fails. 3. Improve the test infra to print `$error` when `assert_error` fails. ``` Expected an error matching 'ERR*' but got 'OK' (context: type eval line 4 cmd {assert_error "ERR*" {r set a b}} proc ::test) ```
`PSYNC replicationid str_offset` will crash the server. The reason is in `masterTryPartialResynchronization`, we will call `getLongLongFromObjectOrReply` check the offset. With a wrong offset, it will add a reply and then trigger a full SYNC and the client become a replica. So crash in `c->bufpos == 0 && listLength(c->reply) == 0`. In this commit, we check the psync_offset before entering function `masterTryPartialResynchronization`, and return. Regardless of that crash, accepting the sync, but also replying with an error would have corrupt the replication stream. (cherry picked from commit 344e41c)
PSYNC replicationid str_offsetwill crash the server.The reason is in
masterTryPartialResynchronization,we will call
getLongLongFromObjectOrReplycheck theoffset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.
So crash in
c->bufpos == 0 && listLength(c->reply) == 0.In this commit, we check the psync_offset before entering
function
masterTryPartialResynchronization, and return.Fix #10242
The server logs: