Skip to content

Conversation

@warriorguo
Copy link
Contributor

@warriorguo warriorguo commented Mar 18, 2022

fix #10439. see #9872
When executing SHUTDOWN we pause the client so we can un-pause it if the shutdown fails.
this could happen during the timeout, if the shutdown is aborted, but could also happen from withing the initial call() to shutdown, if the rdb save fails.
in that case when we return to call(), we'll crash if c->cmd has been set to NULL.

The call stack is:

unblockClient(c)
replyToClientsBlockedOnShutdown()
cancelShutdown()
finishShutdown()
prepareForShutdown()
shutdownCommand()

what's special about SHUTDOWN in that respect is that it can be paused, and then un-paused before the original call() returns.
tests where added for both failed shutdown, and a followup successful one.

@oranagra
Copy link
Member

@warriorguo thank you for the report and the fix.
there are a few things i don't understand, maybe you can explain them to me.

  1. i run your tests on the unfixed version, it didn't reproduce the crash, but the tests did fail. i'd expected to see the crash, and if not, i'd expect the tests to still pass. please save me some time and tell me what i'm missing.
  2. i was surprised to see the crash log containing Killed by PID, when si_code is not 0. maybe you can check what's the value of SI_USER on your system, and why isn't that part working.

p.s. if you have a fix and you submit a PR, there's no need to also open an issue.
i'd rather use issues to discuss things for which we don't have an easy solution ready.

@warriorguo
Copy link
Contributor Author

@oranagra I did the tests with the unfixed version and notice the test cases truly can not cause crashes, I made some improvements on the test cases then.

  1. The crash with unfixed can be found here If you need the crash details: https://github.com/warriorguo/redis/runs/5603097401?check_suite_focus=true#step:4:3326.
  2. I did the compile and tests on Mac, the SI_USER defines as #define SI_USER 0x10001 /* [CX] signal from kill() */

and thanks for the info, thought it could be a better practice to track the issue at first lol

@oranagra oranagra merged commit fae5b1a into redis:unstable Mar 20, 2022
@oranagra oranagra mentioned this pull request Apr 5, 2022
sundb added a commit that referenced this pull request Oct 15, 2025
…ancellation (#14420)

This issue was introduced by #10440
In that PR, we avoided resetting the current user during processCommand,
but overlooked the fact that this client might not be the current one,
it could be a client that was previously blocked due to shutdown.
If we don’t reset these clients, and the shutdown is canceled, then when
these clients continue executing other commands, they will trigger an
assertion.

This PR delays the operation of resetting the client to
processUnblockedClients and no longer skips SHUTDOWN_BLOCKED clients.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[CRASH] Redis-Server Crashed after failed to save rdb when receive shutdown command

2 participants