
Conversation

@sundb (Collaborator) commented Oct 9, 2025

This issue was introduced by #10440.
In that PR we avoided resetting the client while it is the current client inside processCommand, but overlooked that the client being unblocked might not be the current one: it could be a client that was previously blocked by SHUTDOWN.
If we don't reset these clients and the shutdown is canceled, they will trigger an assertion when they go on to execute other commands.

This PR delays the client reset to processUnblockedClients and no longer skips SHUTDOWN_BLOCKED clients.
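
Below is a minimal standalone sketch (in C) of that ordering, using simplified stand-ins for the client struct, server.unblocked_clients, and resetClient(); it illustrates the intent rather than the actual blocked.c diff.

```c
/* Sketch only: unblockClient() just flags the client and queues it, while the
 * actual reset is deferred to processUnblockedClients(), which also covers
 * clients that were blocked by SHUTDOWN when the shutdown is canceled. */
#include <stdio.h>

#define CLIENT_BLOCKED   (1 << 0)   /* illustrative flag bits */
#define CLIENT_UNBLOCKED (1 << 1)

typedef struct client {
    int flags;
    int argc;               /* must be 0 before parsing a new command */
    struct client *next;    /* simplified server.unblocked_clients list */
} client;

static client *unblocked_clients = NULL;

/* Only mark the client and queue it; do NOT reset it here. */
static void unblockClient(client *c) {
    c->flags &= ~CLIENT_BLOCKED;
    c->flags |= CLIENT_UNBLOCKED;
    c->next = unblocked_clients;
    unblocked_clients = c;
}

/* The deferred reset: every queued client, including shutdown-blocked ones,
 * gets its state cleared before it can run another command. */
static void processUnblockedClients(void) {
    while (unblocked_clients) {
        client *c = unblocked_clients;
        unblocked_clients = c->next;
        c->flags &= ~CLIENT_UNBLOCKED;
        c->argc = 0;        /* stands in for resetClient(c) */
    }
}

int main(void) {
    client c = { CLIENT_BLOCKED, 1, NULL };  /* blocked mid-command, argc != 0 */
    unblockClient(&c);                       /* e.g. the SHUTDOWN was canceled */
    processUnblockedClients();               /* reset happens here, not earlier */
    printf("flags=%d argc=%d\n", c.flags, c.argc);  /* prints: flags=0 argc=0 */
    return 0;
}
```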

sundb requested a review from Copilot, October 9, 2025 01:46

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR fixes an issue where clients blocked during shutdown operations were not being properly reset after shutdown cancellation, preventing them from handling subsequent commands normally.

  • Moved client reset logic from unblockClient to processUnblockedClients to handle all unblocked clients consistently
  • Added client duration reset for shutdown-blocked clients to ensure proper state cleanup
  • Added integration test to verify clients can handle commands normally after shutdown cancellation

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
src/blocked.c | Relocated client reset logic and added duration reset for shutdown-blocked clients
tests/integration/shutdown.tcl | Added test case to verify client functionality after shutdown cancellation


sundb added the release-notes label (indication that this issue needs to be mentioned in the release notes), Oct 9, 2025
@sundb (Collaborator, Author) commented Oct 9, 2025

CI run with a smoke-test commit: https://github.com/sundb/redis/actions/runs/18363002628

crash report

Logged crash report (pid 30767):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
30767:M 09 Oct 2025 01:46:27.237 # === ASSERTION FAILED CLIENT CONTEXT ===
30767:M 09 Oct 2025 01:46:27.237 # client->flags = 536870912
30767:M 09 Oct 2025 01:46:27.237 # client->conn = fd=14
30767:M 09 Oct 2025 01:46:27.237 # client->argc = 1
30767:M 09 Oct 2025 01:46:27.237 # client->argv[0] = "shutdown" (refcount: 1)
30767:M 09 Oct 2025 01:46:27.237 # === RECURSIVE ASSERTION FAILED ===
30767:M 09 Oct 2025 01:46:27.237 # ==> networking.c:2518 'c->argc == 0' is not true

------ STACK TRACE ------

30773 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0x91117)[0x7f58c1491117]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x211)[0x7f58c1493a41]
src/redis-server 127.0.0.1:28666(bioProcessBackgroundJobs+0x317)[0x5654b506c537]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f58c1494ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x1268c0)[0x7f58c15268c0]

30771 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0x91117)[0x7f58c1491117]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x211)[0x7f58c1493a41]
src/redis-server 127.0.0.1:28666(bioProcessBackgroundJobs+0x317)[0x5654b506c537]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f58c1494ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x1268c0)[0x7f58c15268c0]

30772 bio_aof
/lib/x86_64-linux-gnu/libc.so.6(+0x91117)[0x7f58c1491117]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x211)[0x7f58c1493a41]
src/redis-server 127.0.0.1:28666(bioProcessBackgroundJobs+0x317)[0x5654b506c537]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f58c1494ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x1268c0)[0x7f58c15268c0]

30767 redis-server *
src/redis-server 127.0.0.1:28666(_serverAssertWithInfo+0x8a)[0x5654b502d93a]
src/redis-server 127.0.0.1:28666(processMultibulkBuffer+0x7c6)[0x5654b4f71436]
src/redis-server 127.0.0.1:28666(processInputBuffer+0x1ba)[0x5654b4f724da]
src/redis-server 127.0.0.1:28666(readQueryFromClient+0x898)[0x5654b4f745e8]
src/redis-server 127.0.0.1:28666(+0x3127df)[0x5654b51377df]
src/redis-server 127.0.0.1:28666(aeProcessEvents+0x3fa)[0x5654b4f0867a]
src/redis-server 127.0.0.1:28666(aeMain+0x35)[0x5654b4f08b75]
src/redis-server 127.0.0.1:28666(main+0x65a)[0x5654b4f017aa]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f58c1429d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f58c1429e40]
src/redis-server 127.0.0.1:28666(_start+0x25)[0x5654b4f03cf5]

4/4 expected stacktraces.

------ STACK TRACE DONE ------

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  If a Redis module was involved, please open in the module's repo instead.

  Suspect RAM error? Use redis-server --test-memory to verify it.

  Some other issues could be detected by redis-server --check-system
30841:C 09 Oct 2025 01:46:56.475 - Fork CoW for AOF rewrite: current 7 MB, peak 7 MB, average 7 MB
30841:C 09 Oct 2025 01:46:56.475 # Child failed reporting info to parent, exiting. Broken pipe

[exception]: Executing test client: I/O error reading reply.
I/O error reading reply
    while executing
"$rd1 read"
    ("uplevel" body line 47)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {save ""}} {
            set master [srv -1 client]
            set master_host [srv -1 host]
            set master_port [srv..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {save ""}} {
        start_server {overrides {save ""}} {
            set master [srv -1 client]
            set master_host [..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 59)
    invoked from within
"test "Shutting down master waits for replica then fails" {
    start_server {overrides {save ""}} {
        start_server {overrides {save ""}} {
     ..."
    (file "tests/integration/shutdown.tcl" line 136)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 16067
Killing still running Redis server 16107
Killing still running Redis server 20504
Killing still running Redis server 30615
Killing still running Redis server 30643
Killing still running Redis server 31353
Killing still running Redis server 31368
Killing still running Redis server 31580
Killing still running Redis server 31611
Killing still running Redis server 31653
Killing still running Redis server 31692
Killing still running Redis server 31816
Killing still running Redis server 31831
I/O error reading reply
    while executing
"{*}$r type $k"
    (procedure "createComplexDataset" line 51)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 5)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 13)I/O error reading reply
    while executing
"{*}$r hdel $k $f"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 [lindex $args $path]"
    (procedure "randpath" line 3)
    invoked from within
"randpath {{*}$r hset $k $f $v}  {{*}$r hdel $k $f}"
    (procedure "createComplexDataset" line 88)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 5)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 13)

I/O error reading reply
    while executing
"{*}$r type $k"
    (procedure "createComplexDataset" line 35)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 5)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 13)

@sundb (Collaborator, Author) commented Oct 11, 2025

@oranagra I fixed another issue in commit eca9db7 (it should be ||).

Steps to reproduce:

  1. A client unblocks a blocked client B. In unblockClient(), we remove the CLIENT_BLOCKED flag and put B into server.unblocked_clients.
  2. If B has a read event at this moment, we enter processInputBuffer(); because B no longer has CLIENT_BLOCKED, we continue parsing the next command for it. At this point the assertion serverAssertWithInfo(c,NULL,c->argc == 0); can be triggered.

So when the client has CLIENT_UNBLOCKED, we should also skip command parsing.
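
A minimal standalone sketch (in C) of the skip condition described above; the flag values and the should_skip_parsing() helper are illustrative stand-ins, not the actual processInputBuffer() code.

```c
/* Sketch only: parsing is skipped while the client is still blocked OR is
 * sitting in server.unblocked_clients waiting for processUnblockedClients();
 * the two conditions combine with OR, i.e. a single mask test. */
#include <stdio.h>

#define CLIENT_BLOCKED   (1 << 0)   /* illustrative bit positions */
#define CLIENT_UNBLOCKED (1 << 1)

typedef struct { int flags; } client;

static int should_skip_parsing(const client *c) {
    return (c->flags & (CLIENT_BLOCKED | CLIENT_UNBLOCKED)) != 0;
}

int main(void) {
    client blocked   = { CLIENT_BLOCKED };
    client unblocked = { CLIENT_UNBLOCKED };
    client normal    = { 0 };
    printf("%d %d %d\n",
           should_skip_parsing(&blocked),    /* 1 */
           should_skip_parsing(&unblocked),  /* 1: the newly covered case */
           should_skip_parsing(&normal));    /* 0 */
    return 0;
}
```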

crash report

Logged crash report (pid 2601164):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
2601164:M 11 Oct 2025 13:54:13.272 # === ASSERTION FAILED CLIENT CONTEXT ===
2601164:M 11 Oct 2025 13:54:13.272 # client->flags = 536871040
2601164:M 11 Oct 2025 13:54:13.272 # client->conn = fd=35
2601164:M 11 Oct 2025 13:54:13.272 # client->argc = 4
2601164:M 11 Oct 2025 13:54:13.272 # client->argv[0] = "waitaof" (refcount: 1)
2601164:M 11 Oct 2025 13:54:13.272 # client->argv[1] = "1" (refcount: 1)
2601164:M 11 Oct 2025 13:54:13.272 # client->argv[2] = "0" (refcount: 1)
2601164:M 11 Oct 2025 13:54:13.272 # client->argv[3] = "0" (refcount: 1)
2601164:M 11 Oct 2025 13:54:13.272 # === RECURSIVE ASSERTION FAILED ===
2601164:M 11 Oct 2025 13:54:13.272 # ==> networking.c:2518 'c->argc == 0' is not true

------ STACK TRACE ------

2601183 bio_aof
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eae3) [0x7bb8d3a9eae3]
/lib/x86_64-linux-gnu/libc.so.6(+0x9f237) [0x7bb8d3a9f237]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x197) [0x7bb8d3aa1b17]
src/redis-server 127.0.0.1:28163(bioProcessBackgroundJobs+0x1ea) [0x65336f9db7ea]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601187 io_thd_3
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eb63) [0x7bb8d3a9eb63]
/lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x25) [0x7bb8d3b33e25]
src/redis-server 127.0.0.1:28163(+0x95cc4) [0x65336f8d4cc4]
src/redis-server 127.0.0.1:28163(aeMain+0x6f) [0x65336f8d540f]
src/redis-server 127.0.0.1:28163(IOThreadMain+0xa6) [0x65336f8e7186]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601182 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eae3) [0x7bb8d3a9eae3]
/lib/x86_64-linux-gnu/libc.so.6(+0x9f237) [0x7bb8d3a9f237]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x197) [0x7bb8d3aa1b17]
src/redis-server 127.0.0.1:28163(bioProcessBackgroundJobs+0x1ea) [0x65336f9db7ea]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601184 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eae3) [0x7bb8d3a9eae3]
/lib/x86_64-linux-gnu/libc.so.6(+0x9f237) [0x7bb8d3a9f237]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x197) [0x7bb8d3aa1b17]
src/redis-server 127.0.0.1:28163(bioProcessBackgroundJobs+0x1ea) [0x65336f9db7ea]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601185 io_thd_1
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eb63) [0x7bb8d3a9eb63]
/lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x25) [0x7bb8d3b33e25]
src/redis-server 127.0.0.1:28163(+0x95cc4) [0x65336f8d4cc4]
src/redis-server 127.0.0.1:28163(aeMain+0x6f) [0x65336f8d540f]
src/redis-server 127.0.0.1:28163(IOThreadMain+0xa6) [0x65336f8e7186]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601186 io_thd_2
/lib/x86_64-linux-gnu/libc.so.6(+0xaafe2) [0x7bb8d3aaafe2]
/lib/x86_64-linux-gnu/libc.so.6(+0x9eb63) [0x7bb8d3a9eb63]
/lib/x86_64-linux-gnu/libc.so.6(epoll_wait+0x25) [0x7bb8d3b33e25]
src/redis-server 127.0.0.1:28163(+0x95cc4) [0x65336f8d4cc4]
src/redis-server 127.0.0.1:28163(aeMain+0x6f) [0x65336f8d540f]
src/redis-server 127.0.0.1:28163(IOThreadMain+0xa6) [0x65336f8e7186]
/lib/x86_64-linux-gnu/libc.so.6(+0xa27f1) [0x7bb8d3aa27f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x133b5c) [0x7bb8d3b33b5c]

2601164 redis-server *
src/redis-server 127.0.0.1:28163(processMultibulkBuffer+0x600) [0x65336f91bfc0]
src/redis-server 127.0.0.1:28163(processInputBuffer+0x172) [0x65336f91c3c2]
src/redis-server 127.0.0.1:28163(readQueryFromClient+0x348) [0x65336f9216e8]
src/redis-server 127.0.0.1:28163(+0x22cf5a) [0x65336fa6bf5a]
src/redis-server 127.0.0.1:28163(aeMain+0xb2) [0x65336f8d5452]
src/redis-server 127.0.0.1:28163(main+0x46d) [0x65336f8cef5d]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a578) [0x7bb8d3a2a578]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x7bb8d3a2a63b]
src/redis-server 127.0.0.1:28163(_start+0x25) [0x65336f8d0a75]

7/7 expected stacktraces.

------ STACK TRACE DONE ------

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  If a Redis module was involved, please open in the module's repo instead.

  Suspect RAM error? Use redis-server --test-memory to verify it.

  Some other issues could be detected by redis-server --check-system

[exception]: Executing test client: I/O error reading reply.
I/O error reading reply
    while executing
"wait_for_condition 50 100 {
        [s $idx blocked_clients] ne 0
    } else {
        fail "no blocked clients"
    }"
    (procedure "wait_for_blocked_client" line 2)
    invoked from within
"wait_for_blocked_client"
    ("uplevel" body line 19)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 59)
    invoked from within
"test {WAITAOF local client unblock with timeout and error} {
            r config set appendonly yes
            r config set appendfsync no
         ..."
    ("uplevel" body line 39)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 2)
    invoked from within
"start_server {overrides {appendonly {yes} auto-aof-rewrite-percentage {0}}} {
        set master [srv 0 client]

        test {WAITAOF local copy befo..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code"
    (procedure "tags" line 12)
    invoked from within
"tags {"wait aof network external:skip"} {
    start_server {overrides {appendonly {yes} auto-aof-rewrite-percentage {0}}} {
        set master [srv 0 ..."
    (file "tests/unit/wait.tcl" line 104)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "

BTW, VK also skips parsing in this case.

sundb merged commit 6d89370 into redis:unstable, Oct 15, 2025
19 checks passed
github-project-automation bot moved this from Todo to Done in Redis 8.4, Oct 15, 2025