Skip to content

Conversation

@enjoy-binbin
Copy link
Contributor

@enjoy-binbin enjoy-binbin commented Jan 29, 2024

In #11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK scenario,
because of the re-processing command, we will recalculate the block timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in blockForKeys
we will not reset the timeout.

Affected BLOCK cases:

  • list / zset / stream, added test cases for each.

Unaffected cases:

  • module (never re-process the commands).
  • WAIT / WAITAOF (never re-process the commands).

Fixes #12998.

In redis#11012, we will reprocess command when client is unblocked on keys,
In the XREADGROUP BLOCK scenario, because of the re-processing command,
we will recalculate the block timeout, causing the blocking time to be
reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in blockForKeys
we will not reset the timeout.

Fixes redis#12998.
@enjoy-binbin
Copy link
Contributor Author

I'm considering adding a test, such as checking the time-consuming in redis-cli.tcl
I don't know if there is any other better way? We seem to have no way to check how long the client has been blocked?

@enjoy-binbin enjoy-binbin requested a review from oranagra January 29, 2024 08:44
@oranagra
Copy link
Member

I'm considering adding a test, such as checking the time-consuming in redis-cli.tcl I don't know if there is any other better way? We seem to have no way to check how long the client has been blocked?

why is it hard to measure it in TCL with clock milliseconds (i.e. after rd read)?

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check if other blocking commands were affected by this bug.
i.e. WAITAOF, modules, BLPOP.
i suppose at least the list blocking commands can get unblocked and re-blocked similarly to XREAD, so we should mention them in the release notes.

@enjoy-binbin
Copy link
Contributor Author

BLPOP / BZPOPMIN have the same issue and i reproduce it in local.

i think modules also have the issue, but i haven't found a test case to reproduce yet, maybe you have a idea?

    c->bstate.timeout = 0;
    if (timeout_ms) {
        mstime_t now = mstime();
        if  (timeout_ms > LLONG_MAX - now) {
            c->bstate.module_blocked_handle = NULL;
            addReplyError(c, "timeout is out of range"); /* 'timeout_ms+now' would overflow */
            return bc;
        }
        c->bstate.timeout = timeout_ms + now;
    }

WAITAOF, i didn't think of a scenario where re-process would be triggered, do you have a idea?

I will continue to think about it tomorrow and push the first

@enjoy-binbin enjoy-binbin changed the title Fix XREADGROUP BLOCK timeout is reset due to re-processing command Fix blocking commands timeout is reset due to re-processing command Jan 29, 2024
@oranagra
Copy link
Member

i don't think we have to test all of them.
just wanted to mention in the release notes which ones are affected.
please update the top comment and ping me when ready for merge

@enjoy-binbin
Copy link
Contributor Author

There may be modules, but I haven't found any examples where we will re-process command (except blocking auth). Just in case, I also made a small change in the last commit, please take a look

wait and waitaof don't seem to re-process command

@oranagra
Copy link
Member

i think that in modules we never re-process the commands, we left it out of scope in #11012 @ranshid please ack.
@enjoy-binbin please update the top comment with a list of what's impacted and what's not.

@enjoy-binbin
Copy link
Contributor Author

i think that in modules we never re-process the commands, we left it out of scope in #11012 @ranshid please ack.

#11832

i updated the top comment

@oranagra oranagra merged commit 492021d into redis:unstable Jan 30, 2024
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Jan 30, 2024
@enjoy-binbin enjoy-binbin deleted the fix_xreadgroup_reprocessing branch January 30, 2024 09:36
@enjoy-binbin
Copy link
Contributor Author

enjoy-binbin commented Jan 30, 2024

https://github.com/redis/redis/actions/runs/7709050607/job/21009405189#step:5:683

i see i missed something, that the ci is fails.

        if (keys) {
            blockForKeys(c,BLOCKED_MODULE,keys,numkeys,timeout,flags&REDISMODULE_BLOCK_UNBLOCK_DELETED);
        } else {
+            c->bstate.timeout = timeout;
            blockClient(c,BLOCKED_MODULE);
        }

@oranagra
Do you need me to handle it this way? It feels like the code is a bit hidden. I guess it was just happen to passed PR CI and locally before, so i didn’t find that the Unblock by timer test case would fail.

@oranagra
Copy link
Member

@enjoy-binbin yes, please fix it in this way.
p.s. i don't understand how come it lead to invalidFunctionWasCalled and why it would be flaky. (didn't bother to look at the test)

@enjoy-binbin
Copy link
Contributor Author

p.s. i don't understand how come it lead to invalidFunctionWasCalled and why it would be flaky. (didn't bother to look at the test)

yes, now it is flaky, i'll try to get it to stabilize and then i will issue the fix.

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jan 30, 2024
This was introduced in redis#13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut

bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
oranagra pushed a commit that referenced this pull request Jan 30, 2024
This was introduced in #13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
…edis#13004)

In redis#11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.

Affected BLOCK cases: 
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes redis#12998.
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
This was introduced in redis#13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 15, 2024
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, tests was introduced in redis#13004.
oranagra pushed a commit that referenced this pull request Feb 15, 2024
…ues (#13053)

These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request May 16, 2024
…edis#13004)

In redis#11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.

Affected BLOCK cases:
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes redis#12998.

(cherry picked from commit 492021d)
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request May 16, 2024
This was introduced in redis#13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```

(cherry picked from commit 45a35a7)
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request May 16, 2024
…ues (redis#13053)

These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in redis#13004.

(cherry picked from commit 32f44da)
YaacovHazan pushed a commit that referenced this pull request May 19, 2024
…13004)

In #11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.

Affected BLOCK cases:
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes #12998.

(cherry picked from commit 492021d)
YaacovHazan pushed a commit that referenced this pull request May 19, 2024
This was introduced in #13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```

(cherry picked from commit 45a35a7)
YaacovHazan pushed a commit that referenced this pull request May 19, 2024
…ues (#13053)

These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in #13004.

(cherry picked from commit 32f44da)
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…edis#13004)

In redis#11012, we will reprocess command when client is unblocked on keys,
in some blocking commands, for example, in the XREADGROUP BLOCK
scenario,
because of the re-processing command, we will recalculate the block
timeout,
causing the blocking time to be reset.

This commit add a new CLIENT_REPROCESSING_COMMAND clent flag, explicitly
let the command know that it is being re-processed, later in
blockForKeys
we will not reset the timeout.

Affected BLOCK cases: 
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes redis#12998.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
This was introduced in redis#13004, missing this assignment.
It causes timeout to be a random value (may be less than now),
and then in `Unblock by timer` test, the client is unblocked
and then it call timeout_callback, since the callback is NULL,
the server will crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…ues (redis#13053)

These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer.
tests was introduced in redis#13004.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] XREADGROUP block longer than timeout

2 participants