Skip to content

Conversation

@enjoy-binbin
Copy link
Contributor

For the XREADGROUP BLOCK > scenario, there is an endless loop.
Due to #11012, XREADGROUP with ">" may repeatedly reprocess command and blockForKeys.

The right fix is to avoid an endless loop in handleClientsBlockedOnKey and handleClientsBlockedOnKeys,
looks like there was some attempt in handleClientsBlockedOnKeys but maybe not sufficiently good,
and it looks like using a similar trick in handleClientsBlockedOnKey is complicated.
i.e. stashing the list on the stack and iterating on it after creating a fresh one for future use,
is problematic since the code keeps accessing the global list.

The fix is proposed by oranagra.

Fixes #12290

For the XREADGROUP BLOCK > scenario, there is an endless loop.
Due to redis#11012, it keep going, reprocess command -> blockForKeys -> reprocess command

The right fix is to avoid an endless loop in handleClientsBlockedOnKey and handleClientsBlockedOnKeys,
looks like there was some attempt in handleClientsBlockedOnKeys but maybe not sufficiently good,
and it looks like using a similar trick in handleClientsBlockedOnKey is complicated.
i.e. stashing the list on the stack and iterating on it after creating a fresh one for future use,
is problematic since the code keeps accessing the global list.

The fix is proposed by oranagra.

Fixes redis#12290

Co-authored-by: Oran Agra <oran@redislabs.com>
@enjoy-binbin enjoy-binbin requested a review from oranagra June 12, 2023 00:58
@enjoy-binbin
Copy link
Contributor Author

@yossigo yossigo added the release-notes indication that this issue needs to be mentioned in the release notes label Jun 12, 2023
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you confirm that the test reproduces the problem?

@enjoy-binbin
Copy link
Contributor Author

did you confirm that the test reproduces the problem?

yes, it reproduces stably locally

@oranagra
Copy link
Member

oranagra commented Jun 12, 2023

@guybe7 we get the following assertion in the schema validator:

22890:M 12 Jun 2023 03:21:45.863 # === ASSERTION FAILED ===
22890:M 12 Jun 2023 03:21:45.863 # ==> logreqres.c:281 'ret' is not true

i suppose it's because a command was re-processed and blocked again without sending any reply. please advise

@enjoy-binbin
Copy link
Contributor Author

enjoy-binbin commented Jun 12, 2023

i did not see the assertion before (it happened in module-api, I ignored it for granted)
It's a bit strange to fail in module-api instead of before
it can reproduce locally, i will see if I can get the specific reason later

it failed on Module client blocked on keys: Circular BPOPPUSH

@oranagra
Copy link
Member

ohh, i looked at the assertion code in logreqres.c and wrongly concluded it's because of the new test.
please have a look, if you conclude it's not related to this PR, we can leave it for later.

@oranagra oranagra merged commit e7129e4 into redis:unstable Jun 13, 2023
@enjoy-binbin enjoy-binbin deleted the fix_xreadgroup_block_endless_loop branch June 13, 2023 10:28
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jun 16, 2023
…ecution order test

In redis#12301, we observed that if the
`while(listLength(server.ready_keys) != 0)`
in handleClientsBlockedOnKeys is changed to
`if(listLength(server.ready_keys) != 0)`,
the order of command execution will change.

It is wrong to change that. It means that if a command
being unblocked causes another command to get unblocked
(like a BLMOVE would do), then the new unblocked command
will wait for later to get processed rather than right away.

It'll not have any real implication if we change that since
we do call handleClientsBlockedOnKeys in beforeSleep again,
and redis will still behave correctly, but we don't change that.

An example:
1. $rd1 blmove src{t} dst{t} left right 0
2. $rd2 blmove dst{t} src{t} right left 0
3. $rd3 set key1{t}, $rd3 lpush src{t}, $rd3 set key2{t} in a pipeline

The correct order would be:
1. set key1{t}
2. lpush src{t}
3. lmove src{t} dst{t} left right
4. lmove dst{t} src{t} right left
5. set key2{t}

The wrong order would be:
1. set key1{t}
2. lpush src{t}
3. lmove src{t} dst{t} left right
4. set key2{t}
5. lmove dst{t} src{t} right left

This PR adds corresponding test to cover it.
oranagra pushed a commit that referenced this pull request Jun 16, 2023
…ecution order test (#12324)

* Add command being unblocked cause another command to get unblocked execution order test

In #12301, we observed that if the
`while(listLength(server.ready_keys) != 0)`
in handleClientsBlockedOnKeys is changed to
`if(listLength(server.ready_keys) != 0)`,
the order of command execution will change.

It is wrong to change that. It means that if a command
being unblocked causes another command to get unblocked
(like a BLMOVE would do), then the new unblocked command
will wait for later to get processed rather than right away.

It'll not have any real implication if we change that since
we do call handleClientsBlockedOnKeys in beforeSleep again,
and redis will still behave correctly, but we don't change that.

An example:
1. $rd1 blmove src{t} dst{t} left right 0
2. $rd2 blmove dst{t} src{t} right left 0
3. $rd3 set key1{t}, $rd3 lpush src{t}, $rd3 set key2{t} in a pipeline

The correct order would be:
1. set key1{t}
2. lpush src{t}
3. lmove src{t} dst{t} left right
4. lmove dst{t} src{t} right left
5. set key2{t}

The wrong order would be:
1. set key1{t}
2. lpush src{t}
3. lmove src{t} dst{t} left right
4. set key2{t}
5. lmove dst{t} src{t} right left

This PR adds corresponding test to cover it.

* Add comment near while(listLength(server.ready_keys) != 0)
oranagra pushed a commit that referenced this pull request Jun 22, 2023
Apart from adding the missing coverage, this PR also adds `blockedBeforeSleep`
that gathers all block-related functions from `beforeSleep`

The order inside `blockedBeforeSleep` is different: now `handleClientsBlockedOnKeys`
(which may unblock clients) is called before `processUnblockedClients` (which handles
unblocked clients).
It makes sense to have this order.

There are no visible effects of the wrong ordering, except some cleanups of the now-unblocked
client would have  happen in the next `beforeSleep` (will now happen in the current one)

The reason we even got into it is because i triggers an assertion in logresreq.c (breaking
the assumption that `unblockClient` is called **before** actually flushing the reply to the socket):
`handleClientsBlockedOnKeys` is called, then it calls `moduleUnblockClientOnKey`, which calls
`moduleUnblockClient`, which adds the client to `moduleUnblockedClients` back to `beforeSleep`,
we call `handleClientsWithPendingWritesUsingThreads`, it writes the data of buf to the client, so
`client->bufpos` became 0
On the next `beforeSleep`, we call `moduleHandleBlockedClients`, which calls `unblockClient`,
which calls `reqresAppendResponse`, triggering the assert. (because the `bufpos` is 0) - see #12301 (comment)
@oranagra oranagra mentioned this pull request Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Deadlock with streams on redis 7.2

4 participants