Reverts most of the changes of #10969 #11178

MeirShpilraien · 2022-08-23T14:21:02Z

The PR reverts the changes made on #10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged.

The issue is that if there is a lazy expire during the command invocation, the del command is added to the replication stream after the command placeholder. So the logical order on the primary is:

Delete the key (lazy expiration)
Command invocation

But the replication stream gets it the other way around:

Command invocation (because the command is written into the placeholder)
Delete the key (lazy expiration)

So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica.

One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow:

User perform SINTERSTORE
When Redis tries to fetch the second input key it triggers lazy expire
The lazy expire trigger a module logic that deletes the first input key
Now Redis hold the robj of the first input key that was actually freed

We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure.

Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that was reverted are:

Saving a placeholder for replication at the beginning of the command (call function)
Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR)
Spop changes are no longer needed (because we reverted the placeholder code)

Changes that was not reverted:

On expire/eviction, wrap the del and the notification effect in a multi exec.
PropagateNow function can still accept a special dbid, -1, indicating not to replicate select.
Keep optimisation for reusing the alsoPropagate array instead of allocating it each time.

Tests:

All tests was kept and only few tests was modify to work correctly with the changes
Test was added to verify that the revert fixes the issues.

The PR reverts the changes made on redis#10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the del command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find. One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that was reverted are: * Saving a placeholder for replication at the begining of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * Spop changes are no longer needed (because we reverted the placeholder code) Changes that was not reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * PropagateNow function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimization for reusing the alsoPropagate array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.

src/db.c

src/server.h

src/t_set.c

tests/integration/replication.tcl

guybe7 · 2022-08-24T07:26:20Z

tests/integration/replication.tcl

+            assert_equal 1 [$master wait 1 1]
+
+            assert_equal "set" [$master type s]
+            assert_equal "set" [$slave type s]


before this commit the replica didn't have foo at all?

The replica did not have the set at all.

Related PRs: redis#10969, redis#11178 The following PR is a proposal of handling write operations inside key space notifications. The PR is not yet finished (see todo list at the end) but I would like to start getting opinions whether or not the approach make sense and whether there are any issues that we might have missed. After a lot of discussions we came to a conclussion that module should not perform any write operations on key space notifcation. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviure. One solution is to go over all those places (that assumes no data changes) and fix them such that they will be tollerable to data changes. This apporach will probably require a lot of changes all around the code and it will be very hard to know whether or not we got all those issues. Moreover, it will be very hard to maintain such code and be sure that we are not adding more places that assume no data changes. Another approach (which presented in this PR) is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API that, `RedisModule_AddPostNotificationJob`, allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the notification (the notification effect and the job effect will be replicated to the replication and the aof wrapped with multi exec). Module can use this new API to safely perform any write operation and still achieve atomicity betweent the notification and the write. ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostKeyspaceJobs`) that need to be invoke after the current logic ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `propagatePendingCommands` just before we actually propagating the effects. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostKeyspaceJobs` list and will be invoke atomically after the current callback ends. In order to avoid infinite loop we limit the number of jobs that can be registered in a single logic unit (command, eviction, active expire). This limitation is configurable using `maxpostnotificationsjobs` configuration value.

The PR reverts the changes made on redis#10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the `del` command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realize that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that **was** reverted are: * Saving a placeholder for replication at the beginning of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * `Spop` changes are no longer needed (because we reverted the placeholder code) Changes that **was not** reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.

MeirShpilraien requested review from guybe7 and oranagra August 23, 2022 14:21

remove unneeded diclaration of redisOpArrayInit

5b1a72b

oranagra reviewed Aug 23, 2022

View reviewed changes

src/db.c Show resolved Hide resolved

src/server.h Outdated Show resolved Hide resolved

src/t_set.c Show resolved Hide resolved

tests/integration/replication.tcl Outdated Show resolved Hide resolved

guybe7 reviewed Aug 24, 2022

View reviewed changes

tests/integration/replication.tcl Outdated Show resolved Hide resolved

guybe7 reviewed Aug 24, 2022

View reviewed changes

Review fixes

3b13f7f

MeirShpilraien requested review from guybe7 and oranagra August 24, 2022 09:22

oranagra approved these changes Aug 24, 2022

View reviewed changes

oranagra changed the title ~~Reverts changes on #10969~~ Reverts most of the changes of #10969 Aug 24, 2022

oranagra merged commit c1bd61a into redis:unstable Aug 24, 2022

MeirShpilraien mentioned this pull request Aug 28, 2022

Module API to allow writes after key space notification hooks #11199

Merged

7 tasks

oranagra mentioned this pull request Mar 22, 2023

Fix replication inconsistency on modules that uses key space notifications #10969

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reverts most of the changes of #10969 #11178

Reverts most of the changes of #10969 #11178

Uh oh!

MeirShpilraien commented Aug 23, 2022 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guybe7 Aug 24, 2022

Uh oh!

MeirShpilraien Aug 24, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Reverts most of the changes of #10969 #11178

Reverts most of the changes of #10969 #11178

Uh oh!

Conversation

MeirShpilraien commented Aug 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guybe7 Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

MeirShpilraien Aug 24, 2022

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MeirShpilraien commented Aug 23, 2022 •

edited

Loading