Skip to content

Conversation

@MeirShpilraien
Copy link

@MeirShpilraien MeirShpilraien commented Aug 23, 2022

The PR reverts the changes made on #10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged.

The issue is that if there is a lazy expire during the command invocation, the del command is added to the replication stream after the command placeholder. So the logical order on the primary is:

  • Delete the key (lazy expiration)
  • Command invocation

But the replication stream gets it the other way around:

  • Command invocation (because the command is written into the placeholder)
  • Delete the key (lazy expiration)

So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica.

One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow:

  • User perform SINTERSTORE
  • When Redis tries to fetch the second input key it triggers lazy expire
  • The lazy expire trigger a module logic that deletes the first input key
  • Now Redis hold the robj of the first input key that was actually freed

We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure.

Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that was reverted are:

  • Saving a placeholder for replication at the beginning of the command (call function)
  • Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR)
  • Spop changes are no longer needed (because we reverted the placeholder code)

Changes that was not reverted:

  • On expire/eviction, wrap the del and the notification effect in a multi exec.
  • PropagateNow function can still accept a special dbid, -1, indicating not to replicate select.
  • Keep optimisation for reusing the alsoPropagate array instead of allocating it each time.

Tests:

  • All tests was kept and only few tests was modify to work correctly with the changes
  • Test was added to verify that the revert fixes the issues.

The PR reverts the changes made on redis#10969. The reason for revert was trigger
because of occasional test failure that started after the PR was merged.

The issue is that if there is a lazy expire during the command invocation,
the del command is added to the replication stream after the command placeholder.
So the logical order on the primary is:

* Delete the key (lazy expiration)
* Command invocation

But the replication stream gets it the other way around:

* Command invocation (because the command is written into the placeholder)
* Delete the key (lazy expiration)

So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica.
One solution we considered is to add another lazy expire replication stream and write all the lazy expire there.
Then when replicating, we will replicate the lazy expire replication stream first.
This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find.
One of the example we thought about (that can actually crashes Redis) is as follow:

* User perform SINTERSTORE
* When Redis tries to fetch the second input key it triggers lazy expire
* The lazy expire trigger a module logic that deletes the first input key
* Now Redis hold the robj of the first input key that was actually freed

We believe we took the wrong approach and we will come up with another PR that
solve the problem differently, for now we revert the changes so we will not
have the tests failure.

Notice that not the entire code was revert, some parts of the PR are changes that we
would like to keep. The changes that was reverted are:

* Saving a placeholder for replication at the begining of the command (`call` function)
* Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR)
* Spop changes are no longer needed (because we reverted the placeholder code)

Changes that was not reverted:

* On expire/eviction, wrap the `del` and the notification effect in a multi exec.
* PropagateNow function can still accept a special dbid, -1, indicating not to replicate select.
* Keep optimization for reusing the alsoPropagate array instead of allocating it each time.

Tests:

* All tests was kept and only few tests was modify to work correctly with the changes
* Test was added to verify that the revert fixes the issues.
assert_equal 1 [$master wait 1 1]

assert_equal "set" [$master type s]
assert_equal "set" [$slave type s]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before this commit the replica didn't have foo at all?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replica did not have the set at all.

@oranagra oranagra changed the title Reverts changes on #10969 Reverts most of the changes of #10969 Aug 24, 2022
@oranagra oranagra merged commit c1bd61a into redis:unstable Aug 24, 2022
MeirShpilraien pushed a commit to MeirShpilraien/redis that referenced this pull request Aug 28, 2022
Related PRs: redis#10969, redis#11178

The following PR is a proposal of handling write operations inside key space notifications. The PR is not yet finished (see todo list at the end) but I would like to start getting opinions whether or not the approach make sense and whether there are any issues that we might have missed.

After a lot of discussions we came to a conclussion that module
should not perform any write operations on key space notifcation.

Some examples of issues that such write operation can cause are describe
on the following links:

* Bad replication oreder - redis#10969
* Used after free - redis#10969 (comment)
* Used after free - redis#9406 (comment)

There are probably more issues that are yet to be discovered. The underline problem
with writing inside key space notification is that the notification runs synchronously,
this means that the notification code will be executed in the middle on Redis logic
(commands logic, eviction, expire). Redis **do not assume** that the data might change
while running the logic and such changes can crash Redis or cause unexpected behaviure.

One solution is to go over all those places (that assumes no data changes) and fix them
such that they will be tollerable to data changes. This apporach will probably require a
lot of changes all around the code and it will be very hard to know whether or not we got
all those issues. Moreover, it will be very hard to maintain such code and be sure that
we are not adding more places that assume no data changes.

Another approach (which presented in this PR) is to state that modules **should not** perform
any write command inside key space notification (we can chose whether or not we want to force it).
To still cover the use-case where module wants to perform a write operation as a reaction to
key space notifications, we introduce a new API that, `RedisModule_AddPostNotificationJob`, allows to register a callback that will be
called by Redis when the following conditions hold:

* It is safe to perform any write operation.
* The job will be called atomically along side the notification (the notification effect and the
  job effect will be replicated to the replication and the aof wrapped with multi exec).

Module can use this new API to safely perform any write operation and still achieve atomicity
betweent the notification and the write.

### Technical Details

Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostKeyspaceJobs`) that need to be invoke after the current logic ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `propagatePendingCommands` just before we actually propagating the effects.

If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostKeyspaceJobs` list and will be invoke atomically after the current callback ends. In order to avoid infinite loop we limit the number of jobs that can be registered in a single logic unit (command, eviction, active expire). This limitation is configurable using `maxpostnotificationsjobs` configuration value.
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
The PR reverts the changes made on redis#10969.
The reason for revert was trigger because of occasional test failure
that started after the PR was merged.

The issue is that if there is a lazy expire during the command invocation,
the `del` command is added to the replication stream after the command
placeholder. So the logical order on the primary is:

* Delete the key (lazy expiration)
* Command invocation

But the replication stream gets it the other way around:

* Command invocation (because the command is written into the placeholder)
* Delete the key (lazy expiration)

So if the command write to the key that was just lazy expired we will get
inconsistency between primary and replica.

One solution we considered is to add another lazy expire replication stream
and write all the lazy expire there. Then when replicating, we will replicate the
lazy expire replication stream first. This will solve this specific test failure but
we realize that the issues does not ends here and the more we dig the more
problems we find.One of the example we thought about (that can actually
crashes Redis) is as follow:

* User perform SINTERSTORE
* When Redis tries to fetch the second input key it triggers lazy expire
* The lazy expire trigger a module logic that deletes the first input key
* Now Redis hold the robj of the first input key that was actually freed

We believe we took the wrong approach and we will come up with another
PR that solve the problem differently, for now we revert the changes so we
will not have the tests failure.

Notice that not the entire code was revert, some parts of the PR are changes
that we would like to keep. The changes that **was** reverted are:

* Saving a placeholder for replication at the beginning of the command (`call` function)
* Order of the replication stream on active expire and eviction (we will decide how
  to handle it correctly on follow up PR)
* `Spop` changes are no longer needed (because we reverted the placeholder code)

Changes that **was not** reverted:

* On expire/eviction, wrap the `del` and the notification effect in a multi exec.
* `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select.
* Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time.

Tests:

* All tests was kept and only few tests was modify to work correctly with the changes
* Test was added to verify that the revert fixes the issues.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
The PR reverts the changes made on redis#10969.
The reason for revert was trigger because of occasional test failure
that started after the PR was merged.

The issue is that if there is a lazy expire during the command invocation,
the `del` command is added to the replication stream after the command
placeholder. So the logical order on the primary is:

* Delete the key (lazy expiration)
* Command invocation

But the replication stream gets it the other way around:

* Command invocation (because the command is written into the placeholder)
* Delete the key (lazy expiration)

So if the command write to the key that was just lazy expired we will get
inconsistency between primary and replica.

One solution we considered is to add another lazy expire replication stream
and write all the lazy expire there. Then when replicating, we will replicate the
lazy expire replication stream first. This will solve this specific test failure but
we realize that the issues does not ends here and the more we dig the more
problems we find.One of the example we thought about (that can actually
crashes Redis) is as follow:

* User perform SINTERSTORE
* When Redis tries to fetch the second input key it triggers lazy expire
* The lazy expire trigger a module logic that deletes the first input key
* Now Redis hold the robj of the first input key that was actually freed

We believe we took the wrong approach and we will come up with another
PR that solve the problem differently, for now we revert the changes so we
will not have the tests failure.

Notice that not the entire code was revert, some parts of the PR are changes
that we would like to keep. The changes that **was** reverted are:

* Saving a placeholder for replication at the beginning of the command (`call` function)
* Order of the replication stream on active expire and eviction (we will decide how
  to handle it correctly on follow up PR)
* `Spop` changes are no longer needed (because we reverted the placeholder code)

Changes that **was not** reverted:

* On expire/eviction, wrap the `del` and the notification effect in a multi exec.
* `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select.
* Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time.

Tests:

* All tests was kept and only few tests was modify to work correctly with the changes
* Test was added to verify that the revert fixes the issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants