-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Fix replication inconsistency on modules that uses key space notifications #10969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix replication inconsistency on modules that uses key space notifications #10969
Conversation
…tions.
### The Problem
In general, key space notifcations are invoked after the command logic was executed (this is not always the case, we will discuss later about spacific command that do not follow this rules). For exampel, the `set x 1` will trigger a `set` notification that will be invoked after the `set` logic was perfromed, so if the notification logic will try to fetch `x`, it will see the new data that was writen.
Consider the scenario on which the notification logic performs some write commands. for example, the notification logic increase some counter, `incr x{counter}`, indicating how many times `x` was changed. The logical order by which the logic was executed is has followi:
```
set x 1
incr x{counter}
```
The issue is that the `set x 1` command is added to the replication buffer at the end of the command invocation (spacifically after the key space notification logic was invoked and performed the `incr` command). The replication/aof sees the commands in the wrong order:
```
incr x{counter}
set x 1
```
In this spacifica example the order is less important. But if, for example, the notifcation would have deleted `x` then we would end up with primary-replica inconsistency.
### The Solution
Put the command that cause the notification in its rightfull place. In the above example, the `set x 1` command logic was executed before the notification logic, so it should be added to the replication buffer before the commands that is invoked by the notificaiton logic. To achieve this, without a major code refactoring, we save a placeholder in the replication buffer, when finishing invoking the command logic we check if the command need to be replicated, and if it does, we use the placeholder to add it to the replication buffer instead of appending it to the end.
To be efficient and not allocating memory on each command to save the placeholder, the replication buffer array was modified to reuse memory (instead of allocating it each time we want to replicate commands).
#### Exceptions
* Expire and Eviction notifications:
* Expire/Eviction logical order was to first perform the Expire/Eviction and then the notification logic. The replication buffer got this in the other way around (first notification effect and then the `del` command). The PR fixes this issue.
* The notification effect and the `del` command was not wrap with `multi-exec` (if needed). The PR also fix this issue.
* SPOP command:
* On spop, the `spop` notification was fired before the command logic was executed. The change in this PR would have cause the replication order to be change (first `spop` command and then notification `logic`) altouhg the logical order is first the notification logic and then the `spop` logic. The right would have been to move the notification to be fire after the command was executed (like all the other commands), but this can be considered a breaking change. To overcome this, the PR keeps the current behaviure and changes the `spop` code to keep the right logical order when pushing commands to the replication buffer.
#### Testing
* We already have tests to cover cases where a notification is invoking write commands that are also added to the replication buffer, the tests was modified to verify that the replica gets the command in the correct logical order.
* Test was added to verify that `spop` behaviure was kept unchanged.
|
@MeirShpilraien didn't we agree to fix spop in a breaking manner? |
|
@oranagra we said that he will first fix it without breaking so we can cherrypick this to 7.0 if we want to. We will fix it proparly after, in anothet RP. |
|
Ohh, I forgot. Better outline that plan in the top comment. |
Co-authored-by: Oran Agra <oran@redislabs.com>
|
@yossigo please take another look. |
Co-authored-by: Oran Agra <oran@redislabs.com>
|
@redis/core-team not really a major-decision since there's no interface change, but it's a sensitive area so maybe one of you wants to review it too. |
yossigo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oranagra I agree this isn't a major decision, but it is bug fix that includes a breaking change and potential for undesired side effects so I vote for holding back on a backport at the moment.
|
the PR doesn't include a breaking change (just a bug fix). |
|
Notice that we have an occasional test failure because of the changes on this PR: https://github.com/redis/redis/runs/7927670765?check_suite_focus=true The issue is that if there is a lazy expire during the command invocation, the
But the replication stream gets it the other way around:
So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. The following test can constantly reproduce the issue: One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find. One of the example we thought about (that can actually crashes Redis) is as follow:
We believe we took the wrong approach here and we would like to suggested a new approach:
We are planing to revert this PR and implements this post command hook idea on another PR. |
The PR reverts the changes made on redis#10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the del command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realise that the issues does not ends here and the more we dig the more problems we find. One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that was reverted are: * Saving a placeholder for replication at the begining of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * Spop changes are no longer needed (because we reverted the placeholder code) Changes that was not reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * PropagateNow function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimization for reusing the alsoPropagate array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.
The PR reverts the changes made on #10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the `del` command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realize that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that **was** reverted are: * Saving a placeholder for replication at the beginning of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * `Spop` changes are no longer needed (because we reverted the placeholder code) Changes that **was not** reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.
Related PRs: redis#10969, redis#11178 The following PR is a proposal of handling write operations inside key space notifications. The PR is not yet finished (see todo list at the end) but I would like to start getting opinions whether or not the approach make sense and whether there are any issues that we might have missed. After a lot of discussions we came to a conclussion that module should not perform any write operations on key space notifcation. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviure. One solution is to go over all those places (that assumes no data changes) and fix them such that they will be tollerable to data changes. This apporach will probably require a lot of changes all around the code and it will be very hard to know whether or not we got all those issues. Moreover, it will be very hard to maintain such code and be sure that we are not adding more places that assume no data changes. Another approach (which presented in this PR) is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API that, `RedisModule_AddPostNotificationJob`, allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the notification (the notification effect and the job effect will be replicated to the replication and the aof wrapped with multi exec). Module can use this new API to safely perform any write operation and still achieve atomicity betweent the notification and the write. ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostKeyspaceJobs`) that need to be invoke after the current logic ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `propagatePendingCommands` just before we actually propagating the effects. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostKeyspaceJobs` list and will be invoke atomically after the current callback ends. In order to avoid infinite loop we limit the number of jobs that can be registered in a single logic unit (command, eviction, active expire). This limitation is configurable using `maxpostnotificationsjobs` configuration value.
### Summary of API additions * `RedisModule_AddPostNotificationJob` - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above. * New module option, `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`, allows to disable Redis protection of nested key-space notifications. * `RedisModule_GetModuleOptionsAll` - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance. ### Background The following PR is a proposal of handling write operations inside module key space notifications. After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - #10969 * Used after free - #10969 (comment) * Used after free - #9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour. The solution is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , `RedisModule_AddPostNotificationJob`, that allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the operation that triggers it (in our case, key space notification). Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write. Although currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example). ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostExecUnitJobs`) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `postExecutionUnitOperations` (which was `propagatePendingCommands` before this PR). The new function fires the post jobs and then calls `propagatePendingCommands`. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostExecUnitJobs` list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so **Redis will make no attempt to protect the module from infinite loops** In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module. ### Redis infrastructure This PR promotes the existing `propagatePendingCommands` to an "Execution Unit" concept, which is called after each atomic unit of execution, Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
The PR reverts the changes made on redis#10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the `del` command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realize that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that **was** reverted are: * Saving a placeholder for replication at the beginning of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * `Spop` changes are no longer needed (because we reverted the placeholder code) Changes that **was not** reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.
…11199) ### Summary of API additions * `RedisModule_AddPostNotificationJob` - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above. * New module option, `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`, allows to disable Redis protection of nested key-space notifications. * `RedisModule_GetModuleOptionsAll` - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance. ### Background The following PR is a proposal of handling write operations inside module key space notifications. After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour. The solution is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , `RedisModule_AddPostNotificationJob`, that allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the operation that triggers it (in our case, key space notification). Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write. Although currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example). ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostExecUnitJobs`) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `postExecutionUnitOperations` (which was `propagatePendingCommands` before this PR). The new function fires the post jobs and then calls `propagatePendingCommands`. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostExecUnitJobs` list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so **Redis will make no attempt to protect the module from infinite loops** In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module. ### Redis infrastructure This PR promotes the existing `propagatePendingCommands` to an "Execution Unit" concept, which is called after each atomic unit of execution, Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
…tions (redis#10969) Fix replication inconsistency on modules that uses key space notifications. ### The Problem In general, key space notifications are invoked after the command logic was executed (this is not always the case, we will discuss later about specific command that do not follow this rules). For example, the `set x 1` will trigger a `set` notification that will be invoked after the `set` logic was performed, so if the notification logic will try to fetch `x`, it will see the new data that was written. Consider the scenario on which the notification logic performs some write commands. for example, the notification logic increase some counter, `incr x{counter}`, indicating how many times `x` was changed. The logical order by which the logic was executed is has follow: ``` set x 1 incr x{counter} ``` The issue is that the `set x 1` command is added to the replication buffer at the end of the command invocation (specifically after the key space notification logic was invoked and performed the `incr` command). The replication/aof sees the commands in the wrong order: ``` incr x{counter} set x 1 ``` In this specific example the order is less important. But if, for example, the notification would have deleted `x` then we would end up with primary-replica inconsistency. ### The Solution Put the command that cause the notification in its rightful place. In the above example, the `set x 1` command logic was executed before the notification logic, so it should be added to the replication buffer before the commands that is invoked by the notification logic. To achieve this, without a major code refactoring, we save a placeholder in the replication buffer, when finishing invoking the command logic we check if the command need to be replicated, and if it does, we use the placeholder to add it to the replication buffer instead of appending it to the end. To be efficient and not allocating memory on each command to save the placeholder, the replication buffer array was modified to reuse memory (instead of allocating it each time we want to replicate commands). Also, to avoid saving a placeholder when not needed, we do it only for WRITE or MAY_REPLICATE commands. #### Additional Fixes * Expire and Eviction notifications: * Expire/Eviction logical order was to first perform the Expire/Eviction and then the notification logic. The replication buffer got this in the other way around (first notification effect and then the `del` command). The PR fixes this issue. * The notification effect and the `del` command was not wrap with `multi-exec` (if needed). The PR also fix this issue. * SPOP command: * On spop, the `spop` notification was fired before the command logic was executed. The change in this PR would have cause the replication order to be change (first `spop` command and then notification `logic`) although the logical order is first the notification logic and then the `spop` logic. The right fix would have been to move the notification to be fired after the command was executed (like all the other commands), but this can be considered a breaking change. To overcome this, the PR keeps the current behavior and changes the `spop` code to keep the right logical order when pushing commands to the replication buffer. Another PR will follow to fix the SPOP properly and match it to the other command (we split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if we chose to). #### Unhanded Known Limitations * key miss event: * On key miss event, if a module performed some write command on the event (using `RM_Call`), the `dirty` counter would increase and the read command that cause the key miss event would be replicated to the replication and aof. This problem can also happened on a write command that open some keys but eventually decides not to perform any action. We decided not to handle this problem on this PR because the solution is complex and will cause additional risks in case we will want to cherry-pick this PR. We should decide if we want to handle it in future PR's. For now, modules writers is advice not to perform any write commands on key miss event. #### Testing * We already have tests to cover cases where a notification is invoking write commands that are also added to the replication buffer, the tests was modified to verify that the replica gets the command in the correct logical order. * Test was added to verify that `spop` behavior was kept unchanged. * Test was added to verify key miss event behave as expected. * Test was added to verify the changes do not break lazy expiration. #### Additional Changes * `propagateNow` function can accept a special dbid, -1, indicating not to replicate `select`. We use this to replicate `multi/exec` on `propagatePendingCommands` function. The side effect of this change is that now the `select` command will appear inside the `multi/exec` block on the replication stream (instead of outside of the `multi/exec` block). Tests was modified to match this new behavior.
The PR reverts the changes made on redis#10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the `del` command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realize that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that **was** reverted are: * Saving a placeholder for replication at the beginning of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * `Spop` changes are no longer needed (because we reverted the placeholder code) Changes that **was not** reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.
…11199) ### Summary of API additions * `RedisModule_AddPostNotificationJob` - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above. * New module option, `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`, allows to disable Redis protection of nested key-space notifications. * `RedisModule_GetModuleOptionsAll` - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance. ### Background The following PR is a proposal of handling write operations inside module key space notifications. After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour. The solution is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , `RedisModule_AddPostNotificationJob`, that allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the operation that triggers it (in our case, key space notification). Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write. Although currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example). ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostExecUnitJobs`) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `postExecutionUnitOperations` (which was `propagatePendingCommands` before this PR). The new function fires the post jobs and then calls `propagatePendingCommands`. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostExecUnitJobs` list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so **Redis will make no attempt to protect the module from infinite loops** In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module. ### Redis infrastructure This PR promotes the existing `propagatePendingCommands` to an "Execution Unit" concept, which is called after each atomic unit of execution, Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Yossi Gottlieb <yossigo@gmail.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
NOTICE! The main part of this change has been later reverted in #11178, and isn't included in 7.2 RC1, despite what the release notes state
see #10969 (comment) as to why
Fix replication inconsistency on modules that uses key space notifications.
The Problem
In general, key space notifications are invoked after the command logic was executed (this is not always the case, we will discuss later about specific command that do not follow this rules). For example, the
set x 1will trigger asetnotification that will be invoked after thesetlogic was performed, so if the notification logic will try to fetchx, it will see the new data that was written.Consider the scenario on which the notification logic performs some write commands. for example, the notification logic increase some counter,
incr x{counter}, indicating how many timesxwas changed. The logical order by which the logic was executed is has follow:The issue is that the
set x 1command is added to the replication buffer at the end of the command invocation (specifically after the key space notification logic was invoked and performed theincrcommand). The replication/aof sees the commands in the wrong order:In this specific example the order is less important. But if, for example, the notification would have deleted
xthen we would end up with primary-replica inconsistency.The Solution
Put the command that cause the notification in its rightful place. In the above example, the
set x 1command logic was executed before the notification logic, so it should be added to the replication buffer before the commands that is invoked by the notification logic. To achieve this, without a major code refactoring, we save a placeholder in the replication buffer, when finishing invoking the command logic we check if the command need to be replicated, and if it does, we use the placeholder to add it to the replication buffer instead of appending it to the end.To be efficient and not allocating memory on each command to save the placeholder, the replication buffer array was modified to reuse memory (instead of allocating it each time we want to replicate commands). Also, to avoid saving a placeholder when not needed, we do it only for WRITE or MAY_REPLICATE commands.
Additional Fixes
delcommand). The PR fixes this issue.delcommand was not wrap withmulti-exec(if needed). The PR also fix this issue.spopnotification was fired before the command logic was executed. The change in this PR would have cause the replication order to be change (firstspopcommand and then notificationlogic) although the logical order is first the notification logic and then thespoplogic. The right fix would have been to move the notification to be fired after the command was executed (like all the other commands), but this can be considered a breaking change. To overcome this, the PR keeps the current behaviour and changes thespopcode to keep the right logical order when pushing commands to the replication buffer. Another PR will follow to fix the SPOP properly and match it to the other command (we split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if we chose to).Unhanded Known Limitations
RM_Call), thedirtycounter would increase and the read command that cause the key miss event would be replicated to the replication and aof. This problem can also happened on a write command that open some keys but eventually decides not to perform any action. We decided not to handle this problem on this PR because the solution is complex and will cause additional risks in case we will want to cherry-pick this PR. We should decide if we want to handle it in future PR's (@oranagra @yossigo @madolson FYI). For now, modules writers is advice not to perform any write commands on key miss event.Testing
spopbehaviour was kept unchanged.Additional Changes
propagateNowfunction can accept a special dbid, -1, indicating not to replicateselect. We use this to replicatemulti/execonpropagatePendingCommandsfunction. The side effect of this change is that now theselectcommand will appear inside themulti/execblock on the replication stream (instead of outside of themulti/execblock). Tests was modified to match this new behaviour.