Remove EVAL script verbatim replication, propagation, and deterministic execution logic #9812

zhugezy · 2021-11-19T10:20:21Z

Background

The main goal of this PR is to remove relevant logics on Lua script verbatim replication, only keeping effects replication logic, which has been set as default since Redis 5.0. As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default configuration from users' point of view.

To get you back on track if you are not keeping eyes on it, please refer to these issues/PRs: PR#4966, Issue#5292, Issue#8370 .

There are lots of reasons to remove verbatim replication. Antirez has listed some of the benefits in Issue#5292:

No longer need to explain to users side effects into scripts. They can do whatever they want.

No need for a cache about scripts that we sent or not to the slaves.

No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies and gains speed.

No need to store scripts inside the RDB file in order to startup correctly.

No problems about evicting keys during the script execution.

When looking back at Redis 5.0, antirez and core team decided to set the config lua-replicate-commands yes by default instead of removing verbatim replication directly, in case some bad situations happened. 3 years later now before Redis 7.0, it's time to remove it formally.

Changes

configuration for lua-replicate-commands removed
- created config file stub for backward compatibility
Replication script cache removed
- this is useless under script effects replication
- relevant statistics also removed
script persistence in RDB files is also removed
Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
Deterministic execution logic in scripts removed (i.e. don't run write commands after random ones, and sorting output of commands with random order)
- the flags indicating which commands have non-deterministic results are kept as hints to clients.
redis.replicate_commands() & redis.set_repl() changed
- now redis.replicate_commands() does nothing and return an 1
- ...and then redis.set_repl() can be issued before redis.replicate_commands() now
Relevant TCL cases adjusted
DEBUG lua-always-replicate-commands removed

Other changes

Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in Redis Function #9780)

Thoughts & Discussions

Now verbatim replication is removed, is there any necessary to keep scripts persisting?
- Case 1: Persisting.
  - Redis has no SLA on persisting scripts. So users have to handle NOSCRIPT error.
- Case 2: Not Persisting.
  - Users have to handle NOSCRIPT error too.
- Users may benefit if redis persists their scripts when something like failover or else happens, but I think this reason is not strong enough to convince me keeping script persisting logics.
- Conclusion: I think NO
More?

soloestoy · 2021-11-19T11:15:30Z

Good work! ping @redis/core-team to review.

About scripts persisting, I prefer NO, since we never guarantee scripts could be persisted, and as antirez said, it can make replication clearer.

And I think we can allow eviction in lua now, fix issue #8478.

oranagra · 2021-11-19T12:15:52Z

@zhugezy thank you and welcome.
@soloestoy i was under the impression we agreed this task is assigned to @yoav-steinberg .
in any case, i think this was done too early, and will need to be re-done once #9780 is merged.
In the meanwhile, we can start discussing the details here (instead of the other 3 existing open issues / PRs on the same subject).
There are currently arguments for both keeping scripts deterministic (i'll let @yossigo post them), as well as keeping the persistence (which you argued for).

Regarding the persistence, the EVAL design was very clear from the get go that clients can't assume the script is cached, and the EVALSHA is an optimization that the client should be prepared for any case the script is not cached.
I know users kept arguing that it's buggy, without reading the theory and design behind it (which is that the script is part of the client application, that's executed in the server, and not at all the responsibility of the server, which is why it was not named and not versioned)

In redis 4.0, because of PSYNC2, these cases where the script is missing got rarer, so users may have got accustomed to it.
and now that we remove EVAL propagation, in theory we can go back on that, and never replicate or persist the EVAL scripts.
This will make a the EVAL scripts design and the differences from Redis Functions clearer, and force users to realize that.
but indeed some users will see it as a step backwards.

soloestoy · 2021-11-19T12:29:43Z

@soloestoy i was under the impression we agreed this task is assigned to @yoav-steinberg .

Oh I didn't notice it, maybe you mean this approach #8478 (comment)? But I think after this PR we can allow eviction in lua just like MULTI/EXEC, so the problem can be fixed smoothly and no need break atomic.

oranagra · 2021-11-19T12:38:25Z

I thought we agreed that both of these will be assigned to Yoav, but maybe don't recall something.
Anyway, it's too early to code it, but not too early to discuss and decide...

Regarding the OOM, even if we allow eviction, the problem still exists since the server might be configured not to allow eviction. And also, there's the other discussion to make about deterministic execution (which I mentioned above but left for Yossi to present), which may lead to concluding that we still can't evict inside a script.
But anyway, let's not mix these two topics, and discuss each of them in a separate PR.

oranagra · 2021-11-19T12:45:30Z

Let's first debate about propagation and persistence.
I remember Salvatore vocally mentioning that users should not assume scripts are cached, possibly even if the same connection that loaded them is still alive.
I don't remember where it was and what was the argument, but the thing is that the design is that the script and its exact content are actually part of the caller app (not the server's responsibility), and it should always be prepared to reload it.
The main feature is EVAL, and the SHA part is just an optimization.

yossigo · 2021-11-21T12:06:05Z

Persistence and replication: the actual guarantee as documented was never very clear and at some point (#5292 (comment)) it was limited to the scope of a single connection (although SCRIPT FLUSH can break that as well).

Redis 7 and Redis Functions is a good opportunity to clean this up and re-set the client expectations, so I think it makes sense to completely drop script replication.

Eviction: I'm not in favor of that. While it's not strictly needed for consistency any longer, I think guaranteeing atomicity is still good practice, especially given that scripts are not designed to be long running. Having scripts suddenly see keys vanishing keys during execution in Redis 7 seems like a problem to me.

Deterministic execution: This too is not strictly needed but I think not being able to enforce it, at least optionally, is a potential regression. I think it's reasonable to expect identical results if you run the same script on multiple instances with the same dataset (e.g. running tests scenarios on different envs, prod vs. staging/dev, etc.).

zhugezy · 2021-11-30T09:29:18Z

Stage summary: looks like we have reached a consensus that script persistence/replication can & should be dropped, and haven't seen any objections in the past few days. If it's done, then we can move onto the next topic.

For deterministic execution, I think non-deterministic results are consistent with users' expectations when they put "random" commands in scripts/functions and send them to multiple instances, while deterministic may not. So why bother making the result deterministic? Anyway, it would be better if someone could cite their examples about it.

oranagra · 2021-11-30T09:52:57Z

@zhugezy we had a discussion about this these topics in a core-team meeting today.
At first, we concluded that we better keep the feature in redis that allows scripts to be deterministic, but maybe let the user control if it's enabled or disabled.
we considered to either:

if we intend to keep this feature forever, then have it disabled in functions but default, and keep it enabled in eval by default (so we don't break existing scripts).
or if we intend delete it some day, we can start by making it disabled by default for both functions and eval (so users can enable it if the run into trouble), and then consider deleting the code at a later stage (kinda how the lua_replicate_commands feature was introduced in baby steps too).

but then we realized that since lua_replicate_commands was changed to default of true, it also meant that scripts where allowed to be non-deterministic by default, and we are not aware of anyone who complained.
so in fact we're already half way through on our way to delete that feature, and we can take the next step now (actually delete it).

regarding script persistence and replication, as was said before, it was never meant to be persisted and replicated, and even if some users started getting accustomed to it since PSYNC2, it's actually likely that it'll cause them bugs, since the scripts are sometimes replicated, but they're never persisted. we wanna proceed and delete that feature completely.

so i think the next step is to wait for the functions PR to be merged, and then re-do this deletion work.

zhugezy · 2021-11-30T10:03:32Z

@oranagra Totally agreed.
BTW should I set this PR as a draft, or close it now and make a new PR after Function merged?

oranagra · 2021-11-30T10:24:52Z

i think you can just force-push the new version into it when it's ready.
no sense in opening another thread when we have the discussion here.

oranagra · 2021-12-01T05:53:39Z

btw, i see this PR doesn't remove the MASTER and SLAVE consistency with EVALSHA replication test, i guess it should.
this test keeps hanging every day in the freebsd CI (i guess for being slow).
not sure what was the process to decide which tests to delete, was it just ones who failed?

zhugezy · 2021-12-01T06:33:44Z

@oranagra Oh, I didn't notice this test... I just checked all tests in scripting.tcl, tests with keywords like "lua-replicate-commands","redis.replicate_commands" or else and other failed tests, which I thought are all of them. I'll check other test files later.

oranagra

mostly LGTM.

@MeirShpilraien @yoav-steinberg please take a look

src/script_lua.c

tests/unit/scripting.tcl

oranagra

ohh, i now realized that for some reason i only looked at the second commit.
edited my comment.

oranagra · 2021-12-05T07:21:51Z

@zhugezy we discussed script persistence and propagation in a core-team meeting and concluded we want them removed.
in any case i don't see a reason to keep script persistence if we trimmed the propagation part.

also, we'll need to remove the excessive command flags soon, but let's do it after #9656 is merged.

i added these to a todo section in the top comment.

src/config.c

src/debug.c

src/script.h

src/script_lua.c

tests/unit/scripting.tcl

src/script_lua.c

oranagra · 2021-12-06T06:45:40Z

@zhugezy i remind you about MASTER and SLAVE consistency with EVALSHA replication and possibly other tests that make no sense now.
on one side, as long as they pass, maybe they still have value and increase coverage, but on the other hand some may not make any sense.
specifically for the one mentioned above, i'd like to see it gone since it keeps hanging on freebsd CI.

oranagra · 2021-12-17T06:55:20Z

@redis/core-team please approve (see top comment for a list of what's included and excluded)

itamarhaber

Removal of script replication gladly approved (not a CR)

yossigo

List of changes in top comment LGTM, did not do a complete code review.

…by-effects

oranagra · 2021-12-20T20:20:07Z

Full CI: https://github.com/redis/redis/actions/runs/1603831303

Adding command tips (see https://redis.io/topics/command-tips) to commands. Breaking changes: 1. Removed the "random" and "sort_for_script" flags. They are now command tips. (this isn't affecting redis behavior since #9812, but could affect some client applications that's relying on COMMAND command flags) Summary of changes: 1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with the same name is now implicit. 2. move RANDOM flag to a `nondeterministic_output` tip 3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip 3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips 4. deprecate (ignore) the `random` flag for RM_CreateCommand Other notes: 1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly. The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme fails if this specific shard is empty 2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT) It was added in 9e4fb96, I guess by mistake. Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake). 3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it: the reply may differ depending on the internal representation in memory) 4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now it's `nondeterministic_output_order` 5. Unrelated: Hide CMD_PROTECTED from COMMAND

) append for PR #9812

It was first added to load lua from RDB, see 28dfdca. After redis#9812, we no longer save lua in RDB. luaCreateFunction will only be called in script load and eval*, both of which are available in the client. Remove the dead code and comments to avoid misunderstandings.

It was first added to load lua from RDB, see 28dfdca. After #9812, we no longer save lua in RDB. luaCreateFunction will only be called in script load and eval*, both of which are available in the client. It could be that that some day we'll still want to load scripts from somewhere that's not a client. This fix is in dead code.

oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus breaking-change This change can potentially break existing application labels Nov 21, 2021

zhugezy closed this Dec 3, 2021

zhugezy force-pushed the script-replicate-by-effects branch from 8314750 to 64f6159 Compare December 3, 2021 06:23

initial commit

9aa2aca

zhugezy reopened this Dec 3, 2021

initial commit-2.

e01dae5

oranagra reviewed Dec 5, 2021

View reviewed changes

src/script_lua.c Show resolved Hide resolved

tests/unit/scripting.tcl Outdated Show resolved Hide resolved

tests/unit/scripting.tcl Outdated Show resolved Hide resolved

tests/unit/scripting.tcl Show resolved Hide resolved

tests/unit/scripting.tcl Show resolved Hide resolved

oranagra reviewed Dec 5, 2021

View reviewed changes

oranagra added the 7.0-must-have label Dec 5, 2021

yoav-steinberg reviewed Dec 5, 2021

View reviewed changes

MeirShpilraien reviewed Dec 5, 2021

View reviewed changes

src/script_lua.c Show resolved Hide resolved

zhugezy added 3 commits December 6, 2021 14:23

part of changes on reviews.

f2cefef

indent for easier reading

42cbf82

undo the to-sort changes. this should be done after redis#9656 merged

818a815

oranagra added the approval-needed Waiting for core team approval to be merged label Dec 17, 2021

itamarhaber approved these changes Dec 18, 2021

View reviewed changes

madolson approved these changes Dec 19, 2021

View reviewed changes

yossigo approved these changes Dec 20, 2021

View reviewed changes

Merge remote-tracking branch 'origin/unstable' into script-replicate-…

72303f3

…by-effects

oranagra approved these changes Dec 20, 2021

View reviewed changes

oranagra changed the title ~~Remove script verbatim replication logics.~~ Remove EVAL script verbatim replication, propagation, and deterministic execution logic Dec 20, 2021

soloestoy approved these changes Dec 21, 2021

View reviewed changes

oranagra merged commit 1b0968d into redis:unstable Dec 21, 2021

soloestoy mentioned this pull request Dec 27, 2021

Sort out mess around propagation and MULTI/EXEC #9890

Merged

tezc mentioned this pull request Dec 28, 2021

Fixed typo in test tag #10021

Merged

oranagra mentioned this pull request Jan 12, 2022

Add command tips to COMMAND DOCS #10104

Merged

chenyang8094 mentioned this pull request Feb 9, 2022

Remove unused code - leftover from script replication mechanisms #10272

Merged

oranagra pushed a commit that referenced this pull request Feb 9, 2022

Remove unused code - leftover from script replication mechanisms (#10272

ceeff6b

) append for PR #9812

soloestoy mentioned this pull request May 31, 2022

Potential data inconsistency when upgrade old version to 7.0 since old version's EVAL use verbatim replication #10799

Closed

enjoy-binbin mentioned this pull request Dec 4, 2023

Check whether the client is NULL in luaCreateFunction #12829

Merged

OMG-By mentioned this pull request Feb 19, 2025

[BUG]lua information is not persisted in the rdb file and sent to replica #13812

Closed

sundb mentioned this pull request Aug 28, 2025

[BUG] SCRIPT LOAD Command No Longer Replicates to Replicas in Redis 7.x #14314

Closed

Remove EVAL script verbatim replication, propagation, and deterministic execution logic #9812

Remove EVAL script verbatim replication, propagation, and deterministic execution logic #9812

Uh oh!

Conversation

zhugezy commented Nov 19, 2021 • edited by oranagra Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Changes

Other changes

Thoughts & Discussions

Uh oh!

soloestoy commented Nov 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Nov 19, 2021

Uh oh!

soloestoy commented Nov 19, 2021

Uh oh!

oranagra commented Nov 19, 2021

Uh oh!

oranagra commented Nov 19, 2021

Uh oh!

yossigo commented Nov 21, 2021

Uh oh!

zhugezy commented Nov 30, 2021

Uh oh!

oranagra commented Nov 30, 2021

Uh oh!

zhugezy commented Nov 30, 2021

Uh oh!

oranagra commented Nov 30, 2021

Uh oh!

oranagra commented Dec 1, 2021

Uh oh!

zhugezy commented Dec 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

oranagra commented Dec 5, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oranagra commented Dec 6, 2021

Uh oh!

oranagra commented Dec 17, 2021

Uh oh!

itamarhaber left a comment

Choose a reason for hiding this comment

Uh oh!

yossigo left a comment

Choose a reason for hiding this comment

Uh oh!

oranagra commented Dec 20, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

zhugezy commented Nov 19, 2021 •

edited by oranagra

Loading

soloestoy commented Nov 19, 2021 •

edited

Loading

zhugezy commented Dec 1, 2021 •

edited

Loading

oranagra left a comment •

edited

Loading