Skip to content

Conversation

@zhugezy
Copy link
Contributor

@zhugezy zhugezy commented Nov 19, 2021

Background

The main goal of this PR is to remove relevant logics on Lua script verbatim replication, only keeping effects replication logic, which has been set as default since Redis 5.0. As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default configuration from users' point of view.

To get you back on track if you are not keeping eyes on it, please refer to these issues/PRs: PR#4966, Issue#5292, Issue#8370 .

There are lots of reasons to remove verbatim replication. Antirez has listed some of the benefits in Issue#5292:

  1. No longer need to explain to users side effects into scripts. They can do whatever they want.
  2. No need for a cache about scripts that we sent or not to the slaves.
  3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies and gains speed.
  4. No need to store scripts inside the RDB file in order to startup correctly.
  5. No problems about evicting keys during the script execution.

When looking back at Redis 5.0, antirez and core team decided to set the config lua-replicate-commands yes by default instead of removing verbatim replication directly, in case some bad situations happened. 3 years later now before Redis 7.0, it's time to remove it formally.

Changes

  • configuration for lua-replicate-commands removed
    • created config file stub for backward compatibility
  • Replication script cache removed
    • this is useless under script effects replication
    • relevant statistics also removed
  • script persistence in RDB files is also removed
  • Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
  • Deterministic execution logic in scripts removed (i.e. don't run write commands after random ones, and sorting output of commands with random order)
    • the flags indicating which commands have non-deterministic results are kept as hints to clients.
  • redis.replicate_commands() & redis.set_repl() changed
    • now redis.replicate_commands() does nothing and return an 1
    • ...and then redis.set_repl() can be issued before redis.replicate_commands() now
  • Relevant TCL cases adjusted
  • DEBUG lua-always-replicate-commands removed

Other changes

  • Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in Redis Function #9780)

Thoughts & Discussions

  • Now verbatim replication is removed, is there any necessary to keep scripts persisting?
    • Case 1: Persisting.
      • Redis has no SLA on persisting scripts. So users have to handle NOSCRIPT error.
    • Case 2: Not Persisting.
      • Users have to handle NOSCRIPT error too.
    • Users may benefit if redis persists their scripts when something like failover or else happens, but I think this reason is not strong enough to convince me keeping script persisting logics.
    • Conclusion: I think NO
  • More?

@soloestoy
Copy link
Contributor

soloestoy commented Nov 19, 2021

Good work! ping @redis/core-team to review.

About scripts persisting, I prefer NO, since we never guarantee scripts could be persisted, and as antirez said, it can make replication clearer.

And I think we can allow eviction in lua now, fix issue #8478.

@oranagra
Copy link
Member

@zhugezy thank you and welcome.
@soloestoy i was under the impression we agreed this task is assigned to @yoav-steinberg .
in any case, i think this was done too early, and will need to be re-done once #9780 is merged.
In the meanwhile, we can start discussing the details here (instead of the other 3 existing open issues / PRs on the same subject).
There are currently arguments for both keeping scripts deterministic (i'll let @yossigo post them), as well as keeping the persistence (which you argued for).

Regarding the persistence, the EVAL design was very clear from the get go that clients can't assume the script is cached, and the EVALSHA is an optimization that the client should be prepared for any case the script is not cached.
I know users kept arguing that it's buggy, without reading the theory and design behind it (which is that the script is part of the client application, that's executed in the server, and not at all the responsibility of the server, which is why it was not named and not versioned)

In redis 4.0, because of PSYNC2, these cases where the script is missing got rarer, so users may have got accustomed to it.
and now that we remove EVAL propagation, in theory we can go back on that, and never replicate or persist the EVAL scripts.
This will make a the EVAL scripts design and the differences from Redis Functions clearer, and force users to realize that.
but indeed some users will see it as a step backwards.

@soloestoy
Copy link
Contributor

@soloestoy i was under the impression we agreed this task is assigned to @yoav-steinberg .

Oh I didn't notice it, maybe you mean this approach #8478 (comment)? But I think after this PR we can allow eviction in lua just like MULTI/EXEC, so the problem can be fixed smoothly and no need break atomic.

@oranagra
Copy link
Member

I thought we agreed that both of these will be assigned to Yoav, but maybe don't recall something.
Anyway, it's too early to code it, but not too early to discuss and decide...

Regarding the OOM, even if we allow eviction, the problem still exists since the server might be configured not to allow eviction. And also, there's the other discussion to make about deterministic execution (which I mentioned above but left for Yossi to present), which may lead to concluding that we still can't evict inside a script.
But anyway, let's not mix these two topics, and discuss each of them in a separate PR.

@oranagra
Copy link
Member

Let's first debate about propagation and persistence.
I remember Salvatore vocally mentioning that users should not assume scripts are cached, possibly even if the same connection that loaded them is still alive.
I don't remember where it was and what was the argument, but the thing is that the design is that the script and its exact content are actually part of the caller app (not the server's responsibility), and it should always be prepared to reload it.
The main feature is EVAL, and the SHA part is just an optimization.

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus breaking-change This change can potentially break existing application labels Nov 21, 2021
@yossigo
Copy link
Collaborator

yossigo commented Nov 21, 2021

Persistence and replication: the actual guarantee as documented was never very clear and at some point (#5292 (comment)) it was limited to the scope of a single connection (although SCRIPT FLUSH can break that as well).

Redis 7 and Redis Functions is a good opportunity to clean this up and re-set the client expectations, so I think it makes sense to completely drop script replication.

Eviction: I'm not in favor of that. While it's not strictly needed for consistency any longer, I think guaranteeing atomicity is still good practice, especially given that scripts are not designed to be long running. Having scripts suddenly see keys vanishing keys during execution in Redis 7 seems like a problem to me.

Deterministic execution: This too is not strictly needed but I think not being able to enforce it, at least optionally, is a potential regression. I think it's reasonable to expect identical results if you run the same script on multiple instances with the same dataset (e.g. running tests scenarios on different envs, prod vs. staging/dev, etc.).

@zhugezy
Copy link
Contributor Author

zhugezy commented Nov 30, 2021

Stage summary: looks like we have reached a consensus that script persistence/replication can & should be dropped, and haven't seen any objections in the past few days. If it's done, then we can move onto the next topic.

For deterministic execution, I think non-deterministic results are consistent with users' expectations when they put "random" commands in scripts/functions and send them to multiple instances, while deterministic may not. So why bother making the result deterministic? Anyway, it would be better if someone could cite their examples about it.

@oranagra
Copy link
Member

@zhugezy we had a discussion about this these topics in a core-team meeting today.
At first, we concluded that we better keep the feature in redis that allows scripts to be deterministic, but maybe let the user control if it's enabled or disabled.
we considered to either:

  1. if we intend to keep this feature forever, then have it disabled in functions but default, and keep it enabled in eval by default (so we don't break existing scripts).
  2. or if we intend delete it some day, we can start by making it disabled by default for both functions and eval (so users can enable it if the run into trouble), and then consider deleting the code at a later stage (kinda how the lua_replicate_commands feature was introduced in baby steps too).

but then we realized that since lua_replicate_commands was changed to default of true, it also meant that scripts where allowed to be non-deterministic by default, and we are not aware of anyone who complained.
so in fact we're already half way through on our way to delete that feature, and we can take the next step now (actually delete it).

regarding script persistence and replication, as was said before, it was never meant to be persisted and replicated, and even if some users started getting accustomed to it since PSYNC2, it's actually likely that it'll cause them bugs, since the scripts are sometimes replicated, but they're never persisted. we wanna proceed and delete that feature completely.

so i think the next step is to wait for the functions PR to be merged, and then re-do this deletion work.

@zhugezy
Copy link
Contributor Author

zhugezy commented Nov 30, 2021

@oranagra Totally agreed.
BTW should I set this PR as a draft, or close it now and make a new PR after Function merged?

@oranagra
Copy link
Member

i think you can just force-push the new version into it when it's ready.
no sense in opening another thread when we have the discussion here.

@oranagra
Copy link
Member

oranagra commented Dec 1, 2021

btw, i see this PR doesn't remove the MASTER and SLAVE consistency with EVALSHA replication test, i guess it should.
this test keeps hanging every day in the freebsd CI (i guess for being slow).
not sure what was the process to decide which tests to delete, was it just ones who failed?

@zhugezy
Copy link
Contributor Author

zhugezy commented Dec 1, 2021

@oranagra Oh, I didn't notice this test... I just checked all tests in scripting.tcl, tests with keywords like "lua-replicate-commands","redis.replicate_commands" or else and other failed tests, which I thought are all of them. I'll check other test files later.

@zhugezy zhugezy closed this Dec 3, 2021
@zhugezy zhugezy force-pushed the script-replicate-by-effects branch from 8314750 to 64f6159 Compare December 3, 2021 06:23
@zhugezy zhugezy reopened this Dec 3, 2021
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM.

@MeirShpilraien @yoav-steinberg please take a look

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, i now realized that for some reason i only looked at the second commit.
edited my comment.

@oranagra
Copy link
Member

oranagra commented Dec 5, 2021

@zhugezy we discussed script persistence and propagation in a core-team meeting and concluded we want them removed.
in any case i don't see a reason to keep script persistence if we trimmed the propagation part.

also, we'll need to remove the excessive command flags soon, but let's do it after #9656 is merged.

i added these to a todo section in the top comment.

@oranagra
Copy link
Member

oranagra commented Dec 6, 2021

@zhugezy i remind you about MASTER and SLAVE consistency with EVALSHA replication and possibly other tests that make no sense now.
on one side, as long as they pass, maybe they still have value and increase coverage, but on the other hand some may not make any sense.
specifically for the one mentioned above, i'd like to see it gone since it keeps hanging on freebsd CI.

@oranagra
Copy link
Member

@redis/core-team please approve (see top comment for a list of what's included and excluded)

@oranagra oranagra added the approval-needed Waiting for core team approval to be merged label Dec 17, 2021
Copy link
Member

@itamarhaber itamarhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of script replication gladly approved (not a CR)

Copy link
Collaborator

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List of changes in top comment LGTM, did not do a complete code review.

@oranagra
Copy link
Member

@oranagra oranagra changed the title Remove script verbatim replication logics. Remove EVAL script verbatim replication, propagation, and deterministic execution logic Dec 20, 2021
@oranagra oranagra merged commit 1b0968d into redis:unstable Dec 21, 2021
@tezc tezc mentioned this pull request Dec 28, 2021
oranagra pushed a commit that referenced this pull request Jan 20, 2022
Adding command tips (see https://redis.io/topics/command-tips) to commands.

Breaking changes:
1. Removed the "random" and "sort_for_script" flags. They are now command tips.
(this isn't affecting redis behavior since #9812, but could affect some client applications
that's relying on COMMAND command flags)

Summary of changes:
1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with
  the same name is now implicit.
2. move RANDOM flag to a `nondeterministic_output` tip
3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip
3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips
4. deprecate (ignore) the `random` flag for RM_CreateCommand

Other notes:
1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly.
  The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme
  fails if this specific shard is empty
2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT)
   It was added in 9e4fb96, I guess by mistake.
   Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake).
3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it:
   the reply may differ depending on the internal representation in memory)
4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now
  it's `nondeterministic_output_order`
5. Unrelated: Hide CMD_PROTECTED from COMMAND
oranagra pushed a commit that referenced this pull request Feb 9, 2022
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Dec 4, 2023
It was first added to load lua from RDB, see 28dfdca. After redis#9812,
we no longer save lua in RDB. luaCreateFunction will only be called
in script load and eval*, both of which are available in the client.

Remove the dead code and comments to avoid misunderstandings.
oranagra pushed a commit that referenced this pull request Dec 4, 2023
It was first added to load lua from RDB, see 28dfdca. After #9812,
we no longer save lua in RDB. luaCreateFunction will only be called
in script load and eval*, both of which are available in the client.

It could be that that some day we'll still want to load scripts from
somewhere that's not a client. This fix is in dead code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged breaking-change This change can potentially break existing application release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Scripts should be replicated only by effects.

10 participants