A new propagation mechanism for Lua scripts #4966

soloestoy · 2018-05-30T15:21:41Z

Happy to see 5.0 RC1 released, it's a big step, redis becomes more strong and interesting, look forward to the GA version.

From the last commit, we are aligned that RDB file does no longer represent just a persistence file, it represents also the state of the instance, so we persist all keys even expired to RDB file.

It looks good after the fix, but I just remember that we have met a similar problem about lua scripts once. After psync slave may lose some scripts because the rebooting RDB file doesn't contains, and we fix it by persisting all lua scripts to RDB file when FULL RESYNC. Lua scripts should be regarded as a part of state of the instance as well I think.

But there still have something bad which may result in inconsistency between master and slave:

If slave accept script flush from a normal client, slave cannot execute EVALSHA correctly.

It's easy to understand and reproduce, because after script flush slave delete all lua scripts.
SCRIPT LOAD on middle slave may lead to inconsistency.

This scenario is a little complex but serious, imagine that:
- We have three instances, master A, middle slave B and slave C like this: A --> B --> C
- Apply script load "redis.call('set','foo','bar')" on middle slave B, we can get the sha1 a3862cfaeddaf7c3473803cba58cdd0bdca53397
- Apply evalsha a3862cfaeddaf7c3473803cba58cdd0bdca53397 0 on slave B, we will get an error because this script contains write command. And slave B will add this script to server.repl_scriptcache_dict, but not replicate to slave C, because all data in replication stream are generated by master.
- Then we promote slave B to be master, and apply evalsha a3862cfaeddaf7c3473803cba58cdd0bdca53397 0 again. Now master B has key foo but slave C doesn't.

These bugs appear because script flush/load and eval command break the lua scripting state, we can fix them like this:

Disable script flush/load and eval command on slave.

Then slave has no chance to break lua scripting state.
For EVAL command, we can propagate scripts explicitly.

It means that we can split EVAL command to SCRIPT LOAD and EVAL, if it's the first time we meet this script.

Because some readonly scripts or scripts with redis.replicate_command() cannot be propagate to slave and aof, so after this all slaves can be consist with master.
After these two above finished, we can remove replication script cache.

And consider aofrewrite without replication script cache, we should persist all lua scripts in AOF and RDB file.

cqyisbug · 2018-05-31T03:43:05Z

repl_scriptcache_dict 字典记录了自己已经将哪些脚本传播给了所有服务器
这个理解对吗?

soloestoy · 2018-05-31T05:57:58Z

repl_scriptcache_dict 字典记录了自己已经将哪些脚本传播给了所有服务器
这个理解对吗?

@cqyisbug repl_scriptcache_dict records scripts those converted from EVALSHA to EVAL. It means that scripts never touched by EVALSHA are not recorded.

    if (evalsha && !server.lua_replicate_commands) {
        if (!replicationScriptCacheExists(c->argv[1]->ptr)) {
            /* This script is not in our script cache, replicate it as
             * EVAL, then add it into the script cache, as from now on
             * slaves and AOF know about it. */
            robj *script = dictFetchValue(server.lua_scripts,c->argv[1]->ptr);

            replicationScriptCacheAdd(c->argv[1]->ptr);
            serverAssertWithInfo(c,NULL,script != NULL);
            rewriteClientCommandArgument(c,0,
                resetRefCount(createStringObject("EVAL",4)));
            rewriteClientCommandArgument(c,1,script);
            forceCommandPropagation(c,PROPAGATE_REPL|PROPAGATE_AOF);
        }
    }

cqyisbug · 2018-05-31T06:37:31Z

@soloestoy thanks for your advice. ^_^

soloestoy · 2018-05-31T12:52:24Z

Moreover , if we apply SCRIPT LOAD "redis.call('keys','*')" and then apply EVALSHA on master, master will propagate EVAL "redis.call('keys','*')" 0 to slave, because of the code above.

But it's a waste of time that a slave execute EVAL "redis.call('keys','*')" 0 from master, @antirez

soloestoy · 2018-06-20T09:25:23Z

Ping @antirez

After RDB version 9, the RDB file not only contains replication information, but alos contains the script cache as well, so we can guarantee that no matter PSYNC after a restart or FULLRESYNC a new slave connect, the slave can hold all the same scripts with master, so we can remove the replication script cache. Before this, redis only propagate scripts by SCRIPT LOAD, and by EVAL command implicitly(only when EVAL update data). Now we wanna make the script propagated explicitly, in other words we propagate all the scripts by SCRIPT LOAD even the script is in EVAL command. That can make master, slave and AOF be consistent, redis can execute EVALSHA on master or slave or restart from AOF.

antirez · 2018-07-25T14:27:39Z

Hello @soloestoy, I've to study the PR, just had time to scan it when you posted it initially: my impression is that this PR attempts to prevent users from doing Very Silly Things (TM) :-) Like going and poke with the scripting cache of a slave, by doing SCRIPT FLUSH/LOAD in a slave. In general as you can see in all the Redis design, we don't try to avoid that users can shoot themselves in the feet... So that would not be a good fit. However I've to read the code and evaluate the PR with more care, this is just to give a general feedback about this kind of changes. Thanks.

soloestoy · 2018-07-26T17:30:49Z

Hi @antirez , thanks for your reply.

In general as you can see in all the Redis design, we don't try to avoid that users can shoot themselves in the feet...

I totally agree with you, I don't care users' wrong and silly behavior neither.

My main idea is to implement a new propagation for Lua scripts, the current Lua scripts propagation is a bit complicated and there is a waste of memory and cpu time.

Firstly, let's see the current Lua scripts propagation mechanism(here we ignore redis. replicate_commands for the time being):

When does redis propagate Lua scripts explicitly?

SCRIPT LOAD command propagates Lua script to slaves and aof, and adds the script to server.lua_scripts.

At the same time it means that we can apply EVALSHA on redis even after rebooting or failover.
When does redis propagate Lua scripts implicitly?

i. EVAL command with WRITE Lua script.
- Then redis propagates the Lua script to slaves and aof, and adds the script to server.lua_scripts as what SCRIPT LOAD does.
- Problem 1: we cannot propagate READ Lua script by EVAL, and then we cannot apply EVALSHA after rebooting and failover.
ii. EVAHSHA command with new sha.
- As we know the new sha is not in server.repl_scriptcache_dict(but it is already in server.lua_scipts), so we have to rewrite EVALSHA as EVAL with the original script, and propagate it to slaves and aof.
- Problem 2: we have to propagate the original script even it is already propagated by SCRIPT LOAD or EVAL, it's unnecessary.
- Problem 3: we have to propagate the original script even it's a READ script like redis.call('keys','*'), it's a waste of time that a slave execute EVAL "redis.call('keys','*')" 0 or master load from aof.
- Problem 4: replicationScriptCacheFlush will be called when aof rewrite and full resync, it means that all shas called by EVALSHA will be flushed then all EVALSHA have to propagate again.
- Problem 5: old sha will be evicted when we reach max number of server.repl_scriptcache_fifo entries.

But remember that we already persist all Lua scripts into RDB file after Redis 4.0, so I'm thinking can we implement a new propagation for Lua scripts to make it simple and effective?

I think maybe Lua scripts should be regarded as a part of state of the instance as well. We can take the Lua scripts as data that not belong to any DB but belong to the whole instance. And then what I did is that:

Propagate EVAL command as SCRIPT LOAD when it's the first time we meet this script.

It means that even READ scripts can be propagated to slaves and aof by EVAL command, and slaves could be same with master state.
Disable SCRIPT FLUSH|LOAD and EVAL command on slave.

Because these two commands may break Lua scripting state.
Persist all Lua scripts in AOF and RDB file.

Then slaves could be same with master's state and AOF file contains all Lua scripts even after rewrite.
Remove server.repl_scriptcache_dict and server.repl_scriptcache_fifo.

We can guarantee that even after FULL RESYNC or AOF REWRITE redis can hold all Lua scripts, so these two are not needed any more. Scripts will not be propagated duplicated and slave will not execute READ Lua script from master.

We can guarantee that the script in EVAL command has already been propagated.

It is useful to get the original script which represented by sha1 in slowlog.

antirez · 2018-09-04T15:19:39Z

Hello @soloestoy, I examined your proposal. In general those are good ideas, however given that we now have a plan to avoid all this complexity anyway, and switch to a feature where the only propagation mechanism is effects replication of scripts, people that want to use scripts in slaves will have to follow the protocol as usually... that is: EVALSHA and if the server replies with a no script error, use EVAL instead. So all the instances, including slaves, are unrelated to all the other instances from the POV of the scripts they have in memory. This is in general a very good simplification.

However, there is one of the points in your proposal that IMHO is a serious bug, and we should address it ASAP, regardless of the fact we are going to remove all this code in the future (hopefully), and is that read-only scripts, or aborted scripts (scripts with writes that abort before the first write is ever executed, so they are effectively read-only), are propagated anyway.

I believe that we should instead translate such scripts as SCRIPT LOAD in the slave here:

    if (evalsha && !server.lua_replicate_commands) {
        if (!replicationScriptCacheExists(c->argv[1]->ptr)) {
            /* This script is not in our script cache, replicate it as
             * EVAL, then add it into the script cache, as from now on
             * slaves and AOF know about it. */
            robj *script = dictFetchValue(server.lua_scripts,c->argv[1]->ptr);

            replicationScriptCacheAdd(c->argv[1]->ptr);
            serverAssertWithInfo(c,NULL,script != NULL);
            rewriteClientCommandArgument(c,0,
                resetRefCount(createStringObject("EVAL",4)));
            rewriteClientCommandArgument(c,1,script);
            forceCommandPropagation(c,PROPAGATE_REPL|PROPAGATE_AOF);
        }
    }

The above code, does not attempt to do any differentiation between scripts that produced changes, and scripts that instead were completely read-only. So my suggestion is that we do such change for now ignoring all the rest, and go forward with our plan to enable effects replication by default (with some hidden config to revert it if really needed). If such plan fails, and we see in the practice that users really need verbatim scripts replication, then we can re-examine the issue and understand what's the best approach to propagate the scripts, starting from your proposal. Does this make sense to you? Thanks for all your great work in this issue (and otherwise!).

soloestoy · 2018-09-04T16:13:57Z

Thanks your reply @antirez , I will analyze the bug you pointed carefully.

I agree with that we can go forward to enable effects replication by default, at the same time I'm trying to find a good way to propagate the verbatim scripts, it's not finished yet, maybe can finish these days.

oranagra · 2021-12-21T06:34:15Z

no longer relevant, script propagation was deprecated in #9812

soloestoy mentioned this pull request May 30, 2018

A improvement makes lua scripts be propagated simpler #4767

Closed

soloestoy added 6 commits July 25, 2018 13:09

Scripting: disable SCRIPT FLUSH on slave unless it's from our master

3fe26c0

Scripting: disable SCRIPT LOAD and EVAL on slave

3999b4c

Scripting & persistence: persist lua scripts in AOF and RDB file

235e749

Scripting & PSYNC2: remove replication script cache

1c528bf

Scripting: calculate lua.scripts memory more accurately

1af44c1

soloestoy force-pushed the lua-new-propagation branch from d3265d9 to 1af44c1 Compare July 25, 2018 12:27

soloestoy changed the title ~~Bugfix and Lua new propagation~~ A new propagation mechanism about Lua scripts Jul 26, 2018

soloestoy changed the title ~~A new propagation mechanism about Lua scripts~~ A new propagation mechanism for Lua scripts Jul 26, 2018

soloestoy added 2 commits July 27, 2018 02:10

Scripting: rewrite EVAL as EVALSHA

1179461

We can guarantee that the script in EVAL command has already been propagated.

Scripting: add subcommand GET to get original scripts by sha1

042e660

It is useful to get the original script which represented by sha1 in slowlog.

soloestoy mentioned this pull request Aug 13, 2018

script cache memory in INFO and MEMORY includes both script code and overheads #5242

Merged

This was referenced Aug 27, 2018

CRITICAL BUG: eviction when lua timeout may lead to inconsistency. #5250

Closed

Scripts should be replicated only by effects. #5292

Closed

soloestoy force-pushed the lua-new-propagation branch from 9f24cdf to 042e660 Compare August 29, 2018 14:06

zhugezy mentioned this pull request Nov 19, 2021

Remove EVAL script verbatim replication, propagation, and deterministic execution logic #9812

Merged

oranagra closed this Dec 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

A new propagation mechanism for Lua scripts #4966

A new propagation mechanism for Lua scripts #4966

Uh oh!

soloestoy commented May 30, 2018 •

edited

Loading

Uh oh!

cqyisbug commented May 31, 2018

Uh oh!

soloestoy commented May 31, 2018

Uh oh!

cqyisbug commented May 31, 2018

Uh oh!

soloestoy commented May 31, 2018

Uh oh!

soloestoy commented Jun 20, 2018

Uh oh!

antirez commented Jul 25, 2018

Uh oh!

soloestoy commented Jul 26, 2018 •

edited

Loading

Uh oh!

antirez commented Sep 4, 2018

Uh oh!

soloestoy commented Sep 4, 2018 •

edited

Loading

Uh oh!

oranagra commented Dec 21, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

A new propagation mechanism for Lua scripts #4966

A new propagation mechanism for Lua scripts #4966

Uh oh!

Conversation

soloestoy commented May 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cqyisbug commented May 31, 2018

Uh oh!

soloestoy commented May 31, 2018

Uh oh!

cqyisbug commented May 31, 2018

Uh oh!

soloestoy commented May 31, 2018

Uh oh!

soloestoy commented Jun 20, 2018

Uh oh!

antirez commented Jul 25, 2018

Uh oh!

soloestoy commented Jul 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

antirez commented Sep 4, 2018

Uh oh!

soloestoy commented Sep 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Dec 21, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

soloestoy commented May 30, 2018 •

edited

Loading

soloestoy commented Jul 26, 2018 •

edited

Loading

soloestoy commented Sep 4, 2018 •

edited

Loading