Redis Function #9780

MeirShpilraien · 2021-11-14T18:23:13Z

Redis Function

This PR added the Redis Functions capabilities that were suggested on #8693. The PR also introduce a big refactoring to the current Lua implementation (i.e scripting.c). The main purpose of the refactoring is to have better code sharing between the Lua implementation that exists today on Redis (scripting.c) and the new Lua engine that is introduced on this PR. The refactoring includes code movements and file name changes as well as some logic changes that need to be carefully reviewed. To make the review easier, the PR was split into multiple commits. Each commit is deeply described later on but the main concept is that some commits are just moving code around without making any logical changes, those commits are less likely to cause any issues or regressions and can be reviewed fast. Other commits, which perform code and logic changes, need to be reviewed carefully, but those commits were created after the code movements so it's pretty easy to see what was changed. To sum up, it is highly recommended to review this PR commit by commit as it will be easier to see the changes, it is also recommended to read each commit description (written below) to understand what was changed on the commit and whether or not it's just a huge code movement or a logic changes.

Terminology

Currently, the terminology in Redis is not clearly defined. Scripts refer to Lua scripts and eval also refers only to Lua. Introducing Redis Function requires redefining those terms to be able to clearly understand what is been discussed on each context.

eval - legacy Lua script implementation.
Function - new scripting implementation (currently implemented in Lua but in the future, it might be other languages like javascript).
Engine - the component that is responsible for executing functions.
Script - Function or legacy Lua (executed with eval or evalsha)

Refactoring New Structure

Today, the entire scripting logic is located on scripting.c. This logic can be split into 3 main groups:

Script management - responsible for storing the scripts that were sent to Redis and retrieving them when they need to be run (base on the script sha on the current implementation).
Script invocation - invoke the script given on eval or evalsha command (this part includes finding the relevant script, preparing the arguments, ..)
Interact back with Redis (command invocation)

Those 3 groups are tightly coupled on scripting.c. Redis Functions also need to use those groups logics, for example, to interact back with Redis or to execute Lua code. The refactoring attempts to split those 3 groups and define APIs so that we can reuse the code both on legacy Lua scripts and Redis Functions.

In order to do so we define the following units:

script.c: responsible for interaction with Redis from within a script.
script_lua.c: responsible to execute Lua code, uses script.c to interact with Redis from within the Lua code.
function_lua.c: contains the Lua engine implementation, uses script_lua.c to execute the Lua code.
functions.c: Contains Redis Functions implementation (FUNCTION command,), uses functions_lua.c if the function it wants to invoke needs the Lua engine.
eval.c: the original scripting.c contains the Lua legacy implementation and was refactored to use script_lua.c to invoke the Lua code.

Commits breakdown

Notice: Some small commits are omitted from this list as they are small and insignificant (for example build fixes)

First commit - code movements

This commit rename scripting.c -> eval.c and introduce the new script_lua.c unit. The commit moves relevant code from eval.c (scripting.c) to script_lua.c, the purpose of moving the code is so that later we will be able to re-use the code on the Lua engine (function_lua.c). The commit only moves the code without modifying even a single line, so there is a very low risk of breaking anything and it also makes it much easier to see the changes on the following commits.
Because the commit does not change the code (only moves it), it does not compile. But we do not care about it as the only purpose here is to make the review processes simpler.

Second commit - move legacy Lua variables into `eval.c`

Today, all Lua-related variables are located on the server struct. The commit attempt to identify those variable and take them out from the server struct, leaving only script related variables (variables that later need to be used also by engines)
The following variable where renamed and left on the server struct:

lua_caller -> script_caller
lua_time_limit -> script_time_limit
lua_timedout -> script_timedout
lua_oom -> script_oom
lua_disable_deny_script -> script_disable_deny_script
in_eval -> in_script

The following variables where moved to lctx under eval.c

lua
lua_client
lua_cur_script
lua_scripts
lua_scripts_mem
lua_replicate_commands
lua_write_dirty
lua_random_dirty
lua_multi_emitted
lua_repl
lua_kill
lua_time_start
lua_time_snapshot

This commit is in a low risk of introducing any issues and it is just moving variables around and not changing any logic.

Third commit - introducing script unit

This commit introduces the script.c unit. Its purpose (as described above) is to provide an API for scripts to interact with Redis. Interaction includes mostly executing commands, but also other functionalities. The interaction is done using a ScriptRunCtx object that needs to be created by the user and initialized using scriptPrepareForRun. A detailed list of functionalities expose by the unit:

Calling commands (including all the validation checks such as
acl, cluster, read only run, ...)
Set Resp
Set Replication method (AOF/REPLICATION/NONE)
Call Redis back on long-running scripts to allow Redis to reply to clients and perform script kill

The commit introduces the new unit and uses it on eval commands to interact with Redis.

Fourth commit - Moved functionality of invoke Lua code to `script_lua.c`

This commit moves the logic of invoking the Lua code into script_lua.c so later it can be used also by Lua engine (function_lua.c). The code is located on callFunction function and assumes the Lua function already located on the top of the Lua stack. This commit also change eval.c to use the new functionality to invoke Lua code.

Fith commit - Added Redis Functions unit (`functions.c`) and Lua engine (`function_lua.c`)

Added Redis Functions unit under functions.c, included:

FUNCTION command:
- FUNCTION CREATE
- FUNCTION CALL
- FUNCTION DELETE
- FUNCTION KILL
- FUNCTION INFO
- FUNCTION STATS
Register engines

In addition, this commit introduces the first engine that uses the Redis Functions capabilities, the Lua engine (function_lua.c)

API Changes

`lua-time-limit`

configuration was renamed to script-time-limit (keep lua-time-limit as alias for backward compatibility).

Error log changes

When integrating with Redis from within a Lua script, the Lua term was removed from all the error messages and instead we write only script. For example:
Wrong number of args calling Redis command From Lua script -> Wrong number of args calling Redis command From script

`info memory` changes:

Before stating all the changes made to memory stats we will try to explain the reason behind them and what we want to see on those metrics:

memory metrics should show both totals (for all scripting frameworks), as well as a breakdown per framework / vm.
The totals metrics should have "human" metrics while the breakdown shouldn't.
We did try to maintain backward compatibility in some way, that said we did make some repurpose to existing metrics where it looks reasonable.
We separate between memory used by the script framework (part of redis's used_memory), and memory used by the VM (not part of redis's used_memory)

A full breakdown of info memory changes:

used_memory_lua and used_memory_lua_human was deprecated, used_memory_vm_eval has the same meaning as used_memory_lua
used_memory_scripts was renamed to used_memory_scripts_eval
used_memory_scripts and used_memory_scripts_human were repurposed and now return the total memory used by functions and eval (not including vm memory, only code cache, and structs).
used_memory_vm_function was added and represents the total memory used by functions vm's
used_memory_functions was added and represents the total memory by functions (not including vm memory, only code cache, and structs)
used_memory_vm_total and used_memory_vm_total_human was added and represents the total memory used by vm's (functions and eval combined)

`functions.caches`

functions.caches field was added to memory stats, representing the memory used by engines that are not functions (this memory includes data structures like dictionaries, arrays, ...)

New API

FUNCTION LOAD

Usage: FUNCTION LOAD ENGINE NAME [REPLACE] [DESC <DESCRIPTION>] <CODE>

ENGINE - The name of the engine to use to create the script.
NAME - the name of the function that can be used later to call the function using FUNCTION CALL command.
REPLACE - if given, replace the given function with the existing function (if exists).
DESCRIPTION - optional argument describing the function and what it does
CODE - function code.

The command will return OK if created successfully or error in the following cases:

The given engine name does not exist
The function name is already taken and REPLACE was not used.
The given function failed on the compilation.

FCALL and FCALL_RO

Usage: FCALL/FCALL_RO NAME NUM_KEYS key1 key2 … arg1 arg2

Call and execute the function specified by NAME. The function will receive all arguments given after NUM_KEYS. The return value from the function will be returned to the user as a result.

NAME - Name of the function to run.
The rest is as today with EVALSHA command.

The command will return an error in the following cases:

NAME does not exist
The function itself returned an error.

The FCALL_RO is equivalent to EVAL_RO and allows only read-only commands to be invoked from the script.

FUNCTION DELETE

Usage: FUNCTION DELETE NAME

Delete a function identified by NAME. Return OK on success or error on one of the following:

The given function does not exist

FUNCTION INFO

Usage: FUNCTION INFO NAME [WITHCODE]

Return information about a function by function name:

Function name
Engine name
Description
Raw code (only if WITHCODE argument is given)

FUNCTION LIST

Usage: FUNCTION LIST

Return general information about all the functions:

Function name
Engine name
Description

FUNCTION STATS

Usage: FUNCTION STATS

Return information about the current running function:

Function name
Command that was used to invoke the function
Duration in MS that the function is already running

If no function is currently running, this section is just a RESP nil.

Additionally, return a list of all the available engines.

FUNCTION KILL

Usage: FUNCTION KILL

Kill the currently executing function. The command will fail if the function already initiated a write command.

TODO

Handle short read (add tests)
Improve testing

Community requests to handle on following PR's

Allow loading functions from an external file on startup - Functions on Redis Cluster #9899
Define and implement Function arguments parsing (named arguments/positional arguments)
Define and implement code reuse between functions - Redis Functions - Libraries #9906
Add JS engine

Community requests handled on this PR's

FUNCTION RUNNING command, to check to current running function: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)
Handled with FUCTION STATS command
Get the source code of a function: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)
Handled with FUCTION INFO command

Community requests to consider for future planning

Memory limit on functions engines.
Trigger functions on events: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)
Blocking operations inside functions: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)
Background function execution: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)
Rollback on timeouts: Suggestion: Redis Functions, an extension to Lua scripts #8693 (comment)

Notes

Note: Function creation/deletion is replicated to AOF but AOFRW is not implemented sense its going to be removed: #9794

oranagra · 2021-11-15T09:00:02Z

@redis/core-team please review / approve and state what's missing or needs a change.
there are a few additional minor changes that should be done on this PR before being merged, but i think the majority of the additional changes can come in a few followup PRs.

eduardobr · 2021-11-16T12:06:58Z

@MeirShpilraien I noticed the functionsCtxGetCurrent gets a global value. Does it make sense that it could live within the redisDb struct? Same style as slots_to_keys, that was moved there recently.
Reason for that is that passing around, initializing, swapping and destroying server.db and tempDb + related objects is cleaner and more explicit when the related objects come together (at least things that share same life cycle).
Methods like functionsCtxSwapWithCurrent would be called or have the logic in the main function that swaps whole db.

oranagra · 2021-11-16T12:23:22Z

@eduardobr the problem is that functions are global (not per db, in a multi-db server).
unlike the cluster slots mapping, which is global, but only valid in a single db configuration.
that's why we had to create rdbLoadingCtx, if we had that one before, maybe we would have put the slots mapping there too.

Note, i did review this code before being posted publicly, and did attempt to follow the cluster slots footsteps as much as i could.
if you still think there's a possibility for an additional cleanup, please suggest it.

src/eval.c

yossigo

LGTM, but did not do a full code review of the latest version.

itamarhaber

Approving the feature and API, did not do a code review.

madolson

(Sorry for random comments scattered, I was just making notes on stuff I saw. I mostly was trying to understand the interface. It looked good, so I trust the functionality is implemented as expected)

I want to better understand why we think the right approach here is storing functions in the RDB as opposed to storing them in the conf file or a separate file. I view functions more analogous to ACLs than keyspace data, that they are properties of the cluster and not necessarily specific to the RDB data. This is basically completely broken for workloads without persistence, like caching, which seems relevant. I would expect having a well defined mechanism outside of the RDB to exist.

Also are we fixing this? #8478 for functions?

tests/unit/functions.tcl

src/script.c

src/rdb.c

src/server.c

src/functions.c

tests/unit/functions.tcl

MeirShpilraien · 2021-11-24T09:54:21Z

@madolson thanks for the review. Fixed most of the comments (pushed a new commit) and I added comments where I needed more input about how to proceed.

Regarding pre-load of functions (not from persistence). This is listed as first task to handle on following PR's. I am working on the design and will share in a few days. Once you (the core team) will approve the design I will work on the implementation.

Regarding this #8478, I have another design coming to allow better code sharing and I hope to address it (or at least suggest a possible solution). Will share soon.

madolson · 2021-11-26T21:23:46Z

@MeirShpilraien My point is that I'm not convinced RDB is the right place to be storing functions, or at least we should take a deeper look into the RDB storage to think about how it interacts with other features. One example is cluster mode, when you want to split a cluster, you add N new nodes with the same configs of the other nodes and then migrate the keyspace data off for some slices. This won't include functions until we add them to configs, but then you run into the issue where they need to only be loaded on primaries and replicated (or perhaps we allow primaries and replicas to diverge?).

That's why I want to understand why it's not being built more analogous to the config file or to ACLs. The only real benefit here is that storing it in the RDB is actually really convenient (Since you can just restore the RDB to another node). But it would also be convenient for ACLs, so maybe the right answer is we should be storing them both in the RDB. (Also, I didn't actually review the code but I assume we replicate by the effects of the script and not the script itself. If that isn't the case, then having replica and primary agree on the state of a function matter more)

I suppose I would want an answer to this before I would signoff on it.

MeirShpilraien · 2021-11-28T09:05:19Z

@madolson the entire point of functions is that they are replicated and persisted so you will have them on fail-overs for example (this is also how we defined it in the issue #8693). I do agree though that we need a way to allow users to also load them from config in case the user runs without persistence for example. Hope to publish my design for that today/tomorrow and we will be able to discuss it.

(Also, I didn't actually review the code but I assume we replicate by the effects of the script and not the script itself. If that isn't the case, then having replica and primary agree on the state of a function matter more)

Yes, on functions we replicate only by effect.

madolson · 2021-11-28T16:44:29Z

@MeirShpilraien Agree about having them available on failover, if they were stored in a config file they would also be available though.

src/functions.c

src/functions.h

src/functions.c

src/server.c

src/functions.c

src/functions.h

src/rdb.c

src/script.c

src/functions.h

The functionality was moved to script_lua.c under callFunction function. Its purpose is to call the Lua function already located on the top of the Lua stack. Used the new function on eval.c to invoke Lua code. The function will also be used to invoke Lua code on the Lua engine.

MeirShpilraien · 2021-12-01T22:19:16Z

Squashed all reviews commits to their relevant location, kept only the original 5 commits.

oranagra · 2021-12-02T09:07:57Z

triggered full CI https://github.com/redis/redis/actions/runs/1529820590

Redis function unit is located inside functions.c and contains Redis Function implementation: 1. FUNCTION commands: * FUNCTION CREATE * FCALL * FCALL_RO * FUNCTION DELETE * FUNCTION KILL * FUNCTION INFO 2. Register engine In addition, this commit introduce the first engine that uses the Redis Function capabilities, the Lua engine.

madolson

Talked with @Oran over slack, and it seems like we're definitely going to make sure there is another strategy for making functions loadable at startup/cluster mode. So we can follow up on that issue.

MeirShpilraien · 2021-12-05T20:37:19Z

Loading Functions on cluster and/or startup time is discussed here: #9899

MeirShpilraien · 2021-12-06T18:54:24Z

Code sharing on functions is discussed here: #9906

…ic execution logic (#9812) # Background The main goal of this PR is to remove relevant logics on Lua script verbatim replication, only keeping effects replication logic, which has been set as default since Redis 5.0. As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default configuration from users' point of view. There are lots of reasons to remove verbatim replication. Antirez has listed some of the benefits in Issue #5292: >1. No longer need to explain to users side effects into scripts. They can do whatever they want. >2. No need for a cache about scripts that we sent or not to the slaves. >3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies and gains speed. >4. No need to store scripts inside the RDB file in order to startup correctly. >5. No problems about evicting keys during the script execution. When looking back at Redis 5.0, antirez and core team decided to set the config `lua-replicate-commands yes` by default instead of removing verbatim replication directly, in case some bad situations happened. 3 years later now before Redis 7.0, it's time to remove it formally. # Changes - configuration for lua-replicate-commands removed - created config file stub for backward compatibility - Replication script cache removed - this is useless under script effects replication - relevant statistics also removed - script persistence in RDB files is also removed - Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed - Deterministic execution logic in scripts removed (i.e. don't run write commands after random ones, and sorting output of commands with random order) - the flags indicating which commands have non-deterministic results are kept as hints to clients. - `redis.replicate_commands()` & `redis.set_repl()` changed - now `redis.replicate_commands()` does nothing and return an 1 - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now - Relevant TCL cases adjusted - DEBUG lua-always-replicate-commands removed # Other changes - Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780) Co-authored-by: Oran Agra <oran@redislabs.com>

This is an overlook in redis#9780

We now no longer propagate scripts (started from 7.0), so this is a very rare issue that in nearly-dead-code. This is an overlook in #9780

We now no longer propagate scripts (started from 7.0), so this is a very rare issue that in nearly-dead-code. This is an overlook in #9780 (cherry picked from commit e4d183a)

Info fields from the rearrangement of scripts when functions were introduced. see redis/redis#9780 --------- Co-authored-by: Oran Agra <oran@redislabs.com>

MeirShpilraien requested a review from oranagra November 14, 2021 18:23

oranagra approved these changes Nov 15, 2021

View reviewed changes

sundb reviewed Nov 16, 2021

View reviewed changes

src/eval.c Outdated Show resolved Hide resolved

src/eval.c Outdated Show resolved Hide resolved

yossigo approved these changes Nov 17, 2021

View reviewed changes

oranagra added 7.0-must-have approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus labels Nov 18, 2021

oranagra linked an issue Nov 18, 2021 that may be closed by this pull request

Suggestion: Redis Functions, an extension to Lua scripts #8693

Closed

oranagra mentioned this pull request Nov 19, 2021

Remove EVAL script verbatim replication, propagation, and deterministic execution logic #9812

Merged

itamarhaber approved these changes Nov 22, 2021

View reviewed changes

madolson reviewed Nov 23, 2021

View reviewed changes

tests/unit/functions.tcl Outdated Show resolved Hide resolved