Move Lua scripting engine into a Valkey module#2858
Conversation
This commit restructures the Lua scripting functionality by extracting it from the core Valkey server into a separate Valkey module. This change enables the possibility of a backwards compatible Lua engine upgrade, as well as, the flexibility in building Valkey without the Lua engine. **Important**: from a user's point of view, there's no difference in using the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward compatible with respect to the public API. The main code change is the move and adaptation of the Lua engine source files from `src/lua` to `src/modules/lua`. The original Lua engine code is adapted to use the module API to compile and execute scripts. The main difference between the original code and the new, is the serialization and deserialization of Valkey RESP values into, and from, Lua values. While in the original implementation the parsing of RESP values was done directly from the client buffer, in the new implementation the parsing is done from the `ValkeyModuleCallReply` object and respective API. The Makefile and CMake build systems were also updated to build and integrate the new Lua engine module, within the Valkey server build workflow. When the Valkey server is built, the Lua engine module is also built, and, the Lua module is loaded automatically by the server upon startup. When running `make install` the Lua engine module is installed in the default system library directory. There's a new build option, called `WITHOUT_LUA`, that if set allows to build Valkey server without building the Lua engine. This modular architecture enables future development of additional Lua engine modules with newer Lua versions that can be loaded alongside the current engine, facilitating gradual migration paths for users. Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
|
Wow! Some initial comments:
|
Sure, that makes more sense. I'll update the PR.
I was thinking that if the lua module is built then it would always be loaded by default. But we can add a new config to disable auto-loading even if the lua module is built.
Yes |
Yeah, we can discuss it with the core team. I believe in many contexts, such as in pre-built containers, the module is already built. Many users don't build their own binaries. |
|
I think it makes sense to have that config option to disable auto-load. |
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
@zuiderkwast I vote for implementing such option in a follow up PR. This PR is already huge to review. |
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2858 +/- ##
============================================
+ Coverage 72.47% 73.76% +1.28%
============================================
Files 129 125 -4
Lines 70537 68805 -1732
============================================
- Hits 51121 50752 -369
+ Misses 19416 18053 -1363
🚀 New features to boost your workflow:
|
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
Just some initial comments. I'll do another pass later. The change is not as huge as the +/- numbers indicate. Many lines are moved to other files.
The build options should be mentioned in the README.md.
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
This looks pretty solid in general.
My main concerns are about the build-time changes. Is there a risk that it will break compilation or installation for any user?
If Valkey is build with BUILD_LUA=no, then it doesn't automatically load a lua module even if such module is installed in the system later. Maybe it's not bad, but I just want to mention it.
@valkey-io/valkey-committers Does anyone else want to take a look, especially on the compile-time changes (Makefiles etc.)?
I tried my best to not break it, but a another pair of eyes is needed to improve confidence on the changes.
Right. But if later someone installs the module separately, it can add a |
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Before valkey-io#2858, we has this check: ``` static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) { if (run_ctx->flags & SCRIPT_ALLOW_OOM) { /* Allow running any command even if OOM reached */ return C_OK; } /* If we reached the memory limit configured via maxmemory, commands that * could enlarge the memory usage are not allowed, but only if this is the * first write in the context of this script, otherwise we can't stop * in the middle. */ if (server.maxmemory && /* Maxmemory is actually enabled. */ !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */ !(run_ctx->flags & SCRIPT_WRITE_DIRTY) && /* Script had no side effects so far. */ server.pre_command_oom_state && /* Detected OOM when script start. */ (run_ctx->c->cmd->flags & CMD_DENYOOM)) { *err = sdsdup(shared.oomerr->ptr); return C_ERR; } return C_OK; } ``` If we reached the memory limit configured via maxmemory, commands that could enlarge the memory usage are not allowed, but only if this is the first write in the context of this script, otherwise we can't stop in the middle. Signed-off-by: Binbin <binloveplay1314@qq.com>
This PR restructures the Lua scripting functionality by extracting it from the core Valkey server into a separate Valkey module. This change enables the possibility of a backwards compatible Lua engine upgrade, as well as, the flexibility in building Valkey without the Lua engine. **Important**: from a user's point of view, there's no difference in using the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward compatible with respect to the public API. The main code change is the move and adaptation of the Lua engine source files from `src/lua` to `src/modules/lua`. The original Lua engine code is adapted to use the module API to compile and execute scripts. The main difference between the original code and the new, is the serialization and deserialization of Valkey RESP values into, and from, Lua values. While in the original implementation the parsing of RESP values was done directly from the client buffer, in the new implementation the parsing is done from the `ValkeyModuleCallReply` object and respective API. The Makefile and CMake build systems were also updated to build and integrate the new Lua engine module, within the Valkey server build workflow. When the Valkey server is built, the Lua engine module is also built, and, the Lua module is loaded automatically by the server upon startup. When running `make install` the Lua engine module is installed in the default system library directory. There's a new build option, called `BUILD_LUA`, that if set to `no` allows to build Valkey server without building the Lua engine. This modular architecture enables future development of additional Lua engine modules with newer Lua versions that can be loaded alongside the current engine, facilitating gradual migration paths for users. Additional change: Unload all modules on shutdown (ignoring modules that can't be unloaded). This is to avoid address sanitizer warnings about leaked allocations. Fixes: valkey-io#1627 --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
…ion (#3030) Before #2858, we has this check: ``` static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) { if (run_ctx->flags & SCRIPT_ALLOW_OOM) { /* Allow running any command even if OOM reached */ return C_OK; } /* If we reached the memory limit configured via maxmemory, commands that * could enlarge the memory usage are not allowed, but only if this is the * first write in the context of this script, otherwise we can't stop * in the middle. */ if (server.maxmemory && /* Maxmemory is actually enabled. */ !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */ !(run_ctx->flags & SCRIPT_WRITE_DIRTY) && /* Script had no side effects so far. */ server.pre_command_oom_state && /* Detected OOM when script start. */ (run_ctx->c->cmd->flags & CMD_DENYOOM)) { *err = sdsdup(shared.oomerr->ptr); return C_ERR; } return C_OK; } ``` If we reached the memory limit configured via maxmemory, commands that could enlarge the memory usage are not allowed, but only if this is the first write in the context of this script, otherwise we can't stop in the middle. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
This was commented out in valkey-io#2858 but I don't know why - it looks like it might have been accidental. Signed-off-by: Rain Valentine <rsg000@gmail.com>
…ion (valkey-io#3030) Before valkey-io#2858, we has this check: ``` static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) { if (run_ctx->flags & SCRIPT_ALLOW_OOM) { /* Allow running any command even if OOM reached */ return C_OK; } /* If we reached the memory limit configured via maxmemory, commands that * could enlarge the memory usage are not allowed, but only if this is the * first write in the context of this script, otherwise we can't stop * in the middle. */ if (server.maxmemory && /* Maxmemory is actually enabled. */ !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */ !(run_ctx->flags & SCRIPT_WRITE_DIRTY) && /* Script had no side effects so far. */ server.pre_command_oom_state && /* Detected OOM when script start. */ (run_ctx->c->cmd->flags & CMD_DENYOOM)) { *err = sdsdup(shared.oomerr->ptr); return C_ERR; } return C_OK; } ``` If we reached the memory limit configured via maxmemory, commands that could enlarge the memory usage are not allowed, but only if this is the first write in the context of this script, otherwise we can't stop in the middle. --------- Signed-off-by: Binbin <binloveplay1314@qq.com>
…y test in crash report (#3029) The memory test was commented out in #2858 and should have been reenabled. On further investigation I found that the server hangs during shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes deadlock because the lock was acquired for shutdown but lazy free jobs require the GIL too: - main thread: `serverCron()` acquires GIL via `afterSleep()` then calls `finishShutdown()`, which eventually calls our script module unload code that calls `bioDrainWorker()`. - bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()` call `scriptingEngineCallFreeFunction()` which requires the GIL. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com>
Before we added LUA as a module we had a logic to NOT register the luaHook when the value of `busy-reply-threshold` config is set to 0. Now we ALWAYS register the hook and in order to keep aligned with old behavior we will let the execution of the script continue from the interrupt hook when `busy-reply-threshold` config is set == 0. And in value.conf, we are saying the config can be negative, but in fact the config is minimum at 0, fix the valkey.conf as well. ``` # The default is 5 seconds. It is possible to set it to 0 or a negative value # to disable this mechanism (uninterrupted execution) ``` It was introduced in #2858. Signed-off-by: Alon Arenberg <alonare@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>
Before we added LUA as a module we had a logic to NOT register the luaHook when the value of `busy-reply-threshold` config is set to 0. Now we ALWAYS register the hook and in order to keep aligned with old behavior we will let the execution of the script continue from the interrupt hook when `busy-reply-threshold` config is set == 0. And in value.conf, we are saying the config can be negative, but in fact the config is minimum at 0, fix the valkey.conf as well. ``` # The default is 5 seconds. It is possible to set it to 0 or a negative value # to disable this mechanism (uninterrupted execution) ``` It was introduced in valkey-io#2858. Signed-off-by: Alon Arenberg <alonare@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>
This PR restructures the Lua scripting functionality by extracting it from the core Valkey server into a separate Valkey module. This change enables the possibility of a backwards compatible Lua engine upgrade, as well as, the flexibility in building Valkey without the Lua engine. **Important**: from a user's point of view, there's no difference in using the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward compatible with respect to the public API. The main code change is the move and adaptation of the Lua engine source files from `src/lua` to `src/modules/lua`. The original Lua engine code is adapted to use the module API to compile and execute scripts. The main difference between the original code and the new, is the serialization and deserialization of Valkey RESP values into, and from, Lua values. While in the original implementation the parsing of RESP values was done directly from the client buffer, in the new implementation the parsing is done from the `ValkeyModuleCallReply` object and respective API. The Makefile and CMake build systems were also updated to build and integrate the new Lua engine module, within the Valkey server build workflow. When the Valkey server is built, the Lua engine module is also built, and, the Lua module is loaded automatically by the server upon startup. When running `make install` the Lua engine module is installed in the default system library directory. There's a new build option, called `BUILD_LUA`, that if set to `no` allows to build Valkey server without building the Lua engine. This modular architecture enables future development of additional Lua engine modules with newer Lua versions that can be loaded alongside the current engine, facilitating gradual migration paths for users. Additional change: Unload all modules on shutdown (ignoring modules that can't be unloaded). This is to avoid address sanitizer warnings about leaked allocations. Fixes: valkey-io#1627 --------- Signed-off-by: Ricardo Dias <ricardo.dias@percona.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
…ion (valkey-io#3030) Before valkey-io#2858, we has this check: ``` static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) { if (run_ctx->flags & SCRIPT_ALLOW_OOM) { /* Allow running any command even if OOM reached */ return C_OK; } /* If we reached the memory limit configured via maxmemory, commands that * could enlarge the memory usage are not allowed, but only if this is the * first write in the context of this script, otherwise we can't stop * in the middle. */ if (server.maxmemory && /* Maxmemory is actually enabled. */ !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */ !(run_ctx->flags & SCRIPT_WRITE_DIRTY) && /* Script had no side effects so far. */ server.pre_command_oom_state && /* Detected OOM when script start. */ (run_ctx->c->cmd->flags & CMD_DENYOOM)) { *err = sdsdup(shared.oomerr->ptr); return C_ERR; } return C_OK; } ``` If we reached the memory limit configured via maxmemory, commands that could enlarge the memory usage are not allowed, but only if this is the first write in the context of this script, otherwise we can't stop in the middle. --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
…y test in crash report (valkey-io#3029) The memory test was commented out in valkey-io#2858 and should have been reenabled. On further investigation I found that the server hangs during shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes deadlock because the lock was acquired for shutdown but lazy free jobs require the GIL too: - main thread: `serverCron()` acquires GIL via `afterSleep()` then calls `finishShutdown()`, which eventually calls our script module unload code that calls `bioDrainWorker()`. - bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()` call `scriptingEngineCallFreeFunction()` which requires the GIL. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Before we added LUA as a module we had a logic to NOT register the luaHook when the value of `busy-reply-threshold` config is set to 0. Now we ALWAYS register the hook and in order to keep aligned with old behavior we will let the execution of the script continue from the interrupt hook when `busy-reply-threshold` config is set == 0. And in value.conf, we are saying the config can be negative, but in fact the config is minimum at 0, fix the valkey.conf as well. ``` # The default is 5 seconds. It is possible to set it to 0 or a negative value # to disable this mechanism (uninterrupted execution) ``` It was introduced in valkey-io#2858. Signed-off-by: Alon Arenberg <alonare@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Before we added LUA as a module we had a logic to NOT register the luaHook when the value of `busy-reply-threshold` config is set to 0. Now we ALWAYS register the hook and in order to keep aligned with old behavior we will let the execution of the script continue from the interrupt hook when `busy-reply-threshold` config is set == 0. And in value.conf, we are saying the config can be negative, but in fact the config is minimum at 0, fix the valkey.conf as well. ``` # The default is 5 seconds. It is possible to set it to 0 or a negative value # to disable this mechanism (uninterrupted execution) ``` It was introduced in valkey-io#2858. Signed-off-by: Alon Arenberg <alonare@amazon.com> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com>
In the refactor that moved Lua scripting to the Module API (valkey-io#2858), the argument object caching was not reimplemented. This causes a measurable regression in workloads that pass data through redis.call(), because every invocation allocates and frees ValkeyModuleString objects via je_malloc/je_sdallocx. This patch restores caching without depending on robj struct internals: - Reuse a static argv array across calls (avoids per-call ValkeyModule_Alloc + ValkeyModule_Free of the pointer array) - After each call, retain up to LUA_CMD_OBJCACHE_SIZE (32) small string objects via ValkeyModule_RetainString - On the next call, if the cached SDS buffer (checked via objectGetVal + sdsalloc) is large enough, memcpy the new content in-place — zero malloc/free for the hot path - Fall back to ValkeyModule_CreateString on cache miss The only server internal used is objectGetVal() (resolved at link time) and sds.h for sdssetlen/sdsalloc. No robj struct layout knowledge is needed, making this robust against future struct changes. Benchmark results (5 workloads, c6i.2xlarge, 3 runs each): - Data transfer (100x100KB reads): -7.8% regression -> -0.5% (fixed) - pcall error handling (500 iter): -11.6% regression -> +0.0% (fixed) - Heavy redis.call() (1200 cmds): remaining regression from module dispatch overhead (not allocation-related) Signed-off-by: Madelyn Olson <matolson@amazon.com>
Address review feedback from @zuiderkwast on valkey-io#3640: instead of calling zlibc_trim() explicitly from the MEMORY PURGE handler, fold it into jemalloc_purge() itself. That way the libc main arena is also trimmed automatically when jemalloc_purge() is invoked from other code paths, which today means the synchronous variants of FLUSHDB and FLUSHALL in src/db.c. This also fixes the historical oddity that @zuiderkwast pointed out: zlibc_trim() was added in a7abc2f to be called from freeLuaScriptsSync(), but became dead code when valkey-io#2858 moved Lua to a separate module that calls malloc_trim(3) directly. zlibc_trim() is now reachable again through the single jemalloc_purge() entry point. The async paths (lazyfree-lazy-user-flush=yes, EMPTYDB_ASYNC) intentionally do not call jemalloc_purge() so they continue to skip the libc trim too - that is correct: malloc_trim(3) briefly locks every glibc arena and we don't want to do that on the main thread when the user explicitly asked for an async flush. Verified locally on Ubuntu 22.04 / glibc 2.35: MEMORY PURGE : [heap] 135168 -> 114688 bytes (-20480) FLUSHALL SYNC : [heap] 135168 -> 114688 bytes (-20480) FLUSHDB SYNC : [heap] 135168 -> 114688 bytes (-20480) FLUSHDB (async) : [heap] 135168 -> 135168 bytes (unchanged, as intended) The behaviour described by the MEMORY PURGE help text and by the memory doctor's high_proc_rss diagnostic is unchanged; both still mention the libc main arena and direct users at MEMORY PURGE. Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
When the Lua scripting engine was migrated into a module in valkey-io#2858, two helpers stopped preserving the leading error code: * `luaPushErrorBuff` no longer prepended a generic "ERR " when the caller passed a bare message, so direct API errors (`redis.pcall()` with no args, `redis.setresp(4)`, `redis.log(1)`, `redis.set_repl()`, `redis.log(10, 'm')`) reach the client as unprefixed strings. The function's own comment says it should prepend "ERR", and the pre-module implementation did. * `luaProcessReplyError` and `errorCallback` stripped (or never re-added) the leading `-` before calling `luaPushErrorBuff`, so reply-side errors like OOM and WRONGTYPE took the bare-message branch above and then — once the bare branch is fixed — would have ended up double-coded ("ERR OOM ...", "ERR WRONGTYPE ..."). Restore the original behaviour by: * Re-adding the `-` in both reply-error entry points so the existing `"-CODE msg"` branch in `luaPushErrorBuff` preserves the code. * Prepending "ERR " in the bare-message branch unless the caller already supplied the prefix (avoids "ERR ERR ..." on the rewrite paths that hard-code "ERR ..."). Adds a regression test that asserts the prefix on the five direct API errors named in the bug, and that already-coded errors (WRONGTYPE) are left alone. Signed-off-by: 1fanwang <1fannnw@gmail.com>
When the Lua scripting engine was migrated into a module in valkey-io#2858, two helpers stopped preserving the leading error code: * `luaPushErrorBuff` no longer prepended a generic "ERR " when the caller passed a bare message, so direct API errors (`redis.pcall()` with no args, `redis.setresp(4)`, `redis.log(1)`, `redis.set_repl()`, `redis.log(10, 'm')`) reach the client as unprefixed strings. The function's own comment says it should prepend "ERR", and the pre-module implementation did. * `luaProcessReplyError` and `errorCallback` stripped (or never re-added) the leading `-` before calling `luaPushErrorBuff`, so reply-side errors like OOM and WRONGTYPE took the bare-message branch above and would have ended up double-coded once the bare branch is fixed ("ERR OOM ...", "ERR WRONGTYPE ..."). Restore the original behaviour by: * Re-adding the `-` in both reply-error entry points so the existing `"-CODE msg"` branch in `luaPushErrorBuff` preserves the code. * Unconditionally prepending "ERR " in the bare-message branch. Per zuiderkwast review feedback, the seven call sites that hard-coded "ERR ..." are updated to pass just the message — the contract is now that callers either pass `"-CODE msg"` for an explicit code or a bare `"msg"` for the default ERR prefix, never `"ERR msg"` themselves. Adds a regression test that asserts the prefix on the five direct API errors named in the bug, and that already-coded errors (WRONGTYPE) are left alone. Signed-off-by: 1fanwang <1fannnw@gmail.com>
Closes #3663. ## Why Three callers in `src/modules/lua/script_lua.c` lost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log, 1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now return the bare message instead of `ERR <message>`. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident. Reproducer: ``` $ ./src/valkey-cli eval "return redis.pcall()" 0 "Please specify at least one argument for this redis lib call" # before this PR "ERR Please specify at least one argument for this redis lib call" # with this PR ``` ## What changed In `src/modules/lua/script_lua.c`: 1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR ` unless the message already starts with a code letter (defensive — five existing callers hardcode `"ERR ..."` strings; double-prefixing them would also be wrong). 2. `luaProcessReplyError` push_error branch: re-add the leading `-` before delegating to `luaPushError` so the case-1 code-extraction path handles `OOM`, `WRONGTYPE`, `READONLY`, etc. 3. `errorCallback` errno==0 branch: same re-add-`-` approach. 41 LoC across the three functions. ## Test `tests/unit/scripting.tcl` regression at line 1028 covers: - All five direct-API errors above — asserts `ERR ` prefix - WRONGTYPE error path — asserts the code survives and is not double-prefixed - Runs 4 times under the `foreach is_eval × foreach script_compatibility_api` matrix Fails on `unstable`, passes with this PR. --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Closes #3663. ## Why Three callers in `src/modules/lua/script_lua.c` lost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log, 1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now return the bare message instead of `ERR <message>`. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident. Reproducer: ``` $ ./src/valkey-cli eval "return redis.pcall()" 0 "Please specify at least one argument for this redis lib call" # before this PR "ERR Please specify at least one argument for this redis lib call" # with this PR ``` ## What changed In `src/modules/lua/script_lua.c`: 1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR ` unless the message already starts with a code letter (defensive — five existing callers hardcode `"ERR ..."` strings; double-prefixing them would also be wrong). 2. `luaProcessReplyError` push_error branch: re-add the leading `-` before delegating to `luaPushError` so the case-1 code-extraction path handles `OOM`, `WRONGTYPE`, `READONLY`, etc. 3. `errorCallback` errno==0 branch: same re-add-`-` approach. 41 LoC across the three functions. ## Test `tests/unit/scripting.tcl` regression at line 1028 covers: - All five direct-API errors above — asserts `ERR ` prefix - WRONGTYPE error path — asserts the code survives and is not double-prefixed - Runs 4 times under the `foreach is_eval × foreach script_compatibility_api` matrix Fails on `unstable`, passes with this PR. --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Closes #3663. ## Why Three callers in `src/modules/lua/script_lua.c` lost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log, 1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now return the bare message instead of `ERR <message>`. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident. Reproducer: ``` $ ./src/valkey-cli eval "return redis.pcall()" 0 "Please specify at least one argument for this redis lib call" # before this PR "ERR Please specify at least one argument for this redis lib call" # with this PR ``` ## What changed In `src/modules/lua/script_lua.c`: 1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR ` unless the message already starts with a code letter (defensive — five existing callers hardcode `"ERR ..."` strings; double-prefixing them would also be wrong). 2. `luaProcessReplyError` push_error branch: re-add the leading `-` before delegating to `luaPushError` so the case-1 code-extraction path handles `OOM`, `WRONGTYPE`, `READONLY`, etc. 3. `errorCallback` errno==0 branch: same re-add-`-` approach. 41 LoC across the three functions. ## Test `tests/unit/scripting.tcl` regression at line 1028 covers: - All five direct-API errors above — asserts `ERR ` prefix - WRONGTYPE error path — asserts the code survives and is not double-prefixed - Runs 4 times under the `foreach is_eval × foreach script_compatibility_api` matrix Fails on `unstable`, passes with this PR. --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This PR restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This change
enables the possibility of a backwards compatible Lua engine upgrade, as well
as, the flexibility in building Valkey without the Lua engine.
Important: from a user's point of view, there's no difference in using
the
EVALofFUNCTION/FCALLscripts. This PR is fully backward compatiblewith respect to the public API.
The main code change is the move and adaptation of the Lua engine source
files from
src/luatosrc/modules/lua. The original Lua engine code isadapted to use the module API to compile and execute scripts.
The main difference between the original code and the new, is the
serialization and deserialization of Valkey RESP values into, and from,
Lua values. While in the original implementation the parsing of RESP values
was done directly from the client buffer, in the new implementation the
parsing is done from the
ValkeyModuleCallReplyobject and respective API.The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module, within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and, the Lua module is loaded automatically by the server upon startup.
When running
make installthe Lua engine module is installed in thedefault system library directory.
There's a new build option, called
BUILD_LUA, that if set tonoallows tobuild Valkey server without building the Lua engine.
This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.
Additional change: Unload all modules on shutdown (ignoring modules that
can't be unloaded). This is to avoid address sanitizer warnings about
leaked allocations.
Fixes: #1627