Skip to content

Move Lua scripting engine into a Valkey module#2858

Merged
rjd15372 merged 31 commits into
valkey-io:unstablefrom
rjd15372:lua-module
Dec 18, 2025
Merged

Move Lua scripting engine into a Valkey module#2858
rjd15372 merged 31 commits into
valkey-io:unstablefrom
rjd15372:lua-module

Conversation

@rjd15372

@rjd15372 rjd15372 commented Nov 20, 2025

Copy link
Copy Markdown
Member

This PR restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This change
enables the possibility of a backwards compatible Lua engine upgrade, as well
as, the flexibility in building Valkey without the Lua engine.

Important: from a user's point of view, there's no difference in using
the EVAL of FUNCTION/FCALL scripts. This PR is fully backward compatible
with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from src/lua to src/modules/lua. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new, is the
serialization and deserialization of Valkey RESP values into, and from,
Lua values. While in the original implementation the parsing of RESP values
was done directly from the client buffer, in the new implementation the
parsing is done from the ValkeyModuleCallReply object and respective API.

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module, within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and, the Lua module is loaded automatically by the server upon startup.
When running make install the Lua engine module is installed in the
default system library directory.
There's a new build option, called BUILD_LUA, that if set to no allows to
build Valkey server without building the Lua engine.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Additional change: Unload all modules on shutdown (ignoring modules that
can't be unloaded). This is to avoid address sanitizer warnings about
leaked allocations.

Fixes: #1627

@rjd15372 rjd15372 self-assigned this Nov 20, 2025
@rjd15372 rjd15372 added the enhancement New feature or request label Nov 20, 2025
@rjd15372

Copy link
Copy Markdown
Member Author

This PR is rebased on top of #2836 . When #2836 is merged, I'll update this PR.

This commit restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This change
enables the possibility of a backwards compatible Lua engine upgrade, as well
as, the flexibility in building Valkey without the Lua engine.

**Important**: from a user's point of view, there's no difference in using
the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward compatible
with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from `src/lua` to `src/modules/lua`. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new, is the
serialization and deserialization of Valkey RESP values into, and from,
Lua values. While in the original implementation the parsing of RESP values
was done directly from the client buffer, in the new implementation the
parsing is done from the `ValkeyModuleCallReply` object and respective API.

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module, within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and, the Lua module is loaded automatically by the server upon startup.
When running `make install` the Lua engine module is installed in the
default system library directory.
There's a new build option, called `WITHOUT_LUA`, that if set allows to
build Valkey server without building the Lua engine.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
@zuiderkwast

Copy link
Copy Markdown
Contributor

Wow!

Some initial comments:

  • For the build option, why not use the name BUILD_LUA (default yes), to align with the build options BUILD_TLS and BUILD_RDMA?
  • For backward compatibility, the lua module needs to load by default. To disable this auto-loading, we need a new config, don't we? I can't see it mentioned here.
  • Can the lua module be unloaded using MODULE UNLOAD?

@rjd15372

Copy link
Copy Markdown
Member Author
  • For the build option, why not use the name BUILD_LUA (default yes), to align with the build options BUILD_TLS and BUILD_RDMA?

Sure, that makes more sense. I'll update the PR.

  • For backward compatibility, the lua module needs to load by default. To disable this auto-loading, we need a new config, don't we? I can't see it mentioned here.

I was thinking that if the lua module is built then it would always be loaded by default. But we can add a new config to disable auto-loading even if the lua module is built.

  • Can the lua module be unloaded using MODULE UNLOAD?

Yes

@zuiderkwast

zuiderkwast commented Nov 20, 2025

Copy link
Copy Markdown
Contributor

I was thinking that if the lua module is built then it would always be loaded by default. But we can add a new config to disable auto-loading even if the lua module is built.

Yeah, we can discuss it with the core team.

I believe in many contexts, such as in pre-built containers, the module is already built. Many users don't build their own binaries.

@rjd15372

Copy link
Copy Markdown
Member Author

I think it makes sense to have that config option to disable auto-load.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
@rjd15372

Copy link
Copy Markdown
Member Author

I think it makes sense to have that config option to disable auto-load.

@zuiderkwast I vote for implementing such option in a follow up PR. This PR is already huge to review.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
@codecov

codecov Bot commented Nov 21, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.33333% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.76%. Comparing base (48e0cbb) to head (37e212b).
⚠️ Report is 11 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 73.52% 9 Missing ⚠️
src/scripting_engine.c 95.23% 1 Missing ⚠️
src/server.c 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2858      +/-   ##
============================================
+ Coverage     72.47%   73.76%   +1.28%     
============================================
  Files           129      125       -4     
  Lines         70537    68805    -1732     
============================================
- Hits          51121    50752     -369     
+ Misses        19416    18053    -1363     
Files with missing lines Coverage Δ
src/config.c 78.76% <ø> (+0.32%) ⬆️
src/debug.c 54.63% <ø> (+0.04%) ⬆️
src/eval.c 87.46% <100.00%> (ø)
src/module.h 0.00% <ø> (ø)
src/replication.c 85.86% <ø> (-0.20%) ⬇️
src/script.c 81.29% <ø> (+1.15%) ⬆️
src/scripting_engine.c 57.49% <95.23%> (+4.12%) ⬆️
src/server.c 88.67% <88.88%> (+0.28%) ⬆️
src/module.c 25.47% <73.52%> (+15.69%) ⬆️

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments. I'll do another pass later. The change is not as huge as the +/- numbers indicate. Many lines are moved to other files.

The build options should be mentioned in the README.md.

Comment thread src/modules/lua/engine_lua.c
Comment thread src/modules/lua/list.c
Comment thread src/modules/lua/list.c Outdated
Comment thread src/modules/lua/script_lua.c Outdated
Comment thread src/modules/lua/script_lua.c Outdated
Comment thread src/modules/lua/script_lua.c Outdated
Comment thread src/modules/lua/script_lua.c Outdated
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Comment thread src/scripting_engine.c Outdated
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty solid in general.

My main concerns are about the build-time changes. Is there a risk that it will break compilation or installation for any user?

If Valkey is build with BUILD_LUA=no, then it doesn't automatically load a lua module even if such module is installed in the system later. Maybe it's not bad, but I just want to mention it.

@valkey-io/valkey-committers Does anyone else want to take a look, especially on the compile-time changes (Makefiles etc.)?

Comment thread src/Makefile Outdated
Comment thread src/server.c Outdated
Comment thread src/server.c Outdated
@rjd15372

Copy link
Copy Markdown
Member Author

My main concerns are about the build-time changes. Is there a risk that it will break compilation or installation for any user?

I tried my best to not break it, but a another pair of eyes is needed to improve confidence on the changes.

If Valkey is build with BUILD_LUA=no, then it doesn't automatically load a lua module even if such module is installed in the system later. Maybe it's not bad, but I just want to mention it.

Right. But if later someone installs the module separately, it can add a loadmodule line to valkey.conf to load it automatically.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Comment thread src/config.c Outdated
Comment thread src/config.c Outdated
Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Jan 8, 2026
Before valkey-io#2858, we has this check:
```
static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) {
    if (run_ctx->flags & SCRIPT_ALLOW_OOM) {
        /* Allow running any command even if OOM reached */
        return C_OK;
    }

    /* If we reached the memory limit configured via maxmemory, commands that
     * could enlarge the memory usage are not allowed, but only if this is the
     * first write in the context of this script, otherwise we can't stop
     * in the middle. */

    if (server.maxmemory &&                          /* Maxmemory is actually enabled. */
        !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */
        !(run_ctx->flags & SCRIPT_WRITE_DIRTY) &&    /* Script had no side effects so far. */
        server.pre_command_oom_state &&              /* Detected OOM when script start. */
        (run_ctx->c->cmd->flags & CMD_DENYOOM)) {
        *err = sdsdup(shared.oomerr->ptr);
        return C_ERR;
    }

    return C_OK;
}
```

If we reached the memory limit configured via maxmemory, commands that
could enlarge the memory usage are not allowed, but only if this is the
first write in the context of this script, otherwise we can't stop
in the middle.

Signed-off-by: Binbin <binloveplay1314@qq.com>
jdheyburn pushed a commit to jdheyburn/valkey that referenced this pull request Jan 8, 2026
This PR restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This
change enables the possibility of a backwards compatible Lua engine
upgrade, as well as, the flexibility in building Valkey without the
Lua engine.

**Important**: from a user's point of view, there's no difference in
using the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward
compatible with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from `src/lua` to `src/modules/lua`. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new, is the
serialization and deserialization of Valkey RESP values into, and from,
Lua values. While in the original implementation the parsing of RESP
values
was done directly from the client buffer, in the new implementation the
parsing is done from the `ValkeyModuleCallReply` object and respective
API.

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module, within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and, the Lua module is loaded automatically by the server upon startup.
When running `make install` the Lua engine module is installed in the
default system library directory.
There's a new build option, called `BUILD_LUA`, that if set to `no`
allows to
build Valkey server without building the Lua engine.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Additional change: Unload all modules on shutdown (ignoring modules that
can't be unloaded). This is to avoid address sanitizer warnings about
leaked allocations.

Fixes: valkey-io#1627

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
rjd15372 pushed a commit that referenced this pull request Jan 12, 2026
…ion (#3030)

Before #2858, we has this check:
```
static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) {
    if (run_ctx->flags & SCRIPT_ALLOW_OOM) {
        /* Allow running any command even if OOM reached */
        return C_OK;
    }

    /* If we reached the memory limit configured via maxmemory, commands that
     * could enlarge the memory usage are not allowed, but only if this is the
     * first write in the context of this script, otherwise we can't stop
     * in the middle. */

    if (server.maxmemory &&                          /* Maxmemory is actually enabled. */
        !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */
        !(run_ctx->flags & SCRIPT_WRITE_DIRTY) &&    /* Script had no side effects so far. */
        server.pre_command_oom_state &&              /* Detected OOM when script start. */
        (run_ctx->c->cmd->flags & CMD_DENYOOM)) {
        *err = sdsdup(shared.oomerr->ptr);
        return C_ERR;
    }

    return C_OK;
}
```

If we reached the memory limit configured via maxmemory, commands that
could enlarge the memory usage are not allowed, but only if this is the
first write in the context of this script, otherwise we can't stop
in the middle.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
rainsupreme added a commit to rainsupreme/valkey that referenced this pull request Feb 11, 2026
This was commented out in valkey-io#2858 but I don't know why - it looks like it might have been accidental.

Signed-off-by: Rain Valentine <rsg000@gmail.com>
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
…ion (valkey-io#3030)

Before valkey-io#2858, we has this check:
```
static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) {
    if (run_ctx->flags & SCRIPT_ALLOW_OOM) {
        /* Allow running any command even if OOM reached */
        return C_OK;
    }

    /* If we reached the memory limit configured via maxmemory, commands that
     * could enlarge the memory usage are not allowed, but only if this is the
     * first write in the context of this script, otherwise we can't stop
     * in the middle. */

    if (server.maxmemory &&                          /* Maxmemory is actually enabled. */
        !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */
        !(run_ctx->flags & SCRIPT_WRITE_DIRTY) &&    /* Script had no side effects so far. */
        server.pre_command_oom_state &&              /* Detected OOM when script start. */
        (run_ctx->c->cmd->flags & CMD_DENYOOM)) {
        *err = sdsdup(shared.oomerr->ptr);
        return C_ERR;
    }

    return C_OK;
}
```

If we reached the memory limit configured via maxmemory, commands that
could enlarge the memory usage are not allowed, but only if this is the
first write in the context of this script, otherwise we can't stop
in the middle.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast pushed a commit that referenced this pull request Feb 24, 2026
…y test in crash report (#3029)

The memory test was commented out in #2858 and should have been
reenabled. On further investigation I found that the server hangs during
shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes
deadlock because the lock was acquired for shutdown but lazy free jobs
require the GIL too:

- main thread: `serverCron()` acquires GIL via `afterSleep()` then calls
`finishShutdown()`, which eventually calls our script module unload code
that calls `bioDrainWorker()`.
- bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()`
call `scriptingEngineCallFreeFunction()` which requires the GIL.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
enjoy-binbin added a commit that referenced this pull request Mar 5, 2026
Before we added LUA as a module we had a logic to NOT register the
luaHook when the value of `busy-reply-threshold` config is set to 0.

Now we ALWAYS register the hook and in order to keep aligned with
old behavior we will let the execution of the script continue from
the interrupt hook when `busy-reply-threshold` config is set == 0.

And in value.conf, we are saying the config can be negative, but in
fact the config is minimum at 0, fix the valkey.conf as well.
```
# The default is 5 seconds. It is possible to set it to 0 or a negative value
# to disable this mechanism (uninterrupted execution)
```

It was introduced in #2858.

Signed-off-by: Alon Arenberg <alonare@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
eifrah-aws pushed a commit to eifrah-aws/valkey that referenced this pull request Mar 5, 2026
Before we added LUA as a module we had a logic to NOT register the
luaHook when the value of `busy-reply-threshold` config is set to 0.

Now we ALWAYS register the hook and in order to keep aligned with
old behavior we will let the execution of the script continue from
the interrupt hook when `busy-reply-threshold` config is set == 0.

And in value.conf, we are saying the config can be negative, but in
fact the config is minimum at 0, fix the valkey.conf as well.
```
# The default is 5 seconds. It is possible to set it to 0 or a negative value
# to disable this mechanism (uninterrupted execution)
```

It was introduced in valkey-io#2858.

Signed-off-by: Alon Arenberg <alonare@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
This PR restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This
change enables the possibility of a backwards compatible Lua engine
upgrade, as well as, the flexibility in building Valkey without the
Lua engine.

**Important**: from a user's point of view, there's no difference in
using the `EVAL` of `FUNCTION/FCALL` scripts. This PR is fully backward
compatible with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from `src/lua` to `src/modules/lua`. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new, is the
serialization and deserialization of Valkey RESP values into, and from,
Lua values. While in the original implementation the parsing of RESP
values
was done directly from the client buffer, in the new implementation the
parsing is done from the `ValkeyModuleCallReply` object and respective
API.

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module, within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and, the Lua module is loaded automatically by the server upon startup.
When running `make install` the Lua engine module is installed in the
default system library directory.
There's a new build option, called `BUILD_LUA`, that if set to `no`
allows to
build Valkey server without building the Lua engine.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Additional change: Unload all modules on shutdown (ignoring modules that
can't be unloaded). This is to avoid address sanitizer warnings about
leaked allocations.

Fixes: valkey-io#1627

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…ion (valkey-io#3030)

Before valkey-io#2858, we has this check:
```
static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) {
    if (run_ctx->flags & SCRIPT_ALLOW_OOM) {
        /* Allow running any command even if OOM reached */
        return C_OK;
    }

    /* If we reached the memory limit configured via maxmemory, commands that
     * could enlarge the memory usage are not allowed, but only if this is the
     * first write in the context of this script, otherwise we can't stop
     * in the middle. */

    if (server.maxmemory &&                          /* Maxmemory is actually enabled. */
        !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */
        !(run_ctx->flags & SCRIPT_WRITE_DIRTY) &&    /* Script had no side effects so far. */
        server.pre_command_oom_state &&              /* Detected OOM when script start. */
        (run_ctx->c->cmd->flags & CMD_DENYOOM)) {
        *err = sdsdup(shared.oomerr->ptr);
        return C_ERR;
    }

    return C_OK;
}
```

If we reached the memory limit configured via maxmemory, commands that
could enlarge the memory usage are not allowed, but only if this is the
first write in the context of this script, otherwise we can't stop
in the middle.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…y test in crash report (valkey-io#3029)

The memory test was commented out in valkey-io#2858 and should have been
reenabled. On further investigation I found that the server hangs during
shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes
deadlock because the lock was acquired for shutdown but lazy free jobs
require the GIL too:

- main thread: `serverCron()` acquires GIL via `afterSleep()` then calls
`finishShutdown()`, which eventually calls our script module unload code
that calls `bioDrainWorker()`.
- bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()`
call `scriptingEngineCallFreeFunction()` which requires the GIL.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
Before we added LUA as a module we had a logic to NOT register the
luaHook when the value of `busy-reply-threshold` config is set to 0.

Now we ALWAYS register the hook and in order to keep aligned with
old behavior we will let the execution of the script continue from
the interrupt hook when `busy-reply-threshold` config is set == 0.

And in value.conf, we are saying the config can be negative, but in
fact the config is minimum at 0, fix the valkey.conf as well.
```
# The default is 5 seconds. It is possible to set it to 0 or a negative value
# to disable this mechanism (uninterrupted execution)
```

It was introduced in valkey-io#2858.

Signed-off-by: Alon Arenberg <alonare@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
akashkgit pushed a commit to akashkgit/valkey that referenced this pull request Mar 6, 2026
Before we added LUA as a module we had a logic to NOT register the
luaHook when the value of `busy-reply-threshold` config is set to 0.

Now we ALWAYS register the hook and in order to keep aligned with
old behavior we will let the execution of the script continue from
the interrupt hook when `busy-reply-threshold` config is set == 0.

And in value.conf, we are saying the config can be negative, but in
fact the config is minimum at 0, fix the valkey.conf as well.
```
# The default is 5 seconds. It is possible to set it to 0 or a negative value
# to disable this mechanism (uninterrupted execution)
```

It was introduced in valkey-io#2858.

Signed-off-by: Alon Arenberg <alonare@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
madolson added a commit to madolson/valkey that referenced this pull request May 10, 2026
In the refactor that moved Lua scripting to the Module API (valkey-io#2858), the
argument object caching was not reimplemented. This causes a measurable
regression in workloads that pass data through redis.call(), because
every invocation allocates and frees ValkeyModuleString objects via
je_malloc/je_sdallocx.

This patch restores caching without depending on robj struct internals:
- Reuse a static argv array across calls (avoids per-call
  ValkeyModule_Alloc + ValkeyModule_Free of the pointer array)
- After each call, retain up to LUA_CMD_OBJCACHE_SIZE (32) small string
  objects via ValkeyModule_RetainString
- On the next call, if the cached SDS buffer (checked via objectGetVal +
  sdsalloc) is large enough, memcpy the new content in-place — zero
  malloc/free for the hot path
- Fall back to ValkeyModule_CreateString on cache miss

The only server internal used is objectGetVal() (resolved at link time)
and sds.h for sdssetlen/sdsalloc. No robj struct layout knowledge is
needed, making this robust against future struct changes.

Benchmark results (5 workloads, c6i.2xlarge, 3 runs each):
- Data transfer (100x100KB reads): -7.8% regression -> -0.5% (fixed)
- pcall error handling (500 iter):  -11.6% regression -> +0.0% (fixed)
- Heavy redis.call() (1200 cmds):   remaining regression from module
  dispatch overhead (not allocation-related)

Signed-off-by: Madelyn Olson <matolson@amazon.com>
webbsssss added a commit to webbsssss/valkey that referenced this pull request May 11, 2026
Address review feedback from @zuiderkwast on valkey-io#3640: instead of calling
zlibc_trim() explicitly from the MEMORY PURGE handler, fold it into
jemalloc_purge() itself. That way the libc main arena is also trimmed
automatically when jemalloc_purge() is invoked from other code paths,
which today means the synchronous variants of FLUSHDB and FLUSHALL in
src/db.c.

This also fixes the historical oddity that @zuiderkwast pointed out:
zlibc_trim() was added in a7abc2f to be called from freeLuaScriptsSync(),
but became dead code when valkey-io#2858 moved Lua to a separate module that calls
malloc_trim(3) directly. zlibc_trim() is now reachable again through the
single jemalloc_purge() entry point.

The async paths (lazyfree-lazy-user-flush=yes, EMPTYDB_ASYNC) intentionally
do not call jemalloc_purge() so they continue to skip the libc trim too -
that is correct: malloc_trim(3) briefly locks every glibc arena and we
don't want to do that on the main thread when the user explicitly asked
for an async flush.

Verified locally on Ubuntu 22.04 / glibc 2.35:

  MEMORY PURGE    : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHALL SYNC   : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHDB SYNC    : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHDB (async) : [heap] 135168 -> 135168 bytes (unchanged, as intended)

The behaviour described by the MEMORY PURGE help text and by the memory
doctor's high_proc_rss diagnostic is unchanged; both still mention the
libc main arena and direct users at MEMORY PURGE.

Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
1fanwang added a commit to 1fanwang/valkey that referenced this pull request May 12, 2026
When the Lua scripting engine was migrated into a module in valkey-io#2858, two
helpers stopped preserving the leading error code:

* `luaPushErrorBuff` no longer prepended a generic "ERR " when the caller
  passed a bare message, so direct API errors (`redis.pcall()` with no
  args, `redis.setresp(4)`, `redis.log(1)`, `redis.set_repl()`,
  `redis.log(10, 'm')`) reach the client as unprefixed strings. The
  function's own comment says it should prepend "ERR", and the pre-module
  implementation did.
* `luaProcessReplyError` and `errorCallback` stripped (or never re-added)
  the leading `-` before calling `luaPushErrorBuff`, so reply-side errors
  like OOM and WRONGTYPE took the bare-message branch above and then —
  once the bare branch is fixed — would have ended up double-coded
  ("ERR OOM ...", "ERR WRONGTYPE ...").

Restore the original behaviour by:

* Re-adding the `-` in both reply-error entry points so the existing
  `"-CODE msg"` branch in `luaPushErrorBuff` preserves the code.
* Prepending "ERR " in the bare-message branch unless the caller already
  supplied the prefix (avoids "ERR ERR ..." on the rewrite paths that
  hard-code "ERR ...").

Adds a regression test that asserts the prefix on the five direct API
errors named in the bug, and that already-coded errors (WRONGTYPE) are
left alone.

Signed-off-by: 1fanwang <1fannnw@gmail.com>
1fanwang added a commit to 1fanwang/valkey that referenced this pull request May 14, 2026
When the Lua scripting engine was migrated into a module in valkey-io#2858, two
helpers stopped preserving the leading error code:

* `luaPushErrorBuff` no longer prepended a generic "ERR " when the caller
  passed a bare message, so direct API errors (`redis.pcall()` with no
  args, `redis.setresp(4)`, `redis.log(1)`, `redis.set_repl()`,
  `redis.log(10, 'm')`) reach the client as unprefixed strings. The
  function's own comment says it should prepend "ERR", and the pre-module
  implementation did.
* `luaProcessReplyError` and `errorCallback` stripped (or never re-added)
  the leading `-` before calling `luaPushErrorBuff`, so reply-side errors
  like OOM and WRONGTYPE took the bare-message branch above and would
  have ended up double-coded once the bare branch is fixed
  ("ERR OOM ...", "ERR WRONGTYPE ...").

Restore the original behaviour by:

* Re-adding the `-` in both reply-error entry points so the existing
  `"-CODE msg"` branch in `luaPushErrorBuff` preserves the code.
* Unconditionally prepending "ERR " in the bare-message branch. Per
  zuiderkwast review feedback, the seven call sites that hard-coded
  "ERR ..." are updated to pass just the message — the contract is now
  that callers either pass `"-CODE msg"` for an explicit code or a bare
  `"msg"` for the default ERR prefix, never `"ERR msg"` themselves.

Adds a regression test that asserts the prefix on the five direct API
errors named in the bug, and that already-coded errors (WRONGTYPE) are
left alone.

Signed-off-by: 1fanwang <1fannnw@gmail.com>
ranshid added a commit that referenced this pull request May 18, 2026
Closes #3663.

## Why

Three callers in `src/modules/lua/script_lua.c` lost error-code handling
during the #2858 move of the Lua scripting engine into a module. On the
wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log,
1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now
return the bare message instead of `ERR <message>`. OOM and WRONGTYPE
happened to keep working because their code letters survived elsewhere
by accident.

Reproducer:

```
$ ./src/valkey-cli eval "return redis.pcall()" 0
"Please specify at least one argument for this redis lib call"   # before this PR
"ERR Please specify at least one argument for this redis lib call"  # with this PR
```

## What changed

In `src/modules/lua/script_lua.c`:

1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR `
unless the message already starts with a code letter (defensive — five
existing callers hardcode `"ERR ..."` strings; double-prefixing them
would also be wrong).
2. `luaProcessReplyError` push_error branch: re-add the leading `-`
before delegating to `luaPushError` so the case-1 code-extraction path
handles `OOM`, `WRONGTYPE`, `READONLY`, etc.
3. `errorCallback` errno==0 branch: same re-add-`-` approach.

41 LoC across the three functions.

## Test

`tests/unit/scripting.tcl` regression at line 1028 covers:

- All five direct-API errors above — asserts `ERR ` prefix
- WRONGTYPE error path — asserts the code survives and is not
double-prefixed
- Runs 4 times under the `foreach is_eval × foreach
script_compatibility_api` matrix

Fails on `unstable`, passes with this PR.

---------

Signed-off-by: 1fanwang <1fannnw@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
Closes #3663.

## Why

Three callers in `src/modules/lua/script_lua.c` lost error-code handling
during the #2858 move of the Lua scripting engine into a module. On the
wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log,
1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now
return the bare message instead of `ERR <message>`. OOM and WRONGTYPE
happened to keep working because their code letters survived elsewhere
by accident.

Reproducer:

```
$ ./src/valkey-cli eval "return redis.pcall()" 0
"Please specify at least one argument for this redis lib call"   # before this PR
"ERR Please specify at least one argument for this redis lib call"  # with this PR
```

## What changed

In `src/modules/lua/script_lua.c`:

1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR `
unless the message already starts with a code letter (defensive — five
existing callers hardcode `"ERR ..."` strings; double-prefixing them
would also be wrong).
2. `luaProcessReplyError` push_error branch: re-add the leading `-`
before delegating to `luaPushError` so the case-1 code-extraction path
handles `OOM`, `WRONGTYPE`, `READONLY`, etc.
3. `errorCallback` errno==0 branch: same re-add-`-` approach.

41 LoC across the three functions.

## Test

`tests/unit/scripting.tcl` regression at line 1028 covers:

- All five direct-API errors above — asserts `ERR ` prefix
- WRONGTYPE error path — asserts the code survives and is not
double-prefixed
- Runs 4 times under the `foreach is_eval × foreach
script_compatibility_api` matrix

Fails on `unstable`, passes with this PR.

---------

Signed-off-by: 1fanwang <1fannnw@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
madolson pushed a commit that referenced this pull request May 19, 2026
Closes #3663.

## Why

Three callers in `src/modules/lua/script_lua.c` lost error-code handling
during the #2858 move of the Lua scripting engine into a module. On the
wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log,
1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now
return the bare message instead of `ERR <message>`. OOM and WRONGTYPE
happened to keep working because their code letters survived elsewhere
by accident.

Reproducer:

```
$ ./src/valkey-cli eval "return redis.pcall()" 0
"Please specify at least one argument for this redis lib call"   # before this PR
"ERR Please specify at least one argument for this redis lib call"  # with this PR
```

## What changed

In `src/modules/lua/script_lua.c`:

1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR `
unless the message already starts with a code letter (defensive — five
existing callers hardcode `"ERR ..."` strings; double-prefixing them
would also be wrong).
2. `luaProcessReplyError` push_error branch: re-add the leading `-`
before delegating to `luaPushError` so the case-1 code-extraction path
handles `OOM`, `WRONGTYPE`, `READONLY`, etc.
3. `errorCallback` errno==0 branch: same re-add-`-` approach.

41 LoC across the three functions.

## Test

`tests/unit/scripting.tcl` regression at line 1028 covers:

- All five direct-API errors above — asserts `ERR ` prefix
- WRONGTYPE error path — asserts the code survives and is not
double-prefixed
- Runs 4 times under the `foreach is_eval × foreach
script_compatibility_api` matrix

Fails on `unstable`, passes with this PR.

---------

Signed-off-by: 1fanwang <1fannnw@gmail.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[NEW] Move Lua scripting engine into an external Valkey module

3 participants