Restore ERR prefix and error-code preservation in Lua scripting#3678
Conversation
66927f0 to
f27c92b
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3678 +/- ##
============================================
- Coverage 76.84% 76.71% -0.14%
============================================
Files 162 162
Lines 80689 80689
============================================
- Hits 62009 61903 -106
- Misses 18680 18786 +106 🚀 New features to boost your workflow:
|
When the Lua scripting engine was migrated into a module in valkey-io#2858, two helpers stopped preserving the leading error code: * `luaPushErrorBuff` no longer prepended a generic "ERR " when the caller passed a bare message, so direct API errors (`redis.pcall()` with no args, `redis.setresp(4)`, `redis.log(1)`, `redis.set_repl()`, `redis.log(10, 'm')`) reach the client as unprefixed strings. The function's own comment says it should prepend "ERR", and the pre-module implementation did. * `luaProcessReplyError` and `errorCallback` stripped (or never re-added) the leading `-` before calling `luaPushErrorBuff`, so reply-side errors like OOM and WRONGTYPE took the bare-message branch above and would have ended up double-coded once the bare branch is fixed ("ERR OOM ...", "ERR WRONGTYPE ..."). Restore the original behaviour by: * Re-adding the `-` in both reply-error entry points so the existing `"-CODE msg"` branch in `luaPushErrorBuff` preserves the code. * Unconditionally prepending "ERR " in the bare-message branch. Per zuiderkwast review feedback, the seven call sites that hard-coded "ERR ..." are updated to pass just the message — the contract is now that callers either pass `"-CODE msg"` for an explicit code or a bare `"msg"` for the default ERR prefix, never `"ERR msg"` themselves. Adds a regression test that asserts the prefix on the five direct API errors named in the bug, and that already-coded errors (WRONGTYPE) are left alone. Signed-off-by: 1fanwang <1fannnw@gmail.com>
f27c92b to
3fe1b09
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdated Lua error handling in ChangesLua RESP Error Code Preservation
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/modules/lua/script_lua.c (1)
766-773:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDrop the embedded
ERRin the ACL rewrite.This branch still constructs
"ERR ACL failure in script: ..."and then passes it throughluaPushError(), so it now comes out asERR ERR ACL failure in script: .... Pass a bare message here, or-ERR ...if you want an explicit code preserved.Suggested fix
- const char *err_prefix = "ERR ACL failure in script: "; + const char *err_prefix = "ACL failure in script: "; size_t err_len = strlen(err_prefix) + strlen(err + strlen("NOPERM ")) + 1; char *err_msg = ValkeyModule_Alloc(err_len * sizeof(char)); bzero(err_msg, err_len); strcpy(err_msg, err_prefix); strcat(err_msg, err + strlen("NOPERM ")); luaPushError(lua, err_msg);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modules/lua/script_lua.c` around lines 766 - 773, The code currently builds an error string with the literal "ERR ACL failure in script: " then calls luaPushError, causing a double "ERR ERR ..." output; change the constructed prefix (err_prefix) to a bare message like "ACL failure in script: " (or to a protocol-safe "-ERR ACL failure in script: " if you need an explicit code) so luaPushError receives only the message body; update the err_len computation, keep using ValkeyModule_Alloc/ValkeyModule_Free and strcat/strcpy as before, and ensure the substring extraction (err + strlen("NOPERM ")) remains unchanged.
🧹 Nitpick comments (1)
tests/unit/scripting.tcl (1)
1033-1043: ⚡ Quick winTighten these assertions so they fail on the bad output.
This regression block still misses part of the behavior it is trying to lock in: the no-arg case goes through
redis.call()instead ofredis.pcall(), and theERR *server.log()/ERR *server.set_repl()globs would still pass onERR ERR .... Coverredis.pcall()directly and remove the wildcard after the initialERRfor the double-prefix checks.Possible update
- assert_error "ERR Please specify at least one argument*" { - run_script "return redis.call()" 0 - } + assert_match "ERR Please specify at least one argument*" \ + [run_script "local res = redis.pcall(); return res.err" 0] @@ - assert_match "ERR *server.log() requires two arguments or more.*" \ + assert_match "ERR server.log() requires two arguments or more.*" \ [run_script "local ok, err = pcall(redis.log, 1); return err" 0] @@ - assert_match "ERR *server.set_repl() requires one argument.*" \ + assert_match "ERR server.set_repl() requires one argument.*" \ [run_script "local ok, err = pcall(redis.set_repl); return err" 0]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/scripting.tcl` around lines 1033 - 1043, Change the no-arg test to call redis.pcall instead of redis.call (update the run_script call inside assert_error to use "redis.pcall()") so the error path matches pcall behavior, and tighten the two double-prefix assertions by removing the wildcard after the initial "ERR " (replace patterns like "ERR *server.log()" and "ERR *server.set_repl()" with "ERR server.log()" and "ERR server.set_repl()") so they won’t accept an extra "ERR " prefix; leave the redis.setresp and invalid log level assertions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/modules/lua/script_lua.c`:
- Around line 766-773: The code currently builds an error string with the
literal "ERR ACL failure in script: " then calls luaPushError, causing a double
"ERR ERR ..." output; change the constructed prefix (err_prefix) to a bare
message like "ACL failure in script: " (or to a protocol-safe "-ERR ACL failure
in script: " if you need an explicit code) so luaPushError receives only the
message body; update the err_len computation, keep using
ValkeyModule_Alloc/ValkeyModule_Free and strcat/strcpy as before, and ensure the
substring extraction (err + strlen("NOPERM ")) remains unchanged.
---
Nitpick comments:
In `@tests/unit/scripting.tcl`:
- Around line 1033-1043: Change the no-arg test to call redis.pcall instead of
redis.call (update the run_script call inside assert_error to use
"redis.pcall()") so the error path matches pcall behavior, and tighten the two
double-prefix assertions by removing the wildcard after the initial "ERR "
(replace patterns like "ERR *server.log()" and "ERR *server.set_repl()" with
"ERR server.log()" and "ERR server.set_repl()") so they won’t accept an extra
"ERR " prefix; leave the redis.setresp and invalid log level assertions
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4406f9b6-ea16-43a1-944f-64a17ec1f335
📒 Files selected for processing (2)
src/modules/lua/script_lua.ctests/unit/scripting.tcl
zuiderkwast
left a comment
There was a problem hiding this comment.
Thanks! One "ERR" remaining and some minor comments.
|
@1fanwang - are you going to followup on the comments by @zuiderkwast ? We are about to release 9.1 and we need this to be completed in order to take it. |
|
@ranshid Will make the change, push and then merge it. |
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/modules/lua/script_lua.c (1)
278-299:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon't mutate
err_bufferin the documented-CODE msgpath.Line 291 writes
'\0'intoerr_buffer, but this function now explicitly documents"-CODE msg"as a supported input format while maintaining aconst char *API contract. Although current callers (lines 786, 959) safely use mutable buffers fromlm_asprintf(), the const-correctness violation is undefined behavior and will trap future callers who attempt to pass string literals or read-only memory. Parse the code without modifying caller memory.Suggested fix
- if (err_buffer[0] == '-') { - /* derive error code from the message */ - char *err_msg = strstr(err_buffer, " "); - if (!err_msg) { - msg = lm_strcpy(err_buffer + 1); - final_msg = lm_asprintf("ERR %s", msg); - } else { - *err_msg = '\0'; - msg = lm_strcpy(err_msg + 1); - msg = lm_strtrim(msg, "\r\n"); - final_msg = lm_asprintf("%s %s", err_buffer + 1, msg); - } + if (err_buffer[0] == '-') { + /* Derive the explicit error code without mutating caller memory. */ + const char *err_msg = strchr(err_buffer + 1, ' '); + if (!err_msg) { + msg = lm_strcpy(err_buffer + 1); + final_msg = lm_asprintf("ERR %s", msg); + } else { + size_t code_len = (size_t)(err_msg - (err_buffer + 1)); + msg = lm_strcpy(err_msg + 1); + msg = lm_strtrim(msg, "\r\n"); + final_msg = lm_asprintf("%.*s %s", (int)code_len, err_buffer + 1, msg); + } } else {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modules/lua/script_lua.c` around lines 278 - 299, The code currently mutates err_buffer by writing '\0' via *err_msg = '\0' when parsing the "-CODE msg" form; instead, avoid modifying caller memory by locating the first space (err_msg = strstr(err_buffer, " ")) and then extracting the code and message into new strings or using pointer arithmetic without writes: copy the message part with lm_strcpy/lm_strndup (e.g., copy from err_msg+1) and copy the code part by creating a temporary string from err_buffer+1 up to the space (or use lm_asprintf with a %.*s width), then lm_strtrim the copied message and build final_msg with lm_asprintf("%s %s", code_copy, msg_copy); ensure all uses reference the new copies (msg, code) and remove the in-place *err_msg = '\0' mutation so err_buffer remains const-safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/modules/lua/script_lua.c`:
- Around line 278-299: The code currently mutates err_buffer by writing '\0' via
*err_msg = '\0' when parsing the "-CODE msg" form; instead, avoid modifying
caller memory by locating the first space (err_msg = strstr(err_buffer, " "))
and then extracting the code and message into new strings or using pointer
arithmetic without writes: copy the message part with lm_strcpy/lm_strndup
(e.g., copy from err_msg+1) and copy the code part by creating a temporary
string from err_buffer+1 up to the space (or use lm_asprintf with a %.*s width),
then lm_strtrim the copied message and build final_msg with lm_asprintf("%s %s",
code_copy, msg_copy); ensure all uses reference the new copies (msg, code) and
remove the in-place *err_msg = '\0' mutation so err_buffer remains const-safe.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9151f1ae-55c3-4a9c-8078-295d9bbbf805
📒 Files selected for processing (1)
src/modules/lua/script_lua.c
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
The previous commit re-saved the entire file with CRLF line endings via the GitHub web suggestion editor, producing a 4300-line diff that hid the actual one-line fix and risks failing clang-format-18. Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
|
Thanks a lot for your support @ranshid @madolson @zuiderkwast |
Closes #3663. ## Why Three callers in `src/modules/lua/script_lua.c` lost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log, 1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now return the bare message instead of `ERR <message>`. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident. Reproducer: ``` $ ./src/valkey-cli eval "return redis.pcall()" 0 "Please specify at least one argument for this redis lib call" # before this PR "ERR Please specify at least one argument for this redis lib call" # with this PR ``` ## What changed In `src/modules/lua/script_lua.c`: 1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR ` unless the message already starts with a code letter (defensive — five existing callers hardcode `"ERR ..."` strings; double-prefixing them would also be wrong). 2. `luaProcessReplyError` push_error branch: re-add the leading `-` before delegating to `luaPushError` so the case-1 code-extraction path handles `OOM`, `WRONGTYPE`, `READONLY`, etc. 3. `errorCallback` errno==0 branch: same re-add-`-` approach. 41 LoC across the three functions. ## Test `tests/unit/scripting.tcl` regression at line 1028 covers: - All five direct-API errors above — asserts `ERR ` prefix - WRONGTYPE error path — asserts the code survives and is not double-prefixed - Runs 4 times under the `foreach is_eval × foreach script_compatibility_api` matrix Fails on `unstable`, passes with this PR. --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Closes #3663. ## Why Three callers in `src/modules/lua/script_lua.c` lost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire, `redis.pcall()` (no args), `redis.setresp(4)`, `pcall(redis.log, 1)`, `pcall(redis.log, 10, 'msg')`, and `pcall(redis.set_repl)` now return the bare message instead of `ERR <message>`. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident. Reproducer: ``` $ ./src/valkey-cli eval "return redis.pcall()" 0 "Please specify at least one argument for this redis lib call" # before this PR "ERR Please specify at least one argument for this redis lib call" # with this PR ``` ## What changed In `src/modules/lua/script_lua.c`: 1. `luaPushErrorBuff` — bare-message branch (case 2): prepend `ERR ` unless the message already starts with a code letter (defensive — five existing callers hardcode `"ERR ..."` strings; double-prefixing them would also be wrong). 2. `luaProcessReplyError` push_error branch: re-add the leading `-` before delegating to `luaPushError` so the case-1 code-extraction path handles `OOM`, `WRONGTYPE`, `READONLY`, etc. 3. `errorCallback` errno==0 branch: same re-add-`-` approach. 41 LoC across the three functions. ## Test `tests/unit/scripting.tcl` regression at line 1028 covers: - All five direct-API errors above — asserts `ERR ` prefix - WRONGTYPE error path — asserts the code survives and is not double-prefixed - Runs 4 times under the `foreach is_eval × foreach script_compatibility_api` matrix Fails on `unstable`, passes with this PR. --------- Signed-off-by: 1fanwang <1fannnw@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Closes #3663.
Why
Three callers in
src/modules/lua/script_lua.clost error-code handling during the #2858 move of the Lua scripting engine into a module. On the wire,redis.pcall()(no args),redis.setresp(4),pcall(redis.log, 1),pcall(redis.log, 10, 'msg'), andpcall(redis.set_repl)now return the bare message instead ofERR <message>. OOM and WRONGTYPE happened to keep working because their code letters survived elsewhere by accident.Reproducer:
What changed
In
src/modules/lua/script_lua.c:luaPushErrorBuff— bare-message branch (case 2): prependERRunless the message already starts with a code letter (defensive — five existing callers hardcode"ERR ..."strings; double-prefixing them would also be wrong).luaProcessReplyErrorpush_error branch: re-add the leading-before delegating toluaPushErrorso the case-1 code-extraction path handlesOOM,WRONGTYPE,READONLY, etc.errorCallbackerrno==0 branch: same re-add--approach.41 LoC across the three functions.
Test
tests/unit/scripting.tclregression at line 1028 covers:ERRprefixforeach is_eval × foreach script_compatibility_apimatrixFails on
unstable, passes with this PR.