-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Scripts that declare the no-writes flag are implicitly allow-oom too.
#10699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/unit/scripting.tcl
Outdated
| # Fail to execute regardless of RO script content when we use default flags in OOM condition | ||
| assert_error {OOM allow-oom flag is not set on the script, can not run it when used memory > 'maxmemory'} { | ||
| r eval_ro {#!lua flags= | ||
| return 1 | ||
| } 0 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All write commands are implicitly use-memory, and since they are disallowed in EVAL*_RO I would expect that an EVAL*_RO to never fail on OOM conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right. but the shebang flags can define that the script should not be run in OOM condition.
do you think we should ignore that flag in EVAL_RO, FCALL_RO and let the script run even when we're in OOM state?
@MeirShpilraien @yoav-steinberg WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about it more, a script that declares a shebang, has to declare the no-writes in order to be used in EVAL_RO.
that means that the entity that defines when the script will fail, is the script author, not the caller (i.e. the script code not which command was used).
this is certainly true for functions where the caller doesn't necessarily knows what the script really does.
so i think we should focus on two things:
- it's quite clear what should be done with functions, and i suppose we want eval to be consistent with that.
- i suppose the only debate here should be about eval scripts that don't declare a shebang at all.
i think these work just fine in both EVAL and EVAL_RO. i'll add a test to demonstrate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOM doesn't really make sense in the context of EVAL_RO scripts though. It seems like an unexpected failure mode to have a script, which doesn't use memory, to fail on OOM conditions if you add a shebang. Can't we just document that these two modes are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document what?
that in both EVAL_RO and FCALL_RO, the default flags are ignored and the script is executed as if it declared allow-oom?
would you consider doing the same for no-writes?
i.e. user will be able to call a function that didn't declare no-writes by using FCALL_RO?
the script would start running, and then abort with error if it uses any write command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean that any attempt to run a function with FCALL_RO should fail unless it is marked with allow-oom (even when we're not in OOM state)?
i think that would be suitable for a command named FCALL_OOM, buy luckily we don't have / need one.
i kinda regret adding that command in the first place (EVAL_RO). if it's sole purpose is for client routing to replicas, and the grantee that no matter the script it cannot do any writes, i don't see why it should require the allow-oom flag.
can you explain again what's the benefit of doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a simpler idea is that no-write implies allow-oom? I'm not really aware of a case where a non-write command would consume additional memory, so there isn't really a reason to block in OOM scenarios. Since OOM scenarios are infrequent, I would rather guide users to setting the right flags so their application works as expected in OOM scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we discussed that in a core-team meeting and concluded we wanna proceed with this idea (no-write to imply allow-oom), i'll make a separate PR for that and we can further discuss it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a second thought, although that change is completely unrelated to the other fix in this PR, effects and the tests might be related, so let's keep the discussion here.
So some background:
in #10066 we added the no-write and allow-oom flags, i seem to remember that we did consider setting allow-oom implicitly when no-write is set, it was discussed here #10066 (comment), and was rejected because i wanted similar flags and behavior as normal redis commands and module commands (to avoid confusion).
but later we negated the flags so we have different defaults, so maybe we should reconsider that (despite being a breaking change comparing to redis 7.0.0)
@MeirShpilraien do you remember any other detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, i discussed it with Meir.
It seems we clearly stated in the original PR top comment that:
if the function will not turn on this flag it will not be possible to run it if OOM reached (even if the function declares
no-writesand even iffcall_rois used). If this flag is set, any command will be allow on OOM (even those that is marked with CMD_DENYOOM)
But looking back at the original discussions, we can't spot any reason not to revert that decision and have no-writes implicitly count as allow-oom too.
i decided to keep the flags separate and test both instead of set one when the other is set. The tests i added for the bug in server.c not updating the server.script_oom flag are no longer relevant since that state is now impossible to test.
no-writes flag are implicitly allow-oom too.
|
updated the code, and top comment. |
madolson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…too. (redis#10699) Scripts that have the `no-writes` flag, cannot execute write commands, and since all `deny-oom` commands are write commands, we now act as if the `allow-oom` flag is implicitly set for scripts that set the `no-writes` flag. this also implicitly means that the EVAL*_RO and FCALL_RO commands can never fails with OOM error. Note about a bug that's no longer relevant: There was an issue with EVAL*_RO using shebang not being blocked correctly in OOM state: When an EVAL script declares a shebang, it was by default not allowed to run in OOM state. but this depends on a flag that is updated before the command is executed, which was not updated in case of the `_RO` variants. the result is that if the previous cached state was outdated (either true or false), the script will either unjustly fail with OOM, or unjustly allowed to run despite the OOM state. It doesn't affect scripts without a shebang since these depend on the actual commands they run, and since these are only read commands, they don't care for that cached oom state flag. it did affect scripts with shebang and no allow-oom flag, bug after the change in this PR, scripts that are run with eval_ro would implicitly have that flag so again the cached state doesn't matter. p.s. this isn't a breaking change since all it does is allow scripts to run when they should / could rather than blocking them.
Scripts that have the
no-writesflag, cannot execute write commands, and since alldeny-oomcommands are write commands, we now act as if theallow-oomflag is implicitly set for scripts that set theno-writesflag.this also implicitly means that the EVAL*_RO and FCALL_RO commands can never fails with OOM error.
Note about a bug that's no longer relevant:
There was an issue with EVAL*_RO using shebang not being blocked correctly in OOM state:
When an EVAL script declares a shebang, it was by default not allowed to run in OOM state.
but this depends on a flag that is updated before the command is executed, which was not updated in case of the
_ROvariants.the result is that if the previous cached state was outdated (either true or false), the script will either unjustly fail with OOM, or unjustly allowed to run despite the OOM state.
It doesn't affect scripts without a shebang since these depend on the actual commands they run, and since these are only read commands, they don't care for that cached oom state flag.
it did affect scripts with shebang and no allow-oom flag, bug after the change in this PR, scripts that are run with eval_ro would implicitly have that flag so again the cached state doesn't matter.
todo:
p.s. this isn't a breaking change since all it does is allow scripts to run when they should / could rather than blocking them.