Skip to content

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented May 9, 2022

Scripts that have the no-writes flag, cannot execute write commands, and since all deny-oom commands are write commands, we now act as if the allow-oom flag is implicitly set for scripts that set the no-writes flag.
this also implicitly means that the EVAL*_RO and FCALL_RO commands can never fails with OOM error.

Note about a bug that's no longer relevant:
There was an issue with EVAL*_RO using shebang not being blocked correctly in OOM state:
When an EVAL script declares a shebang, it was by default not allowed to run in OOM state.
but this depends on a flag that is updated before the command is executed, which was not updated in case of the _RO variants.
the result is that if the previous cached state was outdated (either true or false), the script will either unjustly fail with OOM, or unjustly allowed to run despite the OOM state.
It doesn't affect scripts without a shebang since these depend on the actual commands they run, and since these are only read commands, they don't care for that cached oom state flag.
it did affect scripts with shebang and no allow-oom flag, bug after the change in this PR, scripts that are run with eval_ro would implicitly have that flag so again the cached state doesn't matter.

todo:

p.s. this isn't a breaking change since all it does is allow scripts to run when they should / could rather than blocking them.

Comment on lines 1430 to 1435
# Fail to execute regardless of RO script content when we use default flags in OOM condition
assert_error {OOM allow-oom flag is not set on the script, can not run it when used memory > 'maxmemory'} {
r eval_ro {#!lua flags=
return 1
} 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All write commands are implicitly use-memory, and since they are disallowed in EVAL*_RO I would expect that an EVAL*_RO to never fail on OOM conditions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right. but the shebang flags can define that the script should not be run in OOM condition.
do you think we should ignore that flag in EVAL_RO, FCALL_RO and let the script run even when we're in OOM state?
@MeirShpilraien @yoav-steinberg WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it more, a script that declares a shebang, has to declare the no-writes in order to be used in EVAL_RO.
that means that the entity that defines when the script will fail, is the script author, not the caller (i.e. the script code not which command was used).
this is certainly true for functions where the caller doesn't necessarily knows what the script really does.

so i think we should focus on two things:

  1. it's quite clear what should be done with functions, and i suppose we want eval to be consistent with that.
  2. i suppose the only debate here should be about eval scripts that don't declare a shebang at all.

i think these work just fine in both EVAL and EVAL_RO. i'll add a test to demonstrate that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOM doesn't really make sense in the context of EVAL_RO scripts though. It seems like an unexpected failure mode to have a script, which doesn't use memory, to fail on OOM conditions if you add a shebang. Can't we just document that these two modes are different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document what?
that in both EVAL_RO and FCALL_RO, the default flags are ignored and the script is executed as if it declared allow-oom?

would you consider doing the same for no-writes?
i.e. user will be able to call a function that didn't declare no-writes by using FCALL_RO?
the script would start running, and then abort with error if it uses any write command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean that any attempt to run a function with FCALL_RO should fail unless it is marked with allow-oom (even when we're not in OOM state)?

i think that would be suitable for a command named FCALL_OOM, buy luckily we don't have / need one.
i kinda regret adding that command in the first place (EVAL_RO). if it's sole purpose is for client routing to replicas, and the grantee that no matter the script it cannot do any writes, i don't see why it should require the allow-oom flag.

can you explain again what's the benefit of doing that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simpler idea is that no-write implies allow-oom? I'm not really aware of a case where a non-write command would consume additional memory, so there isn't really a reason to block in OOM scenarios. Since OOM scenarios are infrequent, I would rather guide users to setting the right flags so their application works as expected in OOM scenarios.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discussed that in a core-team meeting and concluded we wanna proceed with this idea (no-write to imply allow-oom), i'll make a separate PR for that and we can further discuss it there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a second thought, although that change is completely unrelated to the other fix in this PR, effects and the tests might be related, so let's keep the discussion here.

So some background:
in #10066 we added the no-write and allow-oom flags, i seem to remember that we did consider setting allow-oom implicitly when no-write is set, it was discussed here #10066 (comment), and was rejected because i wanted similar flags and behavior as normal redis commands and module commands (to avoid confusion).
but later we negated the flags so we have different defaults, so maybe we should reconsider that (despite being a breaking change comparing to redis 7.0.0)

@MeirShpilraien do you remember any other detail?

Copy link
Member Author

@oranagra oranagra May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, i discussed it with Meir.
It seems we clearly stated in the original PR top comment that:

if the function will not turn on this flag it will not be possible to run it if OOM reached (even if the function declares no-writes and even if fcall_ro is used). If this flag is set, any command will be allow on OOM (even those that is marked with CMD_DENYOOM)

But looking back at the original discussions, we can't spot any reason not to revert that decision and have no-writes implicitly count as allow-oom too.

oranagra added 3 commits May 10, 2022 09:52
i decided to keep the flags separate and test both instead of set one
when the other is set.

The tests i added for the bug in server.c not updating the
server.script_oom flag are no longer relevant since that state is now
impossible to test.
@oranagra oranagra changed the title fix issue with EVAL*_RO using shebang not being blocked correctly in OOM state Scripts that declare the no-writes flag are implicitly allow-oom too. May 16, 2022
@oranagra
Copy link
Member Author

updated the code, and top comment.
note that the original bug in this PR is no longer reachable (see top comment)

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus labels May 16, 2022
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@oranagra oranagra merged commit b0e18f8 into redis:unstable May 22, 2022
@oranagra oranagra deleted the eval_ro_oom branch May 22, 2022 13:03
@oranagra oranagra mentioned this pull request Jun 8, 2022
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…too. (redis#10699)

Scripts that have the `no-writes` flag, cannot execute write commands,
and since all `deny-oom` commands are write commands, we now act
as if the `allow-oom` flag is implicitly set for scripts that set the `no-writes` flag.
this also implicitly means that the EVAL*_RO and FCALL_RO commands can
never fails with OOM error.

Note about a bug that's no longer relevant:
There was an issue with EVAL*_RO using shebang not being blocked correctly
in OOM state:
When an EVAL script declares a shebang, it was by default not allowed to run in
OOM state.
but this depends on a flag that is updated before the command is executed, which
was not updated in case of the `_RO` variants.
the result is that if the previous cached state was outdated (either true or false),
the script will either unjustly fail with OOM, or unjustly allowed to run despite
the OOM state.
It doesn't affect scripts without a shebang since these depend on the actual
commands they run, and since these are only read commands, they don't care
for that cached oom state flag.
it did affect scripts with shebang and no allow-oom flag, bug after the change in
this PR, scripts that are run with eval_ro would implicitly have that flag so again
the cached state doesn't matter.

p.s. this isn't a breaking change since all it does is allow scripts to run when they
should / could rather than blocking them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants