
Conversation

@enjoy-binbin
Contributor

@enjoy-binbin enjoy-binbin commented Mar 4, 2024

In some cases, users will abuse lua eval. Each EVAL call generates
a new lua script, which is added to the lua interpreter and cached
to redis-server, consuming a large amount of memory over time.

Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues (i.e. the script won't disappear unexpectedly,
and cause errors like it would with SCRIPT LOAD and EVALSHA),
we implement a plain FIFO LRU eviction only for these (not for
scripts loaded with SCRIPT LOAD).

Implementation notes:

When not abused we'll probably have fewer than 100 scripts, and when
abused we'll have many thousands, so we use a hard-coded limit of 500
scripts. And since we don't have many scripts, unlike keys we don't
need to worry about the memory usage of keeping a true sorted LRU
linked list. We compute the SHA of each script anyway and put the
script in a dict, so we can store a listNode there and use it for
quick removal and re-insertion into an LRU list each time the script
is used.
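
Roughly, the idea looks like the sketch below (a hedged illustration, not the
exact code in this PR: the struct and function names are made up, and the real
code wires this into eval.c; it leans on Redis' internal dict/adlist helpers):

```c
#include "server.h"   /* sketch against Redis' internal dict/adlist APIs */

#define EVAL_SCRIPTS_LRU_MAX 500   /* hard-coded cap from the description above */

/* Illustrative shape of a cached-script entry: the dict value keeps a
 * back-pointer to its node in the LRU list (NULL for SCRIPT LOAD scripts). */
typedef struct luaScriptEntry {
    robj *body;
    listNode *node;
} luaScriptEntry;

/* Each time an EVAL script is created or called, move its node to the tail,
 * so the head of the list is always the least recently used EVAL script. */
static void evalScriptsLRUTouch(list *lru_list, luaScriptEntry *l) {
    if (l->node == NULL) return;      /* SCRIPT LOAD scripts are never evicted */
    listUnlinkNode(lru_list, l->node);
    listLinkNodeTail(lru_list, l->node);
}

/* Before caching a new EVAL script, evict from the head if the cap is hit.
 * The node's value is the script's SHA1, which is also the dict key. */
static void evalScriptsLRUMaybeEvict(dict *lua_scripts, list *lru_list,
                                     long long *evicted_scripts) {
    while (listLength(lru_list) >= EVAL_SCRIPTS_LRU_MAX) {
        listNode *ln = listFirst(lru_list);
        sds sha = listNodeValue(ln);
        /* ...also unload the function from the Lua interpreter here... */
        dictDelete(lua_scripts, sha);   /* drops the server-side cache entry */
        listDelNode(lru_list, ln);
        (*evicted_scripts)++;           /* surfaced as evicted_scripts in INFO */
    }
}
```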

New interfaces:

At the same time, a new evicted_scripts field is added to
INFO, which represents the number of evicted eval scripts. Users
can check it to see if they are abusing EVAL.

benchmark:

./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return __rand_int__" 0

This simple EVAL-abuse benchmark creates 1 million EVAL scripts.
Performance has improved by 50%, and the max latency has dropped from
500ms to 13ms (the old worst case was likely caused by table expansion
inside Lua when the number of scripts is large). And in INFO memory,
it used to consume 120MB (server cache) + 310MB (lua engine), but now
it only consumes 70KB (server cache) + 210KB (lua engine) because of
the script eviction.

For a non-abusive case of about 100 EVAL scripts, there's no noticeable
change in performance or memory usage.

unlikely potentially breaking change:

in theory, a user can load a script with EVAL and then use EVALSHA to
call it (by calculating the SHA1 value on the client side). If we read
the docs carefully we'll realize it's a valid scenario, but we suppose
it's extremely rare. So it may happen that EVALSHA acts on a script
created by EVAL, the script is evicted, and EVALSHA returns a NOSCRIPT
error; that is, if more than 500 scripts are used in the same
transaction / pipeline.
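
To make the scenario concrete, here is a hedged client-side sketch (not part
of this PR; it assumes hiredis and OpenSSL are available on the client):

```c
#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>
#include <openssl/sha.h>

int main(void) {
    const char *script = "return 1";
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* EVAL caches the script server-side as a side effect. */
    redisReply *r = redisCommand(c, "EVAL %s 0", script);
    freeReplyObject(r);

    /* Compute the SHA1 locally instead of using SCRIPT LOAD. */
    unsigned char md[SHA_DIGEST_LENGTH];
    char sha[2 * SHA_DIGEST_LENGTH + 1];
    SHA1((const unsigned char *)script, strlen(script), md);
    for (int i = 0; i < SHA_DIGEST_LENGTH; i++) sprintf(sha + 2 * i, "%02x", md[i]);

    /* With this PR, if 500+ other EVAL scripts ran in between, the script may
     * have been evicted and this call now fails with a NOSCRIPT error. */
    r = redisCommand(c, "EVALSHA %s 0", sha);
    if (r->type == REDIS_REPLY_ERROR) printf("error: %s\n", r->str);
    else printf("result: %lld\n", r->integer);
    freeReplyObject(r);
    redisFree(c);
    return 0;
}
```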

This solves the second point in #13102.

In some cases, users will abuse lua eval. Each eval call generates
a new lua script, which is added to the lua interpreter and cached
to redis-server, consuming a large amount of memory over time.

Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues, we implement approximate LRU script
eviction only for these (not for ones loaded with SCRIPT LOAD).
Lua scripts are divided into eval scripts and load scripts, and
we will only evict eval scripts.

We call this mechanism "eval scripts eviction".

A new configuration of `maxmemory-eval-scripts` is introduced, which
represents the memory limit used by eval scripts in redis-server
(part of used_memory). The default is 0, which means no limit.

At the same time, a new `evicted_eval_scripts` field is added to INFO,
which represents the number of evicted eval scripts due to
`maxmemory-eval-scripts` limit.

The general design is as follows:
- Save eval scripts and load scripts in separate dictionaries.
- Update the lua body's lru when created or called.
- Use the approximate LRU pool in evict.c to populate the data.
- In order to reduce code complexity and not affect users, we will
  only check the limit in serverCron (which also naturally limits how
  often we check).

This solves the second point in redis#13102.
@enjoy-binbin
Contributor Author

I try to keep the diff small and simple. If we click "Hide whitespace" during the review, the diff in evict will look much better. Posting it first for some review, and then I will see how to add some test cases.

@enjoy-binbin enjoy-binbin requested a review from oranagra March 4, 2024 11:41
@sundb
Collaborator

sundb commented Mar 4, 2024

first impression, too much repetitive code.🙃

Member

@oranagra oranagra left a comment


i didn't review it in depth, but here are some random thoughts:

  1. i don't think we need a config for this, when not abused we'll probably have less than 100 scripts, and when abused we'll have many thousands. i think we can use a hard coded value of 500 scripts, and i don't think we need to monitor their memory usage.
  2. considering that we don't have many scripts, then unlike keys, we don't need to worry about the memory usage of keeping a true sorted LRU linked list. IIRC we compute the SHA of each script anyway, and put the script in a dict, we can store a listNode there, and use it for quick removal and re-insertion into an LRU list each time the script is used.

WDYT?

@enjoy-binbin
Contributor Author

yes, this sounds good to me, this would be much simpler. I also thought this way at first, but I was worried that it would be too "simple". I will re-code this part.

@enjoy-binbin enjoy-binbin marked this pull request as ready for review March 6, 2024 09:34
@enjoy-binbin enjoy-binbin changed the title from "Lua eval scripts eviction" to "Lua eval scripts first in first out LRU eviction" Mar 6, 2024
@oranagra oranagra added the state:major-decision (Requires core team consensus) and release-notes (indication that this issue needs to be mentioned in the release notes) labels Mar 7, 2024
Member

@oranagra oranagra left a comment


can you post some benchmarks and memory usage comparisons of before and after.

@oranagra
Member

oranagra commented Mar 7, 2024

while we're waiting for the benchmarks and memory stats.
@redis/core-team please approve the concept and new info field.

@oranagra oranagra added the approval-needed (Waiting for core team approval to be merged) label Mar 7, 2024
@enjoy-binbin
Contributor Author

enjoy-binbin commented Mar 9, 2024

benchmark:
./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return __rand_int__" 0

This simple EVAL-abuse benchmark creates 1 million eval scripts. Performance has improved by 50%, and the max latency has dropped from 500ms to 13ms (this may be caused by table expansion inside Lua when the number of scripts is large). And in INFO memory, it used to consume 120MB (server cache) + 310MB (lua engine), but now it only consumes 70KB (server cache) + 210KB (lua engine) because of the script eviction.

unstable:

round 1:
Summary:
  throughput summary: 48346.55 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.243     1.176     7.871    22.511    45.727   503.295

used_memory_vm_eval:327702528
used_memory_scripts_eval:120342128
number_of_cached_scripts:999585


round 2:
Summary:
  throughput summary: 48787.63 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.155     1.608     7.919    22.335    44.543   477.951

used_memory_vm_eval:327690240
used_memory_scripts_eval:120347952
number_of_cached_scripts:999637

round 3:
Summary:
  throughput summary: 48564.91 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.199     0.744     7.927    22.527    46.367   482.047

used_memory_vm_eval:327727104
used_memory_scripts_eval:120344704
number_of_cached_scripts:999608

this branch:

round 1:
Summary:
  throughput summary: 73719.13 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        6.689     0.784     7.223     7.719    11.471    13.295

used_memory_vm_eval:214016
used_memory_scripts_eval:76096
number_of_cached_scripts:500
evicted_scripts:999500

round 2:
Summary:
  throughput summary: 75677.31 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        6.507     0.752     7.079     7.567     7.815    13.151

used_memory_vm_eval:214016
used_memory_scripts_eval:76096
number_of_cached_scripts:500
evicted_scripts:999498

round 3:
Summary:
  throughput summary: 75261.54 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        6.544     0.744     7.111     7.575     7.791    11.439

used_memory_vm_eval:214016
used_memory_scripts_eval:76096
number_of_cached_scripts:500
evicted_scripts:999500

@oranagra
Member

thanks. i'm more interested in the impact on non-abusive cases.
e.g. one with heavy use of 100 scripts.

@enjoy-binbin
Contributor Author

enjoy-binbin commented Mar 10, 2024

ohh, right, i have thought about testing non-abusive cases before, but did not come up with a simple/good way.

using -r 100, it will only create 100 scripts; it seems like there is a slight loss in performance
./src/redis-benchmark -P 10 -n 1000000 -r 100 eval "return __rand_int__" 0

unstable, five consecutive records after restarting:

Summary:
  throughput summary: 244558.56 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.964     0.736     2.079     2.319     2.511     4.471

Summary:
  throughput summary: 246548.31 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.945     0.664     2.063     2.295     2.383     4.223

Summary:
  throughput summary: 244917.94 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.959     0.688     2.087     2.327     2.431     3.255

Summary:
  throughput summary: 243546.03 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.972     0.696     2.103     2.351     2.455     3.415

Summary:
  throughput summary: 246548.31 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.946     0.664     2.055     2.287     2.367     4.383

this PR:

Summary:
  throughput summary: 241021.92 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.993     0.744     2.111     2.351     2.487     6.143

Summary:
  throughput summary: 243486.73 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.969     0.672     2.087     2.319     2.383     2.863

Summary:
  throughput summary: 239578.36 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        2.004     0.936     2.111     2.367     2.783     4.335

Summary:
  throughput summary: 241779.48 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.984     0.648     2.111     2.351     2.447     3.759

Summary:
  throughput summary: 240963.84 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.993     0.664     2.119     2.359     2.495     4.151

memory usage:

unstable:
used_memory_lua:63488
used_memory_scripts_eval:12224

this branch:
used_memory_lua:63488
used_memory_scripts_eval:15424

@oranagra
Member

that's a 0.13% change (insignificant).
just to cover everything, let's also measure the memory usage impact.

@enjoy-binbin
Contributor Author

right, I updated it in the comments. Basically a script will take 32 more bytes than before (a listNode * in luaScript and a listNode in lua_scripts_lru_list)

unstable:
used_memory_lua:63488
used_memory_scripts_eval:12224

this branch:
used_memory_lua:63488
used_memory_scripts_eval:15424
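
For reference, a minimal sketch of where the 32 bytes per script come from on
a 64-bit build (struct layout simplified for the accounting only):

```c
/* Simplified layout, just for the memory accounting (64-bit pointers). */
typedef struct listNode {      /* the node placed in lua_scripts_lru_list */
    struct listNode *prev;
    struct listNode *next;
    void *value;               /* points at the script's SHA1 dict key */
} listNode;                    /* 3 pointers = 24 bytes */

typedef struct luaScript {
    void *body;                /* existing script body pointer */
    listNode *node;            /* new back-pointer into the LRU list = 8 bytes */
} luaScript;

/* 24 + 8 = 32 extra bytes per EVAL script, matching the numbers above:
 * (15424 - 12224) / 100 scripts = 32 bytes. */
```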

@zuiderkwast
Contributor

it could be that if we read the docs
carefully we'll realize it's a valid scenario, but we suppose it's
extremely rare. So it may happen that EVALSHA acts on a script created
by EVAL, and the script is evicted and EVALSHA returns a NOSCRIPT error.

We should update these docs to make this clear.

@enjoy-binbin enjoy-binbin added the state:needs-doc-pr (requires a PR to redis-doc repository) label Mar 11, 2024
Contributor

@madolson madolson left a comment


Directional approval, with some minor comments.

@oranagra
Member

approved in a core-team meeting. decided to tag it as a breaking change.
@enjoy-binbin can you make a redis-doc PR with some adjustments (the new info field, and maybe a word about script eviction if you find a place for it)?

@oranagra oranagra merged commit ad28d22 into redis:unstable Mar 13, 2024
@oranagra oranagra added the breaking-change (This change can potentially break existing application) label Mar 13, 2024
@enjoy-binbin enjoy-binbin deleted the lua_eval_scripts_eviction branch March 13, 2024 06:29
sundb added a commit that referenced this pull request Sep 2, 2025
This PR fixes two crashes due to the defragmentation of the Lua script,
which were introduced by #13108.

1. During long-running Lua script execution, active defragmentation may
be triggered, causing the luaScript structure to be reallocated to a new
memory location; then we access `l->node` (which may have been reallocated)
after script execution to update the Lua LRU list.
In this PR, we don't defrag during blocked scripts, so we don't mess up
the LRU update when the script ends.
   Note that defrag is now only permitted during loading.
This PR also reverts the changes made by
#14274.

2. We forgot to update the Lua LRU list node's value.
Since a `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s
key, we also need to update `node->value` when the key is reallocated.
In this PR, after performing defragmentation on a Lua script, if the
script is in the LRU list, its reference in the LRU list will be
unconditionally updated.
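
A hedged sketch of that unconditional update (it assumes Redis' `activeDefragSds`,
`dictSetKey` and `dictGetVal` helpers; the function name and wiring here are
illustrative, not the exact patch):

```c
/* Illustrative only: called from the activedefrag scan over the lua scripts
 * dict after the entry's sds key has potentially been reallocated. */
static void defragLuaScriptLRURef(dict *lua_scripts, dictEntry *de, sds sha) {
    sds new_sha = activeDefragSds(sha);
    if (new_sha == NULL) return;              /* key was not moved */
    dictSetKey(lua_scripts, de, new_sha);     /* dict entry now owns the moved key */
    luaScript *script = dictGetVal(de);
    if (script->node)
        script->node->value = new_sha;        /* keep lua_scripts_lru_list in sync */
}
```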
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 28, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 29, 2025
sundb added a commit to YaacovHazan/redis that referenced this pull request Sep 30, 2025