
Conversation

@enjoy-binbin (Contributor) commented Feb 22, 2024

Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of
lua scripts, SCRIPT FLUSH ASYNC will still block the main thread.
This is because lua_close is executed in the main thread, and lua
heap needs to release a lot of memory.

In this PR, we take the current lua instance on lctx.lua and call
lua_close on it in a background thread, to close it in async way.
This is MeirShpilraien's idea.

This solves the first point in #13102.
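
A minimal sketch of the idea (the helper names mirror the existing lazy-free code paths in the Redis tree; the exact signatures and the LAZYFREE_THRESHOLD gate are assumptions rather than a copy of the final patch):

/* Sketch only (assumes the Redis internal headers server.h, bio.h and the
 * bundled lua.h): free the script dict and close the Lua state from the
 * BIO lazy-free thread instead of the main thread. */
void lazyFreeLuaScripts(void *args[]) {
    dict *lua_scripts = args[0];
    lua_State *lua = args[1];
    size_t len = dictSize(lua_scripts);
    dictRelease(lua_scripts);   /* drop the cached script bodies */
    lua_close(lua);             /* the expensive part: tear down the Lua heap off the main thread */
    atomicDecr(lazyfree_objects,len);
    atomicIncr(lazyfreed_objects,len);
}

void freeLuaScriptsAsync(dict *lua_scripts, lua_State *lua) {
    if (dictSize(lua_scripts) > LAZYFREE_THRESHOLD) {
        atomicIncr(lazyfree_objects,dictSize(lua_scripts));
        bioCreateLazyFreeJob(lazyFreeLuaScripts,2,lua_scripts,lua);
    } else {
        dictRelease(lua_scripts);
        lua_close(lua);
    }
}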

@enjoy-binbin (Contributor, Author):

@MeirShpilraien I meet a issue. When closing_lua in the bg thread, i can observe that the corresponding lua interpreter memory is declining (by looking at luaMemory(lua)). But the rss in the info memory still displays it:

# Memory
used_memory:34117688
used_memory_human:32.54M
used_memory_rss:11181293568
used_memory_rss_human:10.41G
used_memory_peak:3018153064
used_memory_peak_human:2.81G
used_memory_peak_perc:1.13%
used_memory_overhead:22944980
used_memory_startup:874952
used_memory_dataset:11172708
used_memory_dataset_perc:33.61%
allocator_allocated:34438672
allocator_active:41156608
allocator_resident:125444096
allocator_muzzy:0
total_system_memory:33190125568
total_system_memory_human:30.91G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.20
allocator_frag_bytes:6717936
allocator_rss_ratio:3.05
allocator_rss_bytes:84287488
rss_overhead_ratio:89.13
rss_overhead_bytes:11055849472
mem_fragmentation_ratio:328.11
mem_fragmentation_bytes:11147215888
mem_not_counted_for_evict:13472
mem_replication_backlog:1066212
mem_total_replication_buffers:1066208
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:12276361

@MeirShpilraien commented Feb 22, 2024

I ran into an issue. When closing the Lua state in the background thread, I can observe that the corresponding Lua interpreter memory is declining (by checking luaMemory(lua)), but the RSS in INFO MEMORY still shows it

And without the async you do see this memory declining?

@enjoy-binbin (Contributor, Author) commented Feb 22, 2024

And without the async you do see this memory declining?

OK, sync gives the same result. These Lua script bodies are about 110 bytes each; it looks like we need to use jemalloc for Lua.

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:14.06G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush sync  --> this takes about 45s
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep lua
used_memory_lua:31744
used_memory_lua_human:31.00K

But when you do it one more time, it becomes OK.

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush sync  -- this takes a few seconds (I guess maybe 1-2 seconds)
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep lua
used_memory_lua:31744
used_memory_lua_human:31.00K
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush sync  -- this takes about 10s
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:116.14M

@MeirShpilraien:

This is what I suspected: Lua uses libc malloc, which does not return memory the same way jemalloc does. I am not sure exactly what the logic is for returning memory with libc malloc, but I believe we can see that it's not related to the background release of the Lua interpreter and we can proceed with this fix. WDYT?

@enjoy-binbin (Contributor, Author) commented Feb 22, 2024

This is another case.
Edit: ignore me. The subsequent ASYNC actually takes the sync path because the lua_scripts dict is empty, so this case is just like the one above.

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:14.06G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:14.06G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:14.06G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:13.93G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:12.16G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:11.52G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.66G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async  -- this takes quite long, I guess 5s?
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async -- this takes 1 sec maybe
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:117.60M

@MeirShpilraien:

Edit: ignore me. The subsequent ASYNC actually takes the sync path because the lua_scripts dict is empty, so this case is just like the one above.

But if it's empty it should not take so long, no?

@enjoy-binbin (Contributor, Author) commented Feb 22, 2024

Yes, it's weird. I guess the second lua_close actually reclaims the memory. I'll try making lua_close always async:

@@ -204,8 +204,8 @@ void freeLuaScriptsAsync(dict *lua_scripts, lua_State *lua) {
         atomicIncr(lazyfree_objects,dictSize(lua_scripts));
         bioCreateLazyFreeJob(lazyFreeLuaScripts,2,lua_scripts,lua);
     } else {
-        dictRelease(lua_scripts);
-        lua_close(lua);
+        atomicIncr(lazyfree_objects,dictSize(lua_scripts));
+        bioCreateLazyFreeJob(lazyFreeLuaScripts,2,lua_scripts,lua);
     }
 }

OK. With this code, flush is always async, but the RSS does not change.

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G

Then I try sync; one of them takes about 1s, and then the RSS changes.

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:10.41G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:117.52M

@MeirShpilraien:

Interesting. I wonder whether it's not releasing the memory because of the libc allocator or because of Lua. Any idea how we can check it?

@enjoy-binbin (Contributor, Author):

Interesting. I wonder whether it's not releasing the memory because of the libc allocator or because of Lua. Any idea how we can check it?

@sundb do you have any thoughts?

@sundb (Collaborator) commented Feb 23, 2024

Using jemalloc as the Lua allocator, we can see that the Lua VM does get cleaned, but the memory isn't returned to the system; jemalloc and glibc malloc both show the same result, so it doesn't relate to the background thread.

btw: calling jemalloc_purge will eventually return the memory to the system.

/* Standalone test (built inside the Redis tree with USE_JEMALLOC so that
 * zmalloc and jemalloc_purge are available): populate a Lua state with
 * many small scripts, close it, and check whether memory returns to the OS. */
#include <stdio.h>
#include <unistd.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
#include "zmalloc.h"    /* zfree/zrealloc and jemalloc_purge from the Redis tree */

/* Lua allocator callback that routes allocations through zmalloc (jemalloc). */
static void *myAlloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud; (void)osize;
    if (nsize == 0) {
        zfree(ptr);
        return NULL;
    } else {
        return zrealloc(ptr, nsize);
    }
}

#define USE_LUA_JEMALLOC 1

int main(void) {
#if USE_LUA_JEMALLOC
    lua_State* L = lua_newstate(myAlloc, NULL); /* Use jemalloc as the Lua allocator */
#else
    lua_State* L = lua_open(); /* Use the default (glibc) allocator */
#endif
    if (!L) {
        fprintf(stderr, "Failed to create Lua state.\n");
        return 1;
    }

    /* Populate data into the Lua state. */
    for (int i = 1; i <= 1000000; ++i) {
        char script[100];
        snprintf(script, sizeof(script), "print('Hello from script %d')", i);
        luaL_loadstring(L, script);
        lua_pcall(L, 0, LUA_MULTRET, 0);
    }

    lua_close(L);
    je_malloc_stats_print(NULL, NULL, NULL);          /* <- 1 */
    jemalloc_purge();

    printf("wait for 10s before exit\n");
    sleep(10);

    je_malloc_stats_print(NULL, NULL, NULL);         /* <- 2 */
    return 0;
}

output of 1)

resident:                   242216960

output of 2)

resident:                    19767296

@enjoy-binbin (Contributor, Author) commented Feb 23, 2024

@sundb thanks, I also did relevant tests this morning using the same RDB, and using jemalloc in Lua seems to have a good effect (the RSS is declining):

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:12.75G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep lua
used_memory_lua:10003225600
used_memory_lua_human:9.32G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli script flush async
OK
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:12.75G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:12.23G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:9.71G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:8.65G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:4.93G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:4.81G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:2.56G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:1.50G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:1.16G
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:352.84M
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep rss_human
used_memory_rss_human:273.14M

According to these simple results, it seems this feature needs to be introduced together with jemalloc for Lua. With glibc, async does not seem to work well (it may even mislead users).

And btw, with jemalloc, Lua looks faster: RDB loading used to take 180s, now it takes 150s; SCRIPT FLUSH SYNC used to take 45s, now it takes 25s.

@oranagra (Member):

I'm not sure we should tie these together (getting rid of the freeze by doing it async, and the change of allocator).
I'm quite certain that libc has some documented behavior which in this case takes a bit longer to return the memory to the OS, but that eventually it will, and maybe we can find an API to purge it earlier. Maybe try malloc_trim?

Regarding the change of allocator, indeed jemalloc is much better, and we also know it and trust it.
The problem with using it is that the allocations of Lua will get mixed with the allocations of Redis, and these can prevent Redis from being able to defrag its memory.
I think the right approach to solve that is to use a dedicated arena, i.e. in the allocator callback we pass to lua_newstate, use MALLOCX_ARENA.
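
A rough sketch of what such a dedicated-arena allocator could look like, using jemalloc's non-standard mallocx/rallocx/dallocx API and the "arenas.create" mallctl (in the Redis build these calls are je_-prefixed; the helper names below are illustrative assumptions, not an actual implementation):

#include <jemalloc/jemalloc.h>
#include <lua.h>

/* Sketch: route all Lua allocations into a dedicated jemalloc arena so they
 * don't mix with Redis's own allocations and hinder active defrag. */
static unsigned lua_arena;      /* arena index returned by "arenas.create" */
static int lua_arena_flags;     /* MALLOCX_ARENA(lua_arena) used on every call */

static int createLuaArena(void) {
    size_t sz = sizeof(lua_arena);
    if (mallctl("arenas.create", &lua_arena, &sz, NULL, 0) != 0) return -1;
    lua_arena_flags = MALLOCX_ARENA(lua_arena) | MALLOCX_TCACHE_NONE;
    return 0;
}

/* Allocator callback passed to lua_newstate(). */
static void *luaArenaAlloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ud; (void)osize;
    if (nsize == 0) {
        if (ptr) dallocx(ptr, lua_arena_flags);
        return NULL;
    }
    if (ptr == NULL) return mallocx(nsize, lua_arena_flags);
    return rallocx(ptr, nsize, lua_arena_flags);
}

/* Usage sketch: createLuaArena(); lua_State *L = lua_newstate(luaArenaAlloc, NULL); */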

@enjoy-binbin (Contributor, Author):

OK, I see: when I added a malloc_trim(0) just after lua_close, both sync and async work. @oranagra thanks. Now, what's next? In what way do I need to add malloc_trim to our code base?

[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep lua && ./src/redis-cli info memory | grep rss_human && ./src/redis-cli info | grep lazyfree
used_memory_lua:31744
used_memory_lua_human:31.00K
used_memory_rss_human:10.39G
lazyfree_pending_objects:12276361
lazyfreed_objects:0
[binbinzhu@VM-13-151-tencentos ~/github/redis]$ ./src/redis-cli info memory | grep lua && ./src/redis-cli info memory | grep rss_human && ./src/redis-cli info | grep lazyfree
used_memory_lua:31744
used_memory_lua_human:31.00K
used_memory_rss_human:97.94M
lazyfree_pending_objects:0
lazyfreed_objects:12276361

@oranagra (Member):

I think we'd want to do that in both sync and async flush,
but only if Redis and Lua use different allocators (i.e. currently only if Redis is not using libc malloc).
I suppose we'd want to do it right after lua_close (and before lua_open, if it wasn't already done).
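
Something along these lines, perhaps (the luaCloseAndTrim helper and the USE_JEMALLOC gate are a simplified sketch of that condition, not the final patch; malloc_trim is a glibc extension):

#include <lua.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* malloc_trim() is glibc-specific */
#endif

/* Sketch: after tearing down the Lua state (whose memory came from libc
 * malloc), ask glibc to return the freed pages to the kernel so RSS drops
 * right away. Only worth doing when Redis itself runs on a different
 * allocator; USE_JEMALLOC (from the Redis build) approximates that here. */
static void luaCloseAndTrim(lua_State *lua) {
    lua_close(lua);
#if defined(__GLIBC__) && defined(USE_JEMALLOC)
    malloc_trim(0);
#endif
}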

@oranagra (Member):

The next project (if one of you wants to try it) is to use jemalloc with MALLOCX_ARENA as I suggested above.

@enjoy-binbin (Contributor, Author):

I am not good at this OS stuff, so I am just pushing the code.
Full CI: https://github.com/redis/redis/actions/runs/8051173343

@oranagra (Member) left a comment:

Did you test it on all the platforms in our CI?

@enjoy-binbin (Contributor, Author) commented Feb 28, 2024

Did you test it on all the platforms in our CI?

I ran the full CI once before, but there was a problem with the Alpine build. #13087 (comment)

I have now re-pushed the new code; new full CI: https://github.com/redis/redis/actions/runs/8078422318

But I'm not very familiar with cross-platform OS details, so I need you to check it out.

@enjoy-binbin enjoy-binbin marked this pull request as ready for review February 28, 2024 09:54
@oranagra oranagra merged commit a7abc2f into redis:unstable Feb 28, 2024
@oranagra oranagra added the release-notes label Feb 28, 2024
@enjoy-binbin enjoy-binbin deleted the close_lua_async branch February 28, 2024 16:35