-
Notifications
You must be signed in to change notification settings - Fork 24.4k
SCRIPT FLUSH run truly async, close lua interpreter in bio #13087
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of lua scripts, SCRIPT FLUSH ASYNC will still block the main thread. This is because lua_close is executed in the main thread, and lua heap needs to release a lot of memory. In this PR, we take the current lua instance on lctx.lua and call lua_close on it in a background thread, to close it in async way. This is MeirShpilraien's idea.
|
@MeirShpilraien I meet a issue. When closing_lua in the bg thread, i can observe that the corresponding lua interpreter memory is declining (by looking at luaMemory(lua)). But the rss in the info memory still displays it: |
And without the async you do see this memory declining? |
ok, it have the same input. these lua body is about 110 bytes, it look like we need to use jemalloc on lua. but when you do it one more time, it become ok. |
|
This is what I suspected, Lua uses libc malloc which does not return the memory the same way jmalloc does. I am not sure exactly whats the logic of returning the memory for libc malloc but I believe we can see that its not related to the background release of the Lua interpreter and we can proceed with this fix, WDYT? |
|
this is another case: |
But if its empty it should not take so long, no? |
|
Yes, it's weird. I guess the second lua_close actually reclaims the memory. I'll try making lua_close always async: @@ -204,8 +204,8 @@ void freeLuaScriptsAsync(dict *lua_scripts, lua_State *lua) {
atomicIncr(lazyfree_objects,dictSize(lua_scripts));
bioCreateLazyFreeJob(lazyFreeLuaScripts,2,lua_scripts,lua);
} else {
- dictRelease(lua_scripts);
- lua_close(lua);
+ atomicIncr(lazyfree_objects,dictSize(lua_scripts));
+ bioCreateLazyFreeJob(lazyFreeLuaScripts,2,lua_scripts,lua);
}
}
ok. in this code, flush always async, but the rss won't change. then i try to use sync, one of them will take 1s, and then the rss will change. |
|
Interesting, I wonder if its not releasing the memory because of the libc allocator or because of Lua. Any idea how we can check it? |
@sundb do you have any thoughts? |
|
use jemalloc as thc Lua allocator, we can see that the Lua VM does get cleaned, but the memory isn't return to btw: using jemalloc_purge will eventually return the memory to the system. #include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
static void *myAlloc(void *ud, void *ptr, size_t osize, size_t nsize) {
if (nsize == 0) {
zfree(ptr);
return NULL;
} else {
return zrealloc(ptr, nsize);
}
}
#define USE_LUA_JEMALLOC 1
int main(int argc, char **argv) {
struct timeval tv;
int j;
char config_from_stdin = 0;
#if USE_LUA_JEMALLOC
lua_State* L = lua_newstate(myAlloc, NULL); /* Use je as lua allocater*/
#else
lua_State* L = lua_open(); /* Use default(glibc) as lua allocater*/
#endif
if (!L) {
fprintf(stderr, "Failed to create Lua state.\n");
return 1;
}
/* Populate data to lua state. */
for (int i = 1; i <= 1000000; ++i) {
char script[100];
snprintf(script, sizeof(script), "print('Hello from script %d')", i);
luaL_loadstring(L, script);
lua_pcall(L, 0, LUA_MULTRET, 0);
}
lua_close(L);
je_malloc_stats_print(NULL, NULL, NULL); /* <- 1 */
jemalloc_purge();
printf("wait for 10s before exit\n");
sleep(10);
je_malloc_stats_print(NULL, NULL, NULL); /* <- 2 */
return 1;
}output of 1) output of 2) |
|
@sundb thanks, i also did relevant tests this morning using the same RDB and using jemalloc in lua seems to have an good effect (the rss is declining): According to the simple results, it seems that this function needs to be introduced together with lua jemalloc. In glibc, async does not seem to work well (it may even give users wrong instructions) and btw, using jemalloc, lua looks faster. RDB loading used to take 180s, now it takes 150s. Flush sync used to take 45s, now it takes 25s . |
|
i'm not sure we should tie these (get rid of the freeze by doing async, and the change of allocator) together. regarding the change of allocator, indeed jemalloc is much better, and also we know it and trust it. |
|
ok, i see when i added a |
|
i think that we'd want to do that in both sync and async flush. |
|
the next project (if one of you wanna try it), is to use jemalloc with MALLOCX_ARENA as i suggested above. |
|
i am not good at this OS stuff, so just pushing the code. |
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you test it on all the platforms in our CI?
I ran full CI once before, but there was a problem with the alpine build. #13087 (comment) I now re-push the new code, new full CI: https://github.com/redis/redis/actions/runs/8078422318 But I'm not very familiar with cross-platform OS, so need you to check it out. |
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of lua scripts, SCRIPT FLUSH ASYNC will still block the main thread. This is because lua_close is executed in the main thread, and lua heap needs to release a lot of memory. In this PR, we take the current lua instance on lctx.lua and call lua_close on it in a background thread, to close it in async way. This is MeirShpilraien's idea.
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of
lua scripts, SCRIPT FLUSH ASYNC will still block the main thread.
This is because lua_close is executed in the main thread, and lua
heap needs to release a lot of memory.
In this PR, we take the current lua instance on lctx.lua and call
lua_close on it in a background thread, to close it in async way.
This is MeirShpilraien's idea.
This solves the first point in #13102.