Skip to content

Bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report#3029

Merged
zuiderkwast merged 3 commits into
valkey-io:unstablefrom
rainsupreme:patch-1
Feb 24, 2026
Merged

Bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report#3029
zuiderkwast merged 3 commits into
valkey-io:unstablefrom
rainsupreme:patch-1

Conversation

@rainsupreme

@rainsupreme rainsupreme commented Jan 8, 2026

Copy link
Copy Markdown
Contributor

note: Originally this PR only uncommented the crash report memory test, but I rewrote the PR for the GIL bug I uncovered and fixed. I added it to Valkey 9.1 since it's a follow-up bugfix for a 9.1 change.

The memory test was commented out in #2858 and should have been reenabled. On further investigation I found that the server hangs during shutdown inside the bioDrainWorker(BIO_LAZY_FREE) call. This causes deadlock because the lock was acquired for shutdown but lazy free jobs require the GIL too:

  • main thread: serverCron() acquires GIL via afterSleep() then calls finishShutdown(), which eventually calls our script module unload code that calls bioDrainWorker().
  • bio threads: Pending lazy free jobs such as lazyFreeEvalScripts() call scriptingEngineCallFreeFunction() which requires the GIL.

@rainsupreme

rainsupreme commented Jan 8, 2026

Copy link
Copy Markdown
Contributor Author

maybe something is wrong with it after all? The sanitizer test crashed in doFastMemoryTest 🤷‍♀️

!!! WARNING The following tests failed:

*** [err]: Sanitizer error: =================================================================
==7594==ERROR: AddressSanitizer: unknown-crash on address 0x0000800f7000 at pc 0x7f1272cfb42e bp 0x7fff044601d0 sp 0x7fff0445f978
READ of size 1048576 at 0x0000800f7000 thread T0
    #0 0x7f1272cfb42d in memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115
    #1 0x564db6bb2cf0 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29
    #2 0x564db6bb2cf0 in memtest_preserving_test.constprop.0 /home/runner/work/valkey/valkey/src/memtest.c:304
    #3 0x564db691aadf in memtest_test_linux_anonymous_maps /home/runner/work/valkey/valkey/src/debug.c:2013
    #4 0x564db6922b4a in doFastMemoryTest /home/runner/work/valkey/valkey/src/debug.c:2056
    #5 0x564db6922b4a in printCrashReport /home/runner/work/valkey/valkey/src/debug.c:2260
    #6 0x564db693b38c in sigsegvHandler /home/runner/work/valkey/valkey/src/debug.c:2180
    #7 0x7f127284532f  (/lib/x86_64-linux-gnu/libc.so.6+0x4532f) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #8 0x7f12728ecade in clock_nanosleep (/lib/x86_64-linux-gnu/libc.so.6+0xecade) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #9 0x7f12728f9a26 in __nanosleep (/lib/x86_64-linux-gnu/libc.so.6+0xf9a26) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #10 0x7f127292975b in usleep (/lib/x86_64-linux-gnu/libc.so.6+0x12975b) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #11 0x564db6b7cec9 in bioDrainWorker /home/runner/work/valkey/valkey/src/bio.c:310
    #12 0x564db6b7cec9 in scriptingEngineManagerUnregister /home/runner/work/valkey/valkey/src/scripting_engine.c:198
    #13 0x564db6abc4df in VM_UnregisterScriptingEngine /home/runner/work/valkey/valkey/src/module.c:13742
    #14 0x7f12721be033 in ValkeyModule_OnUnload (/home/runner/work/valkey/valkey/src/modules/lua/libvalkeylua.so+0x16033) (BuildId: 5f3d19be330d3f972ed3c1f4b8cf27d402bbb875)
    #15 0x564db6ab3672 in moduleUnloadInternal /home/runner/work/valkey/valkey/src/module.c:12908
    #16 0x564db6ab57d9 in moduleUnloadAllModules /home/runner/work/valkey/valkey/src/module.c:12977
    #17 0x564db670fb18 in finishShutdown /home/runner/work/valkey/valkey/src/server.c:4792
    #18 0x564db6710fa9 in prepareForShutdown /home/runner/work/valkey/valkey/src/server.c:4598
    #19 0x564db66f4e31 in serverCron /home/runner/work/valkey/valkey/src/server.c:1535
    #20 0x564db66b2094 in processTimeEvents /home/runner/work/valkey/valkey/src/ae.c:370
    #21 0x564db66d090e in aeProcessEvents /home/runner/work/valkey/valkey/src/ae.c:513
    #22 0x564db66d090e in aeProcessEvents /home/runner/work/valkey/valkey/src/ae.c:411
    #23 0x564db66d090e in aeMain /home/runner/work/valkey/valkey/src/ae.c:543
    #24 0x564db66af74e in main /home/runner/work/valkey/valkey/src/server.c:7491
    #25 0x7f127282a1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #26 0x7f127282a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
    #27 0x564db66b1eb4 in _start (/home/runner/work/valkey/valkey/src/valkey-server+0x158eb4) (BuildId: e5a85635ade1185748c3b11573aaf3b79ebeb56a)

Address 0x0000800f7000 is located in the low shadow area.
SUMMARY: AddressSanitizer: unknown-crash ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115 in memcpy
==7594==ABORTING

@enjoy-binbin enjoy-binbin requested a review from rjd15372 January 8, 2026 07:22
This was commented out in valkey-io#2858 but I don't know why - it looks like it might have been accidental.

Signed-off-by: Rain Valentine <rsg000@gmail.com>
@codecov

codecov Bot commented Feb 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (87caeb7) to head (36f7782).
⚠️ Report is 37 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable   #3029       +/-   ##
============================================
- Coverage     74.90%       0   -74.91%     
============================================
  Files           129       0      -129     
  Lines         71327       0    -71327     
============================================
- Hits          53429       0    -53429     
+ Misses        17898       0    -17898     

see 129 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rainsupreme

Copy link
Copy Markdown
Contributor Author

Typically we avoid running ASAN with crash report tests, but in this case it is actually crashing when it should not - it is crashing while unloading the Lua module. I think we need to resolve the crash during shutdown.

…at may access it

Signed-off-by: Rain Valentine <rsg000@gmail.com>
@rainsupreme

Copy link
Copy Markdown
Contributor Author

Hmm. Taking a closer look at the logs, it appears that the server is hanging during shutdown maybe 50 seconds before the test harness terminated it, triggering the SIGSEGV handler and what followed. Gotta look for a hang in a bio job instead of a crash.

@rainsupreme rainsupreme marked this pull request as draft February 13, 2026 21:35
Signed-off-by: Rain Valentine <rsg000@gmail.com>
@rainsupreme rainsupreme marked this pull request as ready for review February 16, 2026 17:57
@rainsupreme rainsupreme changed the title reenable memory test in crash report bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report Feb 16, 2026
@rainsupreme rainsupreme moved this to Needs Review in Valkey 9.1 Feb 16, 2026
@rainsupreme rainsupreme added bug Something isn't working needs-review Maintainer attention labels Feb 16, 2026

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

@enjoy-binbin enjoy-binbin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks!

@enjoy-binbin enjoy-binbin changed the title bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report Bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report Feb 24, 2026
Comment thread src/scripting_engine.c
Comment thread src/debug.c
Comment thread src/scripting_engine.c
@zuiderkwast zuiderkwast merged commit 1ee434e into valkey-io:unstable Feb 24, 2026
93 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in Valkey 9.1 Feb 24, 2026
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…y test in crash report (valkey-io#3029)

The memory test was commented out in valkey-io#2858 and should have been
reenabled. On further investigation I found that the server hangs during
shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes
deadlock because the lock was acquired for shutdown but lazy free jobs
require the GIL too:

- main thread: `serverCron()` acquires GIL via `afterSleep()` then calls
`finishShutdown()`, which eventually calls our script module unload code
that calls `bioDrainWorker()`.
- bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()`
call `scriptingEngineCallFreeFunction()` which requires the GIL.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
@rainsupreme rainsupreme deleted the patch-1 branch March 6, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-review Maintainer attention

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants