Bugfix for GIL deadlock while unloading script engine, reenable memory test in crash report#3029
Merged
Merged
Conversation
Contributor
Author
|
maybe something is wrong with it after all? The sanitizer test crashed in doFastMemoryTest 🤷♀️ |
enjoy-binbin
approved these changes
Jan 8, 2026
sarthakaggarwal97
approved these changes
Jan 10, 2026
This was commented out in valkey-io#2858 but I don't know why - it looks like it might have been accidental. Signed-off-by: Rain Valentine <rsg000@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3029 +/- ##
============================================
- Coverage 74.90% 0 -74.91%
============================================
Files 129 0 -129
Lines 71327 0 -71327
============================================
- Hits 53429 0 -53429
+ Misses 17898 0 -17898 🚀 New features to boost your workflow:
|
Contributor
Author
|
Typically we avoid running ASAN with crash report tests, but in this case it is actually crashing when it should not - it is crashing while unloading the Lua module. I think we need to resolve the crash during shutdown. |
…at may access it Signed-off-by: Rain Valentine <rsg000@gmail.com>
Contributor
Author
|
Hmm. Taking a closer look at the logs, it appears that the server is hanging during shutdown maybe 50 seconds before the test harness terminated it, triggering the SIGSEGV handler and what followed. Gotta look for a hang in a bio job instead of a crash. |
Signed-off-by: Rain Valentine <rsg000@gmail.com>
zuiderkwast
approved these changes
Feb 24, 2026
sarthakaggarwal97
approved these changes
Feb 24, 2026
hpatro
pushed a commit
to hpatro/valkey
that referenced
this pull request
Mar 5, 2026
…y test in crash report (valkey-io#3029) The memory test was commented out in valkey-io#2858 and should have been reenabled. On further investigation I found that the server hangs during shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes deadlock because the lock was acquired for shutdown but lazy free jobs require the GIL too: - main thread: `serverCron()` acquires GIL via `afterSleep()` then calls `finishShutdown()`, which eventually calls our script module unload code that calls `bioDrainWorker()`. - bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()` call `scriptingEngineCallFreeFunction()` which requires the GIL. --------- Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
note: Originally this PR only uncommented the crash report memory test, but I rewrote the PR for the GIL bug I uncovered and fixed. I added it to Valkey 9.1 since it's a follow-up bugfix for a 9.1 change.
The memory test was commented out in #2858 and should have been reenabled. On further investigation I found that the server hangs during shutdown inside the
bioDrainWorker(BIO_LAZY_FREE)call. This causes deadlock because the lock was acquired for shutdown but lazy free jobs require the GIL too:serverCron()acquires GIL viaafterSleep()then callsfinishShutdown(), which eventually calls our script module unload code that callsbioDrainWorker().lazyFreeEvalScripts()callscriptingEngineCallFreeFunction()which requires the GIL.