[DRAFT] Failed attempt to directly use a Thread for repo fetching#22110
Closed
[DRAFT] Failed attempt to directly use a Thread for repo fetching#22110
Conversation
I managed to reproduce some deadlocks during repo fetching with virtual worker threads. One notable trigger was some _other_ repo failing to fetch, which seems to cause Skyframe to try to interrupt other concurrent repo fetches. This _might_ be the cause for a deadlock where we submit a task to the worker executor service, but the task never starts running before it gets cancelled, which causes us to wait forever for a `DONE` signal that never comes. (The worker task puts a `DONE` signal in the queue in a `finally` block -- but we don't even enter the `try`.) I then tried various things to fix this; this PR is an attempt that actually seemed to eliminate the deadlock. Instead of waiting for a `DONE` signal to make sure the worker thread has finished, we now hold on to the executor service, which offers a `close()` method that essentially uninterruptibly waits for any scheduled tasks to terminate, whether or not they have started running. (@justinhorvitz had suggested a similar idea before.) To make sure distinct repo fetches don't interfere with each other, we start a separate worker executor service for each repo fetch instead of making everyone share the same worker executor service. (This is recommended for virtual threads; see https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html#GUID-C0FEE349-D998-4C9D-B032-E01D06BE55F2 for example.) Related: #22003
Wyverald
commented
Apr 24, 2024
Member
Author
Wyverald
left a comment
There was a problem hiding this comment.
this setup has a deadlock that I couldn't work out how to eliminate.
| workerThread = null; | ||
| if (myWorkerThread != null) { | ||
| myWorkerThread.interrupt(); | ||
| Uninterruptibles.joinUninterruptibly(myWorkerThread); |
Member
Author
There was a problem hiding this comment.
host thread deadlocks here (from StarlarkRepositoryFunction.java:180)
| state.recordedInputValues, | ||
| key))); | ||
| } catch (Throwable e) { | ||
| state.signalQueue.put(new Signal.Failure(e)); |
Member
Author
There was a problem hiding this comment.
worker thread deadlocks here
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Failed attempt of #22100