WorkerThreadPool: Revamp interaction with ScriptServer#96959
Conversation
This reverts commit 2d1dd41.
|
Ho so it's not a full revert of the previous PR. We still have the bookkeeping done by the ScriptServer. My bad, I thought I tested with the multithreaded renderer, but it seems I messed up (Doesn't change that I still tested the branch against 4.3-stable and confirmed the difference in behaviour). Otherwise, it seems to be an improvement again. Now the WorkerThreadPool calls thread_exit() directly instead of letting the Thread do it itself, just like with thread_enter(). So if I understand, the order of execution is:
Now my doubt: What happens if tasks are still queued when we switch to |
|
Testing locally I can confirm that this fixes #96931 |
4bb4493 to
c30931a
Compare
c30931a to
82b39f9
Compare
|
@CedNaru, very good assessment. To address the issue about outstanding scripting tasks, I've added a pre-langs-exit runlevel that blocks the addition of new tasks and waits until the queues are empty. I'm not quite happy with the added complexity even before the last push. For all the reviewers: the new push performs some additional refactoring so the new logic is more self-contained and can be more easily replaced or further unified in the future. I have considered many approaches and this is the only one that would work. |
16d431b to
f96ab59
Compare
|
Indeed, that quite a bit of complexity added to handle the ScriptServer within the pool. If I may ask, wouldn't that be possible to invert the logic and make that the ScriptServer, the one with multiple "run levels" instead of the WorkerThreadPool ? The core of the issue is that threads are started before the ScriptServer, and terminated after the ScriptServer. There are probably a lot of reasons why the ScriptServer has to be initialized last and ended first (I can already think of a few myself), Wouldn't a two-stage (before and after servers) initialization and termination of the ScriptServer do the trick ? Sadly, I don't see it being possible without breaking changes to Language (and would probably force languages to register at the core phase at least) but asking anyway. |
- The main thread function and the collaborative wait functions have a much more similar structure than earlier, which yields (pun intended) better maintainability. - Also, there are not assertions anymore about the reason for ending a wait being valid, because spurious awakes can happen and so the assert would fail without that indicating an issue.
f96ab59 to
5d371e3
Compare
|
Sounds intriguing, but I'm not sure it'd be worth the hassle at this point. I've simplified a bit the code so the end result is not that daunting and, provided it does the trick and maintainability is good, I'd deem this one worth mergeable. Let's see if CI succeeds now (there was a missing initializer causing logic errors). |
|
Tested out a single-thread export with this PR and it fixes the issues that currently occur on master. @RandomShaper You could add this PR as fixing #97073. |
clayjohn
left a comment
There was a problem hiding this comment.
Tested locally and confirmed that it is working for the MT servers
|
I'm going to go ahead and merge this as we accidentally merged #95915 first which will introduce the deadlock to many more projects. |
|
Thank you for the quick turnaround! |
|
Added a note about cherry-picking for 4.3 to the main description. |
First, this reverts 2d1dd41 from #96760. The approach there is not enough, given that with separate thread rendering, there's a cyclic dependency chain on shutdown:
WorkerThreadPoolmust finish beforeScriptServer.RenderingServermust finish beforeWorkerThreadPool.ScriptServermust finish beforeRenderingServer.A way of breaking the cycle, which this PR implements, is splitting
WorkerThreadPoolregular lifetime intotwothree🆕 "runlevels":I've put the pure refactor into its own commit, which would help a lot for bisect and also for easier reviewing.
It should still fix #95809 (CC @CedNaru).
Fixes #96931.
Fixes #96978. 🆕
Fixes #97073. 🆕🆕
Cherry-picking note: We don't have
MutexLock::temp_unlock()/temp_relock()in 4.3. Therefore, the most recent commit in this PR needs to have the following patch applied: