Conversation
| executor._threads.remove(threading.current_thread()) | ||
| executor._threads.discard(threading.current_thread()) | ||
| rejoin_event.set() | ||
| executor._idle_semaphore.acquire(timeout=0) |
There was a problem hiding this comment.
This is where I'm not entirely sure if we're doing the right thing. If I do not acquire, we're deadlocking right away. Acquiring too often would cause the counter to underestimate idle threads and spawn more than necessary, therefore it should not be a huge deal
There was a problem hiding this comment.
deadlocks happen in test_rejoin_idempotent. I didn't have the patience to debug this any futher since I think it should not be a big deal.
| self._initargs, | ||
| ), | ||
| ) | ||
| t.daemon = True |
There was a problem hiding this comment.
this is probably the biggest change in the PR: https://bugs.python.org/issue39812
we might start seeing some processes sticking around waiting for deadlocked executors
There was a problem hiding this comment.
ah, I missed this one. That explains why the shutdown timeout was "harmless"
There was a problem hiding this comment.
We probably "require" this to be a daemon. Real world user functions may run for hours but I don't think this should block a worker close indefinitely.
#4726 is also relevant in this context
| # We use cpu_count + 4 for both types of tasks. | ||
| # But we limit it to 32 to avoid consuming surprisingly large resource | ||
| # on many core machine. | ||
| max_workers = min(32, (os.cpu_count() or 1) + 4) |
There was a problem hiding this comment.
this is another change I was concerned about, but this is set here
distributed/distributed/worker.py
Lines 965 to 967 in 8c98ad8
| raise RuntimeError("cannot schedule new futures after shutdown") | ||
| if _shutdown: | ||
| raise RuntimeError( | ||
| "cannot schedule new futures after " "interpreter shutdown" |
There was a problem hiding this comment.
https://github.com/keisheiled/flake8-implicit-str-concat
| "cannot schedule new futures after " "interpreter shutdown" | |
| "cannot schedule new futures after interpreter shutdown" |
|
Well, windows tests all run into hard timeouts so this will be interesting and will likely take a bit of time to sort through. |
Unit Test Results 4 files - 8 2 errors 2 suites - 10 0s ⏱️ - 6h 53m 23s For more details on these parsing errors and failures, see this check. Results for commit 4d3b1fb. ± Comparison against base commit 8c98ad8. |
|
Wonder if at some point it makes sense to get the changes we need in |
This updates the threadpool to the most recent version as is currently in python 3.10.2. It required a few compatibility adjustments to make it work for py3.8 but nothing major imo.
Incorporatign secede/rejoin was a bit more tricky and I'm not entirely sure if I didn't put in a bug around the idle counter/semaphore when a thread rejoins. However, if I'm not wrong, a miscounting in this semaphore would only remove the optimization of not spawning additional threads if there are idle workers which I don't consider to be a big deal.
I decided to remove the shutdown timeout that was introduced in #1330
If any of the above reasons are false or I missed anything, adding the shutdown timeout again should be easy enough
cc @graingert