
[WIP] Update threadpool#5893

Closed
fjetter wants to merge 4 commits into dask:main from fjetter:update_threadpool

Conversation

@fjetter
Member

@fjetter fjetter commented Mar 3, 2022

This updates the threadpool to the most recent version, as currently found in Python 3.10.2. It required a few compatibility adjustments to make it work for Python 3.8, but nothing major, imo.

Incorporating secede/rejoin was a bit trickier, and I'm not entirely sure whether I introduced a bug around the idle counter/semaphore when a thread rejoins. However, if I'm not mistaken, a miscount in this semaphore would only remove the optimization of not spawning additional threads while idle workers exist, which I don't consider a big deal.

I decided to remove the shutdown timeout that was introduced in #1330:

  • The tests never actually worked: they test a lower limit for the shutdown time, but all tests would pass even with the timeout removed. They should instead have checked an upper threshold.
  • I am not entirely convinced this is necessary; it may even be harmful. Afaik, the Python interpreter will not shut down until all non-daemon threads have finished, so timing out the shutdown would never close a worker faster in a real application than waiting for the shutdown to finish properly.
  • In tests, this can mask flaws in our test suite and will almost certainly leak threads that run concurrently with other tests. We should avoid this as much as possible. The worst case is that the threads never finish and we hit the pytest-timeout; I suspect such problems would be straightforward to debug.
  • The original issue mentions a memory leak, but I cannot see how that relates to the shutdown.

If any of the above reasons are wrong, or I missed anything, adding the shutdown timeout back should be easy enough.

cc @graingert
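On the shutdown-timeout point above: `ThreadPoolExecutor.shutdown(wait=True)` already blocks until in-flight work completes, which is what makes a separate timeout redundant in a real application. A minimal sketch of that behaviour (the helper name and timings are illustrative, not the PR's code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def shutdown_waits_for_tasks():
    """Show that shutdown(wait=True) returns only after queued work finishes."""
    pool = ThreadPoolExecutor(max_workers=1)
    pool.submit(time.sleep, 0.2)   # work still running when we shut down
    start = time.monotonic()
    pool.shutdown(wait=True)       # blocks until the sleep completes
    return time.monotonic() - start

elapsed = shutdown_waits_for_tasks()
assert elapsed >= 0.15             # most of the 0.2 s sleep was waited out
```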

- executor._threads.remove(threading.current_thread())
+ executor._threads.discard(threading.current_thread())
  rejoin_event.set()
  executor._idle_semaphore.acquire(timeout=0)
Member Author


This is where I'm not entirely sure we're doing the right thing. If I do not acquire, we deadlock right away. Acquiring too often would cause the counter to underestimate the number of idle threads and spawn more than necessary, so it should not be a huge deal.
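For reference, `threading.Semaphore.acquire(timeout=0)` is a non-blocking try-acquire: it returns `False` immediately instead of waiting, which is why this call itself cannot deadlock. A sketch of those semantics (the variable name mirrors the stdlib's `_idle_semaphore` but is illustrative):

```python
import threading

# An idle-thread counter in the style of the executor's _idle_semaphore.
idle = threading.Semaphore(0)

# timeout=0 never blocks: with no permit available it returns False
# immediately rather than waiting for a release.
assert idle.acquire(timeout=0) is False

idle.release()                           # one thread reports itself idle
assert idle.acquire(timeout=0) is True   # permit consumed without blocking
assert idle.acquire(timeout=0) is False  # counter is back at zero
```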

Member Author


Deadlocks happen in test_rejoin_idempotent. I didn't have the patience to debug this any further, since I think it should not be a big deal.

self._initargs,
),
)
t.daemon = True
Member

@graingert graingert Mar 3, 2022


This is probably the biggest change in the PR: https://bugs.python.org/issue39812

We might start seeing some processes stick around, waiting for deadlocked executors.

Member Author


Ah, I missed this one. That explains why the shutdown timeout was "harmless".

Member Author


We probably "require" this to be a daemon thread. Real-world user functions may run for hours, but I don't think this should block a worker close indefinitely.
#4726 is also relevant in this context.
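The trade-off in the comments above is whether interpreter exit waits for in-flight work: non-daemon threads block exit until they finish, while daemon threads are abandoned. A small sketch of the difference, run in a child interpreter (the script contents are illustrative):

```python
import subprocess
import sys
import textwrap

# A child interpreter starts a thread sleeping for 60 s and then exits.
# With daemon=True the interpreter does not wait for the thread; with
# daemon=False the same script would block for the full minute.
script = textwrap.dedent(
    """
    import threading, time
    t = threading.Thread(target=time.sleep, args=(60,), daemon=True)
    t.start()
    print("exiting")
    """
)

result = subprocess.run(
    [sys.executable, "-c", script],
    capture_output=True, text=True, timeout=10,
)
assert result.stdout.strip() == "exiting"
assert result.returncode == 0
```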

# We use cpu_count + 4 for both types of tasks.
# But we limit it to 32 to avoid consuming surprisingly large resource
# on many core machine.
max_workers = min(32, (os.cpu_count() or 1) + 4)
Member


This is another change I was concerned about, but Dask sets this explicitly here:

self.executors["default"] = ThreadPoolExecutor(
self.nthreads, thread_name_prefix="Dask-Default-Threads"
)
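The stdlib default quoted above only applies when no worker count is given; since Dask always passes self.nthreads, the cap never takes effect for the default executor. The formula as a pure function (the function name is illustrative):

```python
def default_max_workers(cpu_count):
    # ThreadPoolExecutor's default since Python 3.8: cpu_count + 4,
    # capped at 32 so many-core machines do not get huge pools.
    # os.cpu_count() can return None, hence the "or 1" fallback.
    return min(32, (cpu_count or 1) + 4)

assert default_max_workers(4) == 8
assert default_max_workers(64) == 32   # the cap kicks in
assert default_max_workers(None) == 5  # cpu_count() returned None
```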

raise RuntimeError("cannot schedule new futures after shutdown")
if _shutdown:
raise RuntimeError(
"cannot schedule new futures after " "interpreter shutdown"
Member


https://github.com/keisheiled/flake8-implicit-str-concat

Suggested change
"cannot schedule new futures after " "interpreter shutdown"
"cannot schedule new futures after interpreter shutdown"

@fjetter
Member Author

fjetter commented Mar 3, 2022

Well, the Windows tests all run into hard timeouts, so this will be interesting and will likely take a bit of time to sort through.

@github-actions
Contributor

github-actions bot commented Mar 3, 2022

Unit Test Results

4 files (-8), 2 suites (-10), 2 errors, 0s duration (-6h 53m 23s)
511 tests (-2,110): 471 passed (-2,070), 29 skipped (-51), 11 failed (+11)
1,022 runs (-14,628): 933 passed (-13,854), 70 skipped (-793), 19 failed (+19)

For more details on these parsing errors and failures, see this check.

Results for commit 4d3b1fb. ± Comparison against base commit 8c98ad8.

@jakirkham
Member

Wonder if at some point it makes sense to get the changes we need in ThreadPoolExecutor into CPython itself.
