Commit c15a10e

Optimise scheduler.get_comm_cost set difference (#6931)
1 parent 2a2c3bb commit c15a10e

1 file changed

Lines changed: 11 additions & 1 deletion

distributed/scheduler.py

@@ -2620,7 +2620,17 @@ def get_comm_cost(self, ts: TaskState, ws: WorkerState) -> float:
         on the given worker.
         """
         dts: TaskState
-        deps: set = ts.dependencies.difference(ws.has_what)
+        deps: set
+        if 10 * len(ts.dependencies) < len(ws.has_what):
+            # In the common case where the number of dependencies is
+            # much less than the number of tasks that we have,
+            # construct the set of deps that require communication in
+            # O(len(dependencies)) rather than O(len(has_what)) time.
+            # Factor of 10 is a guess at the overhead of explicit
+            # iteration as opposed to just calling set.difference
+            deps = {dep for dep in ts.dependencies if dep not in ws.has_what}
+        else:
+            deps = ts.dependencies.difference(ws.has_what)
         nbytes: int = 0
         for dts in deps:
             nbytes += dts.nbytes
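
The size-based branch in this change can be tried outside the scheduler with a minimal sketch like the one below. The helper name `missing_deps`, the `factor` parameter, and the sample data are hypothetical stand-ins for `ts.dependencies` and `ws.has_what`; none of them are part of distributed, and the sketch assumes only that the second collection supports `len()` and fast membership tests.

```python
from collections.abc import Collection


def missing_deps(dependencies: set, has_what: Collection, factor: int = 10) -> set:
    """Return the members of `dependencies` that are absent from `has_what`."""
    if factor * len(dependencies) < len(has_what):
        # Few dependencies, many held keys: probe has_what once per dependency.
        return {dep for dep in dependencies if dep not in has_what}
    # Comparable sizes: let set.difference do the work.
    return dependencies.difference(has_what)


# Hypothetical data: a worker holding keys 0..999.
held = dict.fromkeys(range(1_000))
few = {5, 998, 1_000, 2_000}       # 10 * 4 < 1000, so the comprehension branch runs
many = set(range(900, 1_100))      # 10 * 200 >= 1000, so set.difference runs

small_result = missing_deps(few, held.keys())
large_result = missing_deps(many, held.keys())
```

Both branches return the same set for a given input; only the amount of iteration differs. The threshold of 10 mirrors the commit's own comment, which describes it as a guess.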
