Skip to content

Commit f577913

Browse files
authored
Avoid sorting large stacks in order (#3298)
When performning task ordering we sort tasks based on the number of dependents/dependencies they have. This is critical to low-memory processing. However, sometimes individual tasks have millions of dependencies, for which an n*log(n) sort adds significant overhead. In these cases we give up on sorting, and just hope that the tasks are well ordered naturally (such as is often the case in Python 3.6+ due to sorted dicts and the natural ordering that exists when constructing common graphs) See pangeo-data/pangeo#150 (comment) for a real-world case
1 parent 10c46a1 commit f577913

2 files changed

Lines changed: 11 additions & 3 deletions

File tree

dask/order.py

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,10 @@ def dependents_key(x):
109109
i = 0
110110

111111
stack = [k for k, v in dependents.items() if not v]
112-
stack = sorted(stack, key=dependencies_key)
112+
if len(stack) < 10000:
113+
stack = sorted(stack, key=dependencies_key)
114+
else:
115+
stack = stack[::-1]
113116

114117
while stack:
115118
item = stack.pop()
@@ -120,7 +123,9 @@ def dependents_key(x):
120123
deps = waiting[item]
121124
if deps:
122125
stack.append(item)
123-
stack.extend(sorted(deps, key=dependencies_key))
126+
if len(deps) < 1000:
127+
deps = sorted(deps, key=dependencies_key)
128+
stack.extend(deps)
124129
continue
125130

126131
result[item] = i
@@ -130,7 +135,9 @@ def dependents_key(x):
130135
waiting[dep].discard(item)
131136

132137
deps = [d for d in dependents[item] if d not in result]
133-
stack.extend(sorted(deps, key=dependents_key, reverse=True))
138+
if len(deps) < 1000:
139+
deps = sorted(deps, key=dependents_key, reverse=True)
140+
stack.extend(deps)
134141

135142
return result
136143

docs/source/changelog.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ Core
3030

3131
- Fixed bug when using unexpected but hashable types for keys (:pr:`3238`) `Daniel Collins`_
3232
- Fix bug in task ordering so that we break ties consistently with the key name (:pr:`3271`) `Matthew Rocklin`_
33+
- Avoid sorting tasks in order when the number of tasks is very large (:pr:`3298`) `Matthew Rocklin`_
3334

3435

3536
0.17.1 / 2018-02-22

0 commit comments

Comments
 (0)