Revert "Revert "Use HighLevelGraph layers everywhere in collections (…#6707
jrbourbeau merged 1 commit into dask:master
Let's wait until we do a release. There were some issues last time around
with string keys in SubgraphCallables. I think that we might want to do a
small mid-week release for this as well as anything else that feels
critical, and then start merging in the HLG fixes.
…On Tue, Oct 6, 2020 at 4:35 AM Tom Augspurger ***@***.***> wrote:
…#6510 <#6510>)" (#6697 <#6697>)"
This reverts commit e09d8d9 <e09d8d9>.
cc @madsbk <https://github.com/madsbk>
Commit Summary
- Revert "Revert "Use HighLevelGraph layers everywhere in collections (#6510)" (#6697)"
File Changes
- *M* dask/array/optimization.py <https://github.com/dask/dask/pull/6707/files#diff-10351144025e8c061b086321d0d9c9b6> (33)
- *M* dask/blockwise.py <https://github.com/dask/dask/pull/6707/files#diff-99ed6f84a26a1697121f5e38e8b555c1> (55)
- *M* dask/core.py <https://github.com/dask/dask/pull/6707/files#diff-1ede41868431981d4249ebdea17d91b4> (28)
- *M* dask/dataframe/io/parquet/core.py <https://github.com/dask/dask/pull/6707/files#diff-c9c912457d8bd444a48523c4b8d1d58e> (34)
- *M* dask/dataframe/optimize.py <https://github.com/dask/dask/pull/6707/files#diff-6737b31c383e68893904fdd3c986b6c7> (27)
- *M* dask/highlevelgraph.py <https://github.com/dask/dask/pull/6707/files#diff-813eb8bcdb581604b1788c0e38305049> (229)
- *M* dask/tests/test_highgraph.py <https://github.com/dask/dask/pull/6707/files#diff-8cab368f81ee56fe90ad3f18200dea56> (19)
Patch Links:
- https://github.com/dask/dask/pull/6707.patch
- https://github.com/dask/dask/pull/6707.diff
jrbourbeau
left a comment
Playing around with this PR locally, it looks like we can still produce HighLevelGraphs that have raw dictionaries in their layers, which causes HighLevelGraph.get_dependencies to fail. For example:
import dask.array as da
x = da.ones(10, chunks=(2,))
dsk = x.__dask_graph__()
# Print out HighLevelGraph layers for x
from pprint import pprint
pprint(dsk.layers)
# Get dependencies for all the keys
deps = dsk.get_dependencies()
print(f"{deps = }")
raises an AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-61e7febb21f5> in <module>
8
9 # Get dependencies for all the keys
---> 10 deps = dsk.get_dependencies()
11 print(f"{deps = }")
~/projects/dask/dask/dask/highlevelgraph.py in get_dependencies(self)
369 self.key_dependencies = {}
370 for layer in self.layers.values():
--> 371 self.key_dependencies.update(layer.get_dependencies(all_keys))
372
373 return self.key_dependencies
AttributeError: 'dict' object has no attribute 'get_dependencies'
I see there's a HighLevelGraph._fix_hlg_layers_inplace method which ensures all layers are of type Layer. Should we call that at the end of HighLevelGraph.__init__ to make sure we always have a uniform layer type? @madsbk I'd be curious to hear what you think -- I also could be totally missing something
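To make the suggestion above concrete, here is a minimal, self-contained sketch of the idea: wrap any raw dict layer in a layer type at the end of `__init__`, so `get_dependencies` can rely on every layer having the same interface. This is not dask's code. `SketchLayer`, `SketchGraph`, `_wrap_raw_layers`, and `_is_key` are hypothetical names invented for illustration, and the dependency logic is deliberately simplified.

```python
from collections.abc import Mapping


def _is_key(obj, all_keys):
    """Return True if obj is a known key (hypothetical helper)."""
    try:
        return obj in all_keys
    except TypeError:  # unhashable argument, so it cannot be a key
        return False


class SketchLayer(Mapping):
    """Hypothetical stand-in for a Layer: a read-only mapping of key -> task."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def get_dependencies(self, all_keys):
        # Simplified rule: a task is a tuple (callable, *args); any argument
        # that is itself a known key counts as a dependency.
        deps = {}
        for key, task in self._mapping.items():
            is_task = isinstance(task, tuple) and task and callable(task[0])
            args = task[1:] if is_task else ()
            deps[key] = {a for a in args if _is_key(a, all_keys)}
        return deps


class SketchGraph:
    """Hypothetical stand-in for a graph made of named layers."""

    def __init__(self, layers, dependencies):
        self.layers = dict(layers)
        self.dependencies = dependencies
        # The point of the sketch: normalise layers once, at construction.
        self._wrap_raw_layers()

    def _wrap_raw_layers(self):
        for name, layer in list(self.layers.items()):
            if not isinstance(layer, SketchLayer):
                self.layers[name] = SketchLayer(layer)

    def get_dependencies(self):
        all_keys = set()
        for layer in self.layers.values():
            all_keys.update(layer)
        key_dependencies = {}
        for layer in self.layers.values():
            key_dependencies.update(layer.get_dependencies(all_keys))
        return key_dependencies


def inc(x):
    return x + 1


# One raw dict layer and one already-wrapped layer.
graph = SketchGraph(
    layers={
        "a": {("a", 0): 1},
        "b": SketchLayer({("b", 0): (inc, ("a", 0))}),
    },
    dependencies={"a": set(), "b": {"a"}},
)
deps = graph.get_dependencies()
```

With the normalisation in the constructor, the raw dict passed for layer "a" no longer trips the per-layer `get_dependencies` call. Whether the real fix belongs in `HighLevelGraph.__init__` or elsewhere is the open question in this thread.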
Good point, I totally agree :) I think we should:
Afterwards, we can look at merging:
After which, we can begin the work of sending HLGs directly to the scheduler in Distributed:
Thanks all for coordinating!
…#6510)" (#6697)"
This reverts commit e09d8d9.
cc @madsbk