Skip to content

Revert "Revert "Use HighLevelGraph layers everywhere in collections (…#6707

Merged
jrbourbeau merged 1 commit intodask:masterfrom
TomAugspurger:revert-revert
Oct 8, 2020
Merged

Revert "Revert "Use HighLevelGraph layers everywhere in collections (…#6707
jrbourbeau merged 1 commit intodask:masterfrom
TomAugspurger:revert-revert

Conversation

@TomAugspurger
Copy link
Member

#6510)" (#6697)"

This reverts commit e09d8d9.

cc @madsbk

@mrocklin
Copy link
Member

mrocklin commented Oct 6, 2020 via email

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playing around with this PR locally it looks like we can still produce HighLevelGraphs that have raw dictionaries in their layers which causes HighLevelGraph.get_dependencies to fail. For example:

import dask.array as da

x = da.ones(10, chunks=(2,))
dsk = x.__dask_graph__()
# Print out HighLevelGraph layers for x
from pprint import pprint
pprint(dsk.layers)

# Get dependencies for all the keys
deps = dsk.get_dependencies()
print(f"{deps = }")

raises and AttributeError:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-61e7febb21f5> in <module>
      8
      9 # Get dependencies for all the keys
---> 10 deps = dsk.get_dependencies()
     11 print(f"{deps = }")

~/projects/dask/dask/dask/highlevelgraph.py in get_dependencies(self)
    369             self.key_dependencies = {}
    370             for layer in self.layers.values():
--> 371                 self.key_dependencies.update(layer.get_dependencies(all_keys))
    372
    373         return self.key_dependencies

AttributeError: 'dict' object has no attribute 'get_dependencies'

I see there's a HighLevelGraph._fix_hlg_layers_inplace method which ensures all layers are of type Layer. Should we call that at the end of HighLevelGraph.__init__ to make sure we always have a uniform layer type? @madsbk I'd be curious to hear what you think -- I also could be totally missing something

@madsbk
Copy link
Contributor

madsbk commented Oct 8, 2020

I see there's a HighLevelGraph._fix_hlg_layers_inplace method which ensures all layers are of type Layer. Should we call that at the end of HighLevelGraph.__init__ to make sure we always have a uniform layer type? @madsbk I'd be curious to hear what you think -- I also could be totally missing something

Good point, I totally agree :) I think we should:

  1. Merge this PR
  2. Merge HLG: get_dependencies() of single keys #6699
  3. Create a PR that moves HighLevelGraph._fix_hlg_layers_inplace() into HighLevelGraph.__init__()

Afterwards, we can look at merging:

After which, we can begin the working of sending HLGs directly to the scheduler in Distributed:

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice outline @madsbk, that gives me a much better sense of how the ongoing HLG pieces fit together.

+1 to merging this PR, then #6699, and then moving _fix_hlg_layers_inplace to HighLeveLGraph.__init__. I'll leave this open for a bit for feedback, but then plan to start merging

@jrbourbeau jrbourbeau merged commit d4447e7 into dask:master Oct 8, 2020
@rjzamora
Copy link
Member

rjzamora commented Oct 8, 2020

Thanks all for coordinating!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants