Map on HighLevelGraph Layers#6689
Conversation
|
In principle this seems fine to me. At first I was thinking that maybe we should make a |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for your work here @madsbk! Overall this looks good, I've left a few comments below
| layers[k] = layer | ||
| else: | ||
| layers[k] = v | ||
| return HighLevelGraph(layers, self.dependencies) |
There was a problem hiding this comment.
Should we also pass self.key_dependencies through here?
There was a problem hiding this comment.
In map_basic_layers() we cannot assume that key dependencies doesn't changes, func is allowed to modify both keys and tasks.
As an optimization, we could allow the user to provide a new key_dependencies but let's save that for another PR.
| if use_layer_map_task: | ||
| # Overwrite Blockwise.map_tasks with Layer.map_tasks | ||
| blockwise_layer = list(dsk.layers.values())[1] | ||
| blockwise_layer.map_tasks = partial(Layer.map_tasks, blockwise_layer) |
There was a problem hiding this comment.
I'm a little confused as to what we're trying to test here
There was a problem hiding this comment.
Updated the description to more descriptive, hope it makes more sense?
if use_layer_map_task:
# In order to test the default map_tasks() implementation on a Blockwise Layer,
# we overwrite Blockwise.map_tasks with Layer.map_tasks
blockwise_layer = list(dsk.layers.values())[1]
blockwise_layer.map_tasks = partial(Layer.map_tasks, blockwise_layer)Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for the updates @madsbk! Overall this looks good to me. I've left one small comment about a simplification we can make and another about a potential API change
|
|
||
| return HighLevelGraph(ret_layers, ret_dependencies, ret_key_deps) | ||
|
|
||
| def map_basic_layers( |
There was a problem hiding this comment.
API question: What do you think about having a more generic map_layers method where the func being mapped is responsible for checking the layer type? This would be to prevent us from needing to add map_*_layers methods as the number of layer types grows.
Apologies for not bringing this point up earlier in the process
There was a problem hiding this comment.
My sense here is that this probably makes sense, but that it also probably doesn't need to be handled right now. The API choices here don't preclude us adding intermediate methods like map_layers later.
There was a problem hiding this comment.
Sounds good, we can revisit this point later on as needed 👍
There was a problem hiding this comment.
API question: What do you think about having a more generic
map_layersmethod where thefuncbeing mapped is responsible for checking the layer type? This would be to prevent us from needing to addmap_*_layersmethods as the number of layer types grows.
Make sense, my thinking behind map_basic_layers() was to have a map for the layers that are materialized low level graphs. And if the user wants to apply something on all layers, it is always possible to access the layers of a HLG directly.
But I agree, let's wait a bit before we settle on the API.
jrbourbeau
left a comment
There was a problem hiding this comment.
LGTM, thanks for all your work here @madsbk! Planning to merge this once CI passes
This PR introduces two
HighLevelGraphmethods:The motivation is to avoid materialization of high level graphs in Distributed: dask/distributed#4119
black dask/flake8 dask