Moving high level graph pack/unpack Dask #7179
Conversation
8a31de8 to
38413e6
Compare
96f4709 to
07af9a2
Compare
3827be1 to
0f4d5d6
Compare
ian-r-rose
left a comment
There was a problem hiding this comment.
Thanks for taking this on @madsbk, this looks good to me
|
Thanks for working on this @madsbk. I aim to provide feedback later today. |
Co-authored-by: crusaderky <crusaderky@gmail.com>
| expanded[a] = v | ||
| else: | ||
| if not keys_stringified: | ||
| keys = [stringify(k) for k in keys] | ||
| keys_stringified = True | ||
|
|
||
| expanded[a] = dict.fromkeys(keys, v) | ||
|
|
||
| # Merge the expanded annotations with the existing annotations mapping | ||
| for k, v in expanded.items(): | ||
| v.update(annotations.get(k, {})) | ||
| annotations.update(expanded) |
There was a problem hiding this comment.
| expanded[a] = v | |
| else: | |
| if not keys_stringified: | |
| keys = [stringify(k) for k in keys] | |
| keys_stringified = True | |
| expanded[a] = dict.fromkeys(keys, v) | |
| # Merge the expanded annotations with the existing annotations mapping | |
| for k, v in expanded.items(): | |
| v.update(annotations.get(k, {})) | |
| annotations.update(expanded) | |
| else: | |
| if not keys_stringified: | |
| keys = [stringify(k) for k in keys] | |
| keys_stringified = True | |
| v = dict.fromkeys(keys, v) | |
| try: | |
| v.update(annotations[a]) | |
| except KeyError: | |
| pass | |
| annotations[a] = v |
There was a problem hiding this comment.
In case of collision, the original annotations silently win; is it deliberate?
There was a problem hiding this comment.
I think this may be a case of the original annotations (i.e. supplied as kwargs to Client.{submit, persist} overriding layer annotations. @ian-r-rose may know more here.
There was a problem hiding this comment.
I think this may be a case of the original annotations (i.e. supplied as kwargs to Client.{submit, persist} overriding layer annotations
Yes, this was also what I was thinking. @ian-r-rose what do you think?
There was a problem hiding this comment.
I think this may be a case of the original annotations (i.e. supplied as kwargs to Client.{submit, persist} overriding layer annotations.
Sorry for the slow reply here. Yes, that was my understanding as well. However, I don't have a strong intuition of which should win, and I can think of an argument for both (more specific annotation vs later-applied annotation).
|
@madsbk Here's my suggestion, I wasn't able to use your fork as a base for a PR: Summary of changes:
More broadly, MaterialisedLayer is now the single point of responsibility for transmitting materialised graphs. Layer returns to a more abstract form (is_materialised return False).
|
|
I have a hard time seeing the advantages of your approach. Personally, I was hoping that we could avoid introducing a new protocol/requirements for the packed state returned by |
I think I understand that you're asking why the materialised packing behaviour has been moved into MaterialisedLayer? If so, its an attempt to separate interface from implementation. In that sense, Layer is the interface while the derived classes are implementations of that interface. The reason for this is that Implementing materialised behaviour in the interface (base class) could potentially constrain the behaviour of derived classes in future: better to place in the behaviour in a specific implementation class. I get that it may seem like quibbling, but inheritance hierarchies can get out of hand. |
I agree inheritance hierarchies can get out of hand but now that we have a class hierarchy and we have a perfectly suited default implementation that is compatible with any conceivable derived class, I think it makes sense. Particularly, if the alternative is an introduction of a key scheme for the packed state dict. |
OK, I see your point. I'm happy with the PR as it stands then. |
As suggested by @sjperkins, this makes it easier to extend the API in the future.
Layer.__reduce__() isn't used by the client -> scheduler communication any more.
Previously, BasicLayer could be partial materialized, which isn't possible any more. Now BasicLayer is always materialized.
|
@sjperkins, thanks for the discussion I have added your suggestions to let Also @crusaderky, @ian-r-rose, thanks for the reviews. @jrbourbeau, I think this is ready to be merged after the removal of the CI hack. Could you take a look and if you agree, we should revert |
|
@jrbourbeau, I think we should merge this soon. Can you revert |
jrbourbeau
left a comment
There was a problem hiding this comment.
Apologies for the delayed review, thank you for the ping @madsbk. Overall this PR and dask/distributed#4489 look good to me. My primary comment is that there are a few places where I think using the full annotations term instead of anno for shorthand will improve code clarity. Otherwise, I think we're good to revert the temporary CI changes here and merge.
|
Hmm actually it looks like there's an issue with how we're handling annotations (see CI over in dask/distributed#4489 -- for example this CI build) |
4db5b83 to
b6e0d20
Compare
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @madsbk! This, and the companion PR in distributed, look good to me : )
Since we may be releasing tomorrow (xref dask/community#129 (comment)), and the changes here are an internal refactor instead of addressing user-facing issues, I'm inclined to wait a day before merging these PRs (until after the release) to give the changes here a little time to simmer in the main branch before pushing them out to users
Sounds good! |
|
Note I've opened up a follow-up PR at #7279 to resolve a CI issue |
|
Thanks James! 😄 |
Important: remove CI hack before merge!
This PR moves all pack and unpacking of high level graphs from Distributed to Dask. This work is based on the discussion in dask/distributed#4406.
Beside the move,
unpacknow doesn't modify the input graph and dependencies, instead the unpacked graph and dependencies a returned, which makes it possible to delegate the pack and unpack of annotations to the base class.Notice, this PR includes a hack that makes CI install dask/distributed#4489 before testing. This should be removed before merging.
black dask/flake8 daskcc. @sjperkins, @ian-r-rose, @jrbourbeau