Conversation
|
@jrbourbeau and I were talking about how to effectively coax you into doing
this work when you submitted the PR :)
…On Mon, Oct 26, 2020 at 10:28 AM Simon Perkins ***@***.***> wrote:
- Tests added / passed
- Passes black dask / flake8 dask
------------------------------
You can view, comment on, or merge this pull request online at:
#6767
Commit Summary
- Initial Implementation
File Changes
- *M* dask/__init__.py
<https://github.com/dask/dask/pull/6767/files#diff-fee11b514bc43bbb0e0b832b28dd310d252e2c95f9960ab391c4f3013db5fe63>
(1)
- *A* dask/annotations.py
<https://github.com/dask/dask/pull/6767/files#diff-4f5b17ce3625ec1aa614d4dc7342124429386e22002704c00af155d16002b388>
(39)
- *M* dask/highlevelgraph.py
<https://github.com/dask/dask/pull/6767/files#diff-fcf20b06a1a7fbca83c546765f996db29477b9d43cbccf64f3426ef7cc6eddec>
(83)
- *M* dask/tests/test_highgraph.py
<https://github.com/dask/dask/pull/6767/files#diff-b476bb508a6642113dfb259ad53ac6fdf53eacb6c4fa7b112eec2463ce257cfb>
(60)
Patch Links:
- https://github.com/dask/dask/pull/6767.patch
- https://github.com/dask/dask/pull/6767.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6767>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTFRLTSFAFZFF6LDKE3SMWWTTANCNFSM4S7V44LA>
.
|
|
@mrocklin I got delayed while preparing this: https://www.radiocamera.io/seminars/dask ;-) Looking forward to plugging this into distributed once dask/distributed#4140 is merged. |
|
/cc @madsbk |
* Pass Mapping key views into LayerAnnotations * Convert LayerAnnotations sets to keys during pickling * Warn on MapLayerAnnotations, due to remote execution issues.
|
I'd say this PR is ready for review. |
|
Thanks for starting this @sjperkins My initial reaction is that this seems somewhat verbose. I wonder if there are ways that we can simplify the implementation and the UI. For the stack, I wonder if it would make sense to reuse the existing dask.config functionality @contextmanager
def annotate(**kwargs):
with dask.config.set({"annotations": kwargs}):
yieldI suspect that this doesn't work directly due to nesting. It looks like it might work though if we change around keys a bit. Here is a brief example In [5]: with dask.config.set({"a.b": 1}):
...: with dask.config.set({"a.c": 2}):
...: print(dask.config.get("a"))
...:
...:
{'b': 1, 'c': 2}
In [6]: with dask.config.set({"a": {"b": 1}}):
...: with dask.config.set({"a": {"c": 2}}):
...: print(dask.config.get("a"))
...:
...:
{'c': 2}To avoid the three classes of annotations, I'm inclined to just leave the values attached to the Layer as-is, and not worry too much about making a taxonomy just yet. This seems premature. Also, regarding tests, I would love to see tests only use user facing API. This both helps to communicate intent to other devs, and also lets future devs modify the implementation and still get the benefit of the existing test suite. To that end, I would expect tests to look a lot simpler, maybe something like the following: def test_annotations():
with dask.annotate(a=1):
x = da.ones(10)
with dask.annotate(b=toolz.second):
y = x + 1
assert y.__dask_graph__().layers[x.name].annotations == {"a": 1}
assert y.__dask_graph__().layers[y.name].annotations == {"a": 1, "b": toolz.second}In writing this test I'm actually a little sad about having to go into the implementation details of the HLG. I'll be more excited about testing this when I can bring up the distributed scheduler and use priorities and things. But at least I'm not constructing layers by hand or dealing with custom Annotation subclasses. Thoughts on any of this? I tend to be more allergic to building systems than most developers, so I might be overreacting here. |
|
I'm happy to use dask.config.
In general I agree with you, and I've reworked the tests to use the high-level Arrays user interfaces. The user would normally never see these objects.
Are you suggesting something like: The reason I've refrained from allowing callables within annotations is due to remote execution security concerns as per these comments: #6701 (comment) and #6701 (comment). This is actually one of the main reasons for the LayerAnnotation classes: To disallow this while attempting to provide some sort of compression of annotations for transfer to the scheduler. Would the expansion of annotations be more appropriate on the Layer class itself? I'm happy to experiment. |
Yes.
I think that the system that consumes the collections and sends along the annotations can handle this for us before it gets to the scheduler. Code-wise I think that Dask core should not be prescriptive about how we represent annnotations. Instead, I think that we will handle this in |
OK, that makes things easier, I'll make those changes. |
dask/core.py
Outdated
|
|
||
|
|
||
| @contextmanager | ||
| def annotate(**kwargs): |
There was a problem hiding this comment.
I think that we could us a docstring here. My guess is that this will be the most commonly read piece of information about annotations. https://docs.dask.org/en/latest/develop.html#docstrings
In particular, I suspect that it will be useful to include a couple of common examples, perhaps with special values from dask.distributed like priority and retries`
There was a problem hiding this comment.
Also, I recommend that we move this to base.py. I think that core.py is mostly based around direct graph manipulation.
There was a problem hiding this comment.
This might also allow us to do from . import config, which might allow us to avoid the underscore namespacing.
There was a problem hiding this comment.
I've added some documentation. More could be added when HLG transmission to the scheduler is implemented, especially in relation to SchedulerPlugins. It would probably be good to mention the current annotation taxonomy understood by the distributed scheduler (priority, retry, worker, resource).
I've refrained at this point because this isn't yet functional. What do you think?
|
|
||
| alayer = A.__dask_graph__().layers[A.name] | ||
| assert alayer.annotations == annotation | ||
| assert dask.config.get("annotations", None) is None |
There was a problem hiding this comment.
We might want to verify that a layer created outside of the context manager doesn't have annotations
| assert dask.config.get("annotations", None) is None | |
| assert dask.config.get("annotations", None) is None | |
| b = A + 1 | |
| assert not b.__dask_graph__().layers[b.name].annotations |
There was a problem hiding this comment.
Currently, this produces a Blockwise object for the layer (not a BasicLayer)
B = A + 1
assert isinstance(B.__dask_graph__().layers[B.name], Blockwise)The BasicLayer does live in Blockwise's _cached_dict attribute, but this feels like digging into the implementation a bit too far?
There was a problem hiding this comment.
I've added a test_no_annotations test case instead.
There was a problem hiding this comment.
I would hope that all Layer classes support annotations, not just BasicLayer
There was a problem hiding this comment.
Move annotation attachment to Layer.__init__
dask/highlevelgraph.py
Outdated
| annotations = config.get("annotations", None) | ||
| self.annotations = None if annotations is None else annotations.copy() |
There was a problem hiding this comment.
Maybe this should move to a new Layer.__init__ function and all layer subclasses should call super? cc also @madsbk
There was a problem hiding this comment.
Also, thoughts on this? It's a purely aesthetic change and not very important.
| annotations = config.get("annotations", None) | |
| self.annotations = None if annotations is None else annotations.copy() | |
| self.annotations = copy.copy(config.get("annotations", None)) |
There was a problem hiding this comment.
Maybe this should move to a new
Layer.__init__function and all layer subclasses should callsuper? cc also @madsbk
That is fine with me.
mrocklin
left a comment
There was a problem hiding this comment.
This looks good to me. I made a couple of tiny suggestions on the docstring, but everything here seems both simple and powerful to me.
| -------- | ||
|
|
||
| All tasks within array A should have priority 100 and be retried 3 times | ||
| on failure. |
There was a problem hiding this comment.
| on failure. | |
| on failure. | |
dask/base.py
Outdated
| All tasks within array A should have priority 100 and be retried 3 times | ||
| on failure. | ||
| >>> with dask.annotate(priority=100, retries=3): | ||
| >>> A = da.ones((10000, 10000)) |
There was a problem hiding this comment.
| >>> A = da.ones((10000, 10000)) | |
| ... A = da.ones((10000, 10000)) |
dask/base.py
Outdated
|
|
||
| >>> nblocks = (10, 10) | ||
| >>> with dask.annotate(priority=lambda k: k[1]*nblocks[1] + k[2]): | ||
| >>> A = da.ones((1000, 1000), chunks=(100, 100)) |
There was a problem hiding this comment.
| >>> A = da.ones((1000, 1000), chunks=(100, 100)) | |
| ... A = da.ones((1000, 1000), chunks=(100, 100)) |
Thanks for the review @mrocklin . I've fixed the docstrings |
|
Just a by-the-by, if you use IPython, the In [1]: %doctest_mode
Exception reporting mode: Plain
Doctest mode is: ON
>>> for i in range(5):
... print(i)
...
0
1
2
3
4
>>> %doctest_mode
Exception reporting mode: Context
Doctest mode is: OFF
In [4]: |
|
Ooh, that is nice. |
|
This looks great to me. Merging. Thanks for your continued work on this @sjperkins ! It's nice to get something in after all of that effort :) For my part, I'm glad that we were able to do this this cleanly :) |
black dask/flake8 dask