Conversation
Ended up writing this test, seemed like a good thing to test even though it's currently broken because of dask#8173.
TODO: tests
TODO: tests
| # NOTE: Layer only used by `Item.from_delayed`, to handle Delayed objects created by other collections. | ||
| # e.g.: Item.from_delayed(da.ones(1).to_delayed()[0]) | ||
| # See Delayed.__init__ | ||
| self._layer = layer or key |
There was a problem hiding this comment.
| self._layer = layer or key | |
| if layer: | |
| self._layer = layer | |
| elif isinstance(key, tuple): | |
| self._layer = key[0] | |
| else: | |
| self._layer = key |
| dsk = self.__dask_optimize__(dsk, keys) | ||
| return [Delayed(k, dsk) for k in keys] | ||
| layer = None | ||
| return [Delayed(k, dsk, layer=layer) for k in keys] |
There was a problem hiding this comment.
Please make this coherent with dask.array above
There was a problem hiding this comment.
I actually feel like setting the layer at all here is a bit unnecessary, since the graphs are always low-level anyway, but sure.
| self._length = length | ||
|
|
||
| # NOTE: Layer is used by `to_delayed` in other collections, but not in normal Delayed use | ||
| self._layer = layer or key |
There was a problem hiding this comment.
| self._layer = layer or key | |
| if layer: | |
| self._layer = layer | |
| elif isinstance(key, tuple): | |
| self._layer = key[0] | |
| else: | |
| self._layer = key |
There was a problem hiding this comment.
I intentionally didn't do this because I'd rather be explicit about passing the layer name when it's needed, and not make any assumptions about key structure in Delayed. The self._layer not in dsk.layers check below should catch any misuse. Since __init__ should only be used internally, not directly by users, this seemed reasonable to me.
Co-authored-by: crusaderky <crusaderky@gmail.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
The test_daily_stock_deprecated failure is unrelated to the changes here. xref #8477 which will remove that test altogether.
In dask#8452 I realized that an incorrect pattern had emerged from dask#6510 of including ```python if not isinstance(dsk, HighLevelGraph): dsk = HighLevelGraph.from_collections(id(dsk), dsk, dependencies=()) ``` in optimization functions. Specifically, `id(dsk)` is incorrect as the layer name here. The layer name must match the `.name` of the resulting collection that gets created by `__dask_postpersist__()`, otherwise `__dask_layers__()` on the optimized collection will be wrong. Since `optimize` doesn't know about collections and isn't passed a layer name, the only reasonable thing to do here is to error when given a low-level graph. This is safe to do for Arrays and DataFrames, since their constructors convert any low-level graphs to HLGs. This PR doesn't really fix anything—the code path removed should be unused—but it eliminates a confusing pattern that has already wandered its way into other places dask#8316 (comment).
As discussed in dask#8173, it's necessary for `Delayed` and `bag.Item` objects to be able to keep track of the HLG layer name they're referencing. When constructed with `to_delayed(optimize_graph=False)`, this layer name will not be the same as the key (`array-abcde` vs `('array-abcde', 0)`). Co-authored-by: crusaderky <crusaderky@gmail.com>
As discussed in #8173, it's necessary for
Delayedandbag.Itemobjects to be able to keep track of the HLG layer name they're referencing. When constructed withto_delayed(optimize_graph=False), this layer name will not be the same as the key (array-abcdevs('array-abcde', 0)).to_delayed(optimize_graph=False)objects often have incorrect__dask_layers__#8173pre-commit run --all-files