Skip to content

__dask_layers__() should return layers #6507

Merged
mrocklin merged 1 commit intodask:masterfrom
madsbk:fix__dask_layers__
Aug 13, 2020
Merged

__dask_layers__() should return layers #6507
mrocklin merged 1 commit intodask:masterfrom
madsbk:fix__dask_layers__

Conversation

@madsbk
Copy link
Contributor

@madsbk madsbk commented Aug 12, 2020

This PR makes Scalar.__dask_layers__() return self._name instead of self.key.

Quoting from the docs, the .__dask_layers__() should return the output layers:

While the DataFrame points to the output layers on which it depends directly:
>>> df.__dask_layers__()
{'filter'}

Context

I am working on a higher level cull() function (see #6438) that uses HighLevelGraph.dependencies to determine the dependency relationship between layers. Doing this work I noticed that sometimes HighLevelGraph.dependencies will refer to low level keys and not layers because of __dask_layers__() returning keys instead of names, which means HighLevelGraph.dependencies will have values that refer to non-existing keys.

What to do with Delayed?

Delayed.__dask_layers__() also returns a key. However, Delayed doesn't have a name or anything other than the key so I am not sure what to do. Does it even make sense to define __dask_layers__() when Delayed isn't a collection?

Any suggestions?

@mrocklin
Copy link
Member

Delayed also returns a key. However, Delayed doesn't have a name or anything other than the key so I am not sure what to do. Does it even make sense to define dask_layers() when Delayed isn't a collection?

We can keep delayed as a low-level-graph, or we can treat it as a collection with single-task layers. Either is fine. Delayed is a bit awkward because it's hard to keep things efficient while building these graphs.

@mrocklin
Copy link
Member

This change generally is fine with me (although I haven't looked much into the Scalar behavior). Is there a test that would make sense here?

@madsbk
Copy link
Contributor Author

madsbk commented Aug 13, 2020

This change generally is fine with me (although I haven't looked much into the Scalar behavior). Is there a test that would make sense here?

The test in #6509 will catch this one.

@mrocklin mrocklin merged commit 97c6d8d into dask:master Aug 13, 2020
@madsbk madsbk deleted the fix__dask_layers__ branch August 18, 2020 06:32
kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants