Skip to content

Add HighLevelGraph.validate call to assert_eq in tests#6509

Closed
mrocklin wants to merge 2 commits intodask:masterfrom
mrocklin:hlg-verify
Closed

Add HighLevelGraph.validate call to assert_eq in tests#6509
mrocklin wants to merge 2 commits intodask:masterfrom
mrocklin:hlg-verify

Conversation

@mrocklin
Copy link
Member

This applies a validation function made by @madsbk in #6508 to all of the dask array and dask dataframe test suite. It exposes a few cases where our book keeping around HLG dependencies is incorrect.

@madsbk
Copy link
Contributor

madsbk commented Aug 13, 2020

FYI: I have been debugging using HighLevelGraph.validate() and I have traced must of the bugs to __dask_layers__() not returning layers. For instance, Delayed.__dask_layers__() and Scalar.__dask_layers__() returns low level keys.
I have fixed Scalar but I don't know what we should do with Delayed. See #6507

@mrocklin
Copy link
Member Author

Good to know, and thank you for organizing things as we proceed down this path. HLGs are relatively new in the codebase and not as well worn as other parts. I hope that this PR helps to expose additional errors. My experience is that placing checks into assert_eq functions is a good way to get a lot of coverage throughout the codebase.

@mrocklin
Copy link
Member Author

I'm not sure what RAPIDS folks are up to these days, but this seems like a task taht @rjzamora could probably knock out pretty well.

madsbk added a commit to madsbk/dask that referenced this pull request Aug 18, 2020
When <dask#6509> passes, we can remove
this fix, which introduces a significant overhead
@rjzamora
Copy link
Member

rjzamora commented Aug 18, 2020

It looks like #6507 fixes all failures besides dask/tests/test_delayed.py::test_keys_from_array, which (I think) is due to the Delayed.__dask_layers__() problem described by @madsbk in that PR. If we do want a dask_layers definitiion for Delayed, perhaps it does make sense to to treat it as a "simple" collection?

@mrocklin
Copy link
Member Author

If we do want a dask_layers definitiion for Delayed, perhaps it does make sense to to treat it as a "simple" collection?

My guess is that the smallest delta of a change would be to keep Delayed using HighLevelGraphs. Naively I would expect that using the keyname as the layer name would be fine. I'm curious what it isn't working out well.

It also looks like we didn't wire everything up properly in the svd code, which is not surprising, it's one of the more complicated multi-layered functions we have.

madsbk added a commit to madsbk/dask that referenced this pull request Aug 27, 2020
When <dask#6509> passes, we can remove
this fix, which introduces a significant overhead
madsbk added a commit to madsbk/dask that referenced this pull request Sep 8, 2020
@jakirkham
Copy link
Member

Is this still needed now that PR ( #6588 ) is in?

@mrocklin mrocklin closed this Sep 8, 2020
@mrocklin
Copy link
Member Author

mrocklin commented Sep 8, 2020

Nope, we're good. Thanks for flagging this @jakirkham

@mrocklin mrocklin deleted the hlg-verify branch September 8, 2020 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants