Skip to content

Remove bad graph logic in BlockwiseIO#6933

Merged
jrbourbeau merged 10 commits intodask:masterfrom
rjzamora:fix-blockwise-io
Dec 11, 2020
Merged

Remove bad graph logic in BlockwiseIO#6933
jrbourbeau merged 10 commits intodask:masterfrom
rjzamora:fix-blockwise-io

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented Dec 4, 2020

Fixes incorrect logic being used to build dsk in the BlockwiseIO constructor.

Currently, the number of dimensions is used to determine the number of dummy arguments. However, the number of dummy argument should always be one dummy argument for IO.

Thanks to @ian-r-rose for pointing out this problem in #6931

  • Tests added / passed
  • Passes black dask / flake8 dask

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rjzamora! Just for my own understanding this, we only need blockwise_token(0) here because PackedFunctionCall gets a tuple of arguments at run time?

@rjzamora
Copy link
Member Author

rjzamora commented Dec 4, 2020

Thanks @rjzamora! Just for my own understanding this, we only need blockwise_token(0) here because PackedFunctionCall gets a tuple of arguments at run time?

Right - This is my understanding. The number of dummy arguments depends on the number of inputs to the underlying function, and BlockwiseIO should always have a single input (the tuple that you mentioned).

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks. And, so far, ninds has never been >1 since BlockwiseIO was only utilized for DataFrames.

Also, @ian-r-rose is trying out this fix over in #6931

@rjzamora
Copy link
Member Author

rjzamora commented Dec 4, 2020

Also, @ian-r-rose is trying out this fix over in #6931

Sounds good! Note that I locally added this change to that PR and seem to get correct behavior with a series of subsequent blockwise operations:

Screen Shot 2020-12-04 at 1 02 45 PM

@mrocklin
Copy link
Member

mrocklin commented Dec 8, 2020 via email

# TODO: Check that this makes sense
sz = len([i for i in self.output_indices if i not in self.new_axes])
io_key = (self.io_name,) + tuple([k[i + 1] for i in range(sz)])
io_key = (self.io_name,) + tuple(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Awesome, thanks, making progress. This fixes the above issue.

@rjzamora
Copy link
Member Author

rjzamora commented Dec 9, 2020

@ian-r-rose's latest find demonstrates that two or more BlockwiseIO layers cannot be fused into a single layer, because the class is only designed to handle a single io_name and single io_subgraph. We can certainly expand BlockwiseIO to handle a set of "io_dependencies", but I am a bit unsure if we always want to fuse the creation of multiple collections into the same layer like this. @mrocklin - Do you have any intuition on this?

@rjzamora
Copy link
Member Author

@ian-r-rose - The latest change here revises the BlockwiseIO API a bit, but seems to resolve the problem you uncovered. With the latest changes, you will no longer be expected to pass separate io_name and io_subgraph arguments to the constructor. Instead you should pass a single dictionary argument: {<io_name>: <io_subgraph>}

@rjzamora
Copy link
Member Author

@jrbourbeau - It may be best to avoid adding any more "fixes" to this PR. Do you have thoughts on getting this in and opening follow-up PRs if working through #6791 and #6931 exposes other problems?

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge this in and continue updating as needed

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrbourbeau - It may be best to avoid adding any more "fixes" to this PR. Do you have thoughts on getting this in and opening follow-up PRs if working through #6791 and #6931 exposes other problems?

I'm happy to move future work on this into #6931 (there are still some cases we will need to track down, I'll post some minimal examples to that PR). @rjzamora feel free to push commits to that branch, though be aware that I've been rebasing regularly.

# If we are not getting a `dsk` input, there must be a single
# element in `io_deps`
if len(io_deps) != 1:
raise ValueError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - The case where we use the BlockwiseIO constructor without an intial dsk definition could probably handle the creation of multiple subgraphs. However, I couldn't think of any reason why we would need to support.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rjzamora! I pushed a small commit to make _inject_io_tasks return the updated graph which I think makes things a bit more clear to read -- hope that's okay. Will merge after CI passes

@rjzamora
Copy link
Member Author

Thanks @rjzamora! I pushed a small commit to make _inject_io_tasks return the updated graph which I think makes things a bit more clear to read -- hope that's okay. Will merge after CI passes

Sounds good @jrbourbeau - Thanks!

It seems that the _inject_io_tasks will need a further revision to get #6931 working, but I think it makes sense to work the problem out in that PR (or in a follow-up PR).

@jrbourbeau jrbourbeau merged commit 603d064 into dask:master Dec 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants