Skip to content

Move read_hdf to Blockwise#7625

Merged
jsignell merged 7 commits intodask:mainfrom
rjzamora:blockwise-io-hdf
Jun 10, 2021
Merged

Move read_hdf to Blockwise#7625
jsignell merged 7 commits intodask:mainfrom
rjzamora:blockwise-io-hdf

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented May 3, 2021

Superceeds #7284 (since that PR is now quite stale)

Follows same approach as #7415 and #7615 to use Blockwise for Dask-Dataframe's read_hdf. A slight refactor of the original logic was required.

@jrbourbeau
Copy link
Member

Thanks @rjzamora! @ian-r-rose could you take a look at this when you get a moment?

lock=None,
mode="a",
):
class HDFFunctionWrapper:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to be able to use a Protocol here. Some day...

assert_eq(df16, out)

# Test getitem optimization
with tmpfile("h5") as fn:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Comment on lines +462 to +465
if division and global_divisions:
global_divisions = global_divisions[:-1] + division
elif division:
global_divisions = division
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, this is tricky. Can we make the divisions global to begin with?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is just a refactor, most of the actual logic is just a copy of what already existed (iow, this is not my code, and I haven't thought through it carefully). With that said, I can try to revist this later and simplify the logic a bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can appreciate not wanting to mess with that logic too much

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delayed response here @rjzamora. Would you mind taking care of the merge conflict here and then we can merge this in

@rjzamora
Copy link
Member Author

rjzamora commented Jun 9, 2021

Apologies for the delayed response here @rjzamora. Would you mind taking care of the merge conflict here and then we can merge this in

No worries! Conflict should be resolved. I will also fix any problems if a fresh ci run turns up any failures.

@jsignell
Copy link
Member

I'm going to merge this since it's passing CI :)

@jsignell jsignell merged commit b2bc454 into dask:main Jun 10, 2021
@rjzamora rjzamora deleted the blockwise-io-hdf branch June 10, 2021 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants