[Datasets] Improve user experience of zip() #32375
Closed
Labels
data (Ray Data-related issues), enhancement (Request for new feature and/or capability)
Description
`Dataset.zip()` has strict requirements on the underlying data layout. Both datasets being zipped must have the same number of blocks and the same number of rows per block, and their block formats must also match. When these requirements are not met, it throws exceptions without clear action items: Cannot zip .... Based on feedback from several users, the API is not easy to use.
We should either:
- Handle all the block alignment and format conversion internally in `zip()`, e.g. align the number of blocks and convert to the same block format (e.g. simple/Pandas -> Arrow block). Add an example in our documentation for `zip`.
- Improve the error message with action items.
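To make the two options concrete, here is a minimal sketch of what internal block alignment with an actionable error could look like. It is plain Python and does not use Ray: blocks are modeled as lists of row dicts, and `align_blocks` and `zip_blocks` are hypothetical helper names chosen for illustration, not part of the Ray API or its actual implementation.

```python
from itertools import chain
from typing import Any, Dict, List

# Assumption for illustration: a "block" is a list of rows, a row is a dict.
Row = Dict[str, Any]
Block = List[Row]


def align_blocks(blocks: List[Block], target_sizes: List[int]) -> List[Block]:
    """Re-split `blocks` so that block i holds target_sizes[i] rows."""
    rows = list(chain.from_iterable(blocks))
    if len(rows) != sum(target_sizes):
        # Error message that tells the user what to do next.
        raise ValueError(
            "Cannot zip datasets with different row counts: "
            f"{sum(target_sizes)} vs {len(rows)}. "
            "Truncate or pad one side so both have the same number of rows."
        )
    aligned, start = [], 0
    for size in target_sizes:
        aligned.append(rows[start:start + size])
        start += size
    return aligned


def zip_blocks(left: List[Block], right: List[Block]) -> List[Block]:
    """Zip two block lists row by row, aligning `right` to `left` first."""
    right = align_blocks(right, [len(block) for block in left])
    # Simplification: on a column name collision the right-hand value wins.
    return [
        [{**l_row, **r_row} for l_row, r_row in zip(l_block, r_block)]
        for l_block, r_block in zip(left, right)
    ]


# Same total row count (3), but different block boundaries.
left = [[{"a": 0}, {"a": 1}], [{"a": 2}]]
right = [[{"b": 10}], [{"b": 11}, {"b": 12}]]
zipped = zip_blocks(left, right)
```

In this sketch the caller never has to repartition by hand: a mismatch in block boundaries is resolved silently, and only a genuine mismatch in total row count raises, with a message that names the fix.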
Use case
As above.