[Datasets] Improve user experience of zip() #32375

@c21

Description

Dataset.zip() has strict requirements for the underlying data layout. It requires the two datasets being zipped to have the same number of blocks and the same number of rows per block. It also requires the block formats to be the same. When these requirements are not met, it throws exceptions without clear action items: Cannot zip .... Based on feedback from several users, the API is not easy to use.

We should either:

  • Handle all the block alignment and format conversion internally in zip(), e.g. align the number of blocks and convert to the same block format (e.g. simple/Pandas -> Arrow block). Add an example to our documentation for zip.
  • Improve the error message with action items.
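
To make the two options concrete, here is a minimal sketch in plain Python. It is not Ray's implementation: blocks are modeled as plain lists of rows, and `align_blocks` and `zip_blocks` are hypothetical helper names introduced only for illustration. It covers block alignment and an actionable error message for mismatched row counts. Block format conversion is left out.

```python
from itertools import islice
from typing import List, Sequence, Tuple

# Assumption: a "block" is modeled as a plain list of rows.
# Real Ray Data blocks (Arrow/Pandas/simple) are not used here.
Block = List[object]


def align_blocks(blocks: Sequence[Block], target_sizes: Sequence[int]) -> List[Block]:
    """Re-split `blocks` so that block i holds exactly target_sizes[i] rows."""
    total = sum(len(block) for block in blocks)
    expected = sum(target_sizes)
    if total != expected:
        # Error message that tells the user what to do next.
        raise ValueError(
            f"Cannot zip datasets of different lengths: {expected} rows vs "
            f"{total} rows. Make sure both datasets have the same number of "
            f"rows before calling zip()."
        )
    # Stream all rows in order, then cut them at the target boundaries.
    rows = (row for block in blocks for row in block)
    return [list(islice(rows, size)) for size in target_sizes]


def zip_blocks(left: Sequence[Block], right: Sequence[Block]) -> List[List[Tuple]]:
    """Zip two block lists row by row, realigning `right` to `left`'s layout."""
    aligned = align_blocks(right, [len(block) for block in left])
    return [list(zip(l, r)) for l, r in zip(left, aligned)]


left = [[1, 2], [3, 4, 5]]
right = [["a"], ["b", "c", "d"], ["e"]]
zipped = zip_blocks(left, right)
```

With this approach the caller never has to repartition manually: the right-hand side is re-split to match the left-hand side's block boundaries, and only a genuine row-count mismatch is surfaced as an error.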

Use case

As above.

Metadata

Labels

data (Ray Data-related issues), enhancement (Request for new feature and/or capability)
