Closed
Labels
- P1: Issue that should be fixed within a few weeks
- enhancement: Request for new feature and/or capability
Description
Describe your feature request
For my use case, I'd like to read and shuffle my dataset one large window at a time, so that I can feed the data into my multi-node training job.
Constraints:
- Data >> available working memory/disk
Ideally:
- I can specify the amount of data to read into memory at a time, since rows can have varying sizes.
- I can pipeline the reading/shuffling/ingest so that I don't waste GPU resources. Sliding windows might be nice here.
Currently, there's a way to do basic windowing (with some workarounds and stability issues), but the two items listed under "Ideally" are not easy to implement.
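To make the request concrete, here is a minimal single-process sketch of the behavior I have in mind, using only the Python standard library. It is not an existing API of this project: `byte_budgeted_windows`, `shuffled_windows`, and `prefetch` are hypothetical names, and `sys.getsizeof` is only a shallow stand-in for a real per-row size estimate. Windows are non-overlapping here; a sliding variant would be a further extension.

```python
import queue
import random
import sys
import threading
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def byte_budgeted_windows(
    rows: Iterable[T],
    max_bytes: int,
    size_of: Callable[[T], int] = sys.getsizeof,
) -> Iterator[List[T]]:
    """Group rows into windows whose estimated size stays within max_bytes.

    A single row larger than the budget is emitted as its own window.
    """
    window: List[T] = []
    used = 0
    for row in rows:
        row_bytes = size_of(row)
        # Close the current window before it would exceed the budget.
        if window and used + row_bytes > max_bytes:
            yield window
            window, used = [], 0
        window.append(row)
        used += row_bytes
    if window:
        yield window


def shuffled_windows(
    rows: Iterable[T],
    max_bytes: int,
    seed: int = 0,
    size_of: Callable[[T], int] = sys.getsizeof,
) -> Iterator[List[T]]:
    """Shuffle rows within each window (a local shuffle, not a global one)."""
    rng = random.Random(seed)
    for window in byte_budgeted_windows(rows, max_bytes, size_of):
        rng.shuffle(window)
        yield window


def prefetch(items: Iterable[T], depth: int = 1) -> Iterator[T]:
    """Produce items on a background thread, keeping up to `depth` buffered.

    This overlaps reading/shuffling of the next window with consumption of
    the current one. If the consumer stops early, the daemon producer thread
    stays blocked until the process exits.
    """
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for item in items:
                buffer.put((False, item))
            buffer.put((True, None))  # normal end of stream
        except Exception as exc:  # forward producer errors to the consumer
            buffer.put((True, exc))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        done, value = buffer.get()
        if done:
            if value is not None:
                raise value
            return
        yield value
```

Usage would look like `for window in prefetch(shuffled_windows(rows, max_bytes=2**30)): train_on(window)`, where `train_on` is a placeholder for the training step. The point of the sketch is the interface: a byte budget rather than a row count, and a bounded buffer so the next window is prepared while the GPUs consume the current one.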