Closed
Labels: enhancement, good first issue
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Sometimes it is advantageous to split one large `RecordBatch` into smaller batches for processing, for example so that the smaller batches can be processed in parallel.
So instead of one `RecordBatch` with 1M rows, we could have 100 `RecordBatch`es with 10,000 rows each that could be processed in parallel.
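To make the motivation concrete, here is a stdlib-only sketch. It uses a plain `Vec<i64>` as a stand-in for a single column of a `RecordBatch` (no Arrow types are involved), and `parallel_sum` is a hypothetical name. It splits 1M values into 100 chunks of 10,000 and sums each chunk on its own scoped thread.

```rust
use std::thread;

/// Splits `values` into chunks of at most `batch_size` elements, sums each
/// chunk on its own scoped thread, and returns (number of chunks, total).
/// A plain slice stands in for one `RecordBatch` column in this sketch.
fn parallel_sum(values: &[i64], batch_size: usize) -> (usize, i64) {
    // `chunks(0)` would panic, so reject a zero batch size up front.
    assert!(batch_size > 0, "batch_size must be non-zero");
    thread::scope(|s| {
        // Spawn one thread per chunk; each computes a partial sum.
        let handles: Vec<_> = values
            .chunks(batch_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<i64>()))
            .collect();
        let num_batches = handles.len();
        // Wait for every thread and combine the partial sums.
        let total: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
        (num_batches, total)
    })
}

fn main() {
    let values: Vec<i64> = (0..1_000_000).collect();
    let (num_batches, total) = parallel_sum(&values, 10_000);
    println!("{num_batches} batches, total = {total}"); // prints "100 batches, total = 499999500000"
}
```

Spawning one OS thread per chunk keeps the sketch dependency-free; a real pipeline would more likely hand the smaller batches to a thread pool.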
@tustvold implemented such a function in https://github.com/apache/arrow-datafusion/pull/379/files, with this signature:

```rust
fn split_batch(sorted: &RecordBatch, batch_size: usize) -> Vec<RecordBatch>
```
Describe the solution you'd like
Port the `split_batch` function into `RecordBatch::split(batch_size)` or something similar, and add appropriate tests.
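A minimal sketch of what such a `split(batch_size)` method and its tests might look like, under stated assumptions: to avoid an Arrow dependency, `Batch` below is a hypothetical stand-in struct (one `Vec<i64>` per column), not arrow-rs's `RecordBatch`, and `split` is not tustvold's `split_batch` from the PR. With the real type, the per-range `slice` would presumably delegate to `RecordBatch::slice(offset, length)` (available in current arrow-rs releases), which shares buffers instead of copying.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: each inner `Vec`
/// is one column, and all columns are assumed to have the same length.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, Vec::len)
    }

    /// Returns rows `offset..offset + length` of every column
    /// (copied here; Arrow slicing would be zero-copy).
    fn slice(&self, offset: usize, length: usize) -> Batch {
        Batch {
            columns: self
                .columns
                .iter()
                .map(|c| c[offset..offset + length].to_vec())
                .collect(),
        }
    }

    /// Hypothetical `split`: consecutive batches of at most `batch_size`
    /// rows, in order; only the last batch may be shorter.
    fn split(&self, batch_size: usize) -> Vec<Batch> {
        // `step_by(0)` would panic, so reject a zero batch size up front.
        assert!(batch_size > 0, "batch_size must be non-zero");
        let num_rows = self.num_rows();
        (0..num_rows)
            .step_by(batch_size)
            .map(|offset| self.slice(offset, batch_size.min(num_rows - offset)))
            .collect()
    }
}

fn main() {
    let batch = Batch {
        columns: vec![(0..25).collect::<Vec<i64>>(), (100..125).collect::<Vec<i64>>()],
    };
    let sizes: Vec<usize> = batch.split(10).iter().map(Batch::num_rows).collect();
    println!("{sizes:?}"); // prints "[10, 10, 5]"
}
```

Tests for the real method would likely cover the same cases: an uneven final batch, `batch_size` larger than the row count (one batch equal to the input), and an empty input. This sketch returns no batches for an empty input; whether `split` should instead return a single empty batch is a design choice for the actual API.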