Closed
Labels
enhancement (any new improvement worthy of an entry in the changelog), good first issue (good for newcomers)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is common to "project" (that is, pick a subset of) columns from a schema, and then from the corresponding RecordBatch, for processing.
There are many instances of projection code like this:
// apply projection
match &self.projection {
    Some(columns) => Some(RecordBatch::try_new(
        self.schema.clone(),
        columns.iter().map(|i| batch.column(*i).clone()).collect(),
    )),
    None => Some(Ok(batch.clone())),
}
Many (most) instances of projection don't handle metadata, leading to bugs like apache/datafusion#1361
Describe the solution you'd like
Add projection functions to Schema and RecordBatch structs in the arrow-rs crate that properly handle metadata.
Proposed signatures:
/// Returns a new schema consisting of only the specified columns
///
/// So if a schema had Fields A, B and C, schema.project([2,1]) would return a new
/// schema with Fields C and B (indices are zero-based and the output follows the
/// order of the indices)
///
/// TODO example
fn Schema::project(&self, indices: impl IntoIterator<Item=usize>) -> Result<Schema> {
    ...
}
/// Returns a new RecordBatch consisting of only the specified columns
///
/// So if a RecordBatch had Columns A, B and C, batch.project([2,1]) would return a new
/// RecordBatch with Columns C and B (indices are zero-based and the output follows the
/// order of the indices)
///
/// TODO example
fn RecordBatch::project(&self, indices: impl IntoIterator<Item=usize>) -> Result<RecordBatch> {
    ...
}
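To make the intended behavior concrete, here is a minimal, self-contained sketch of what the two functions could do. It does not use the arrow-rs types: `Field`, `Schema`, `RecordBatch` and `ProjectError` below are simplified, hypothetical stand-ins (columns are plain `Vec<i64>`), and the out-of-bounds handling is an assumption rather than part of the proposal. The point it illustrates is that the projected schema carries the original metadata along.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

/// Hypothetical stand-in for a schema: ordered fields plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for a record batch; each column is a Vec<i64>.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub schema: Schema,
    pub columns: Vec<Vec<i64>>,
}

/// Error type assumed for this sketch only.
#[derive(Debug, PartialEq)]
pub enum ProjectError {
    IndexOutOfBounds { index: usize, len: usize },
}

impl Schema {
    /// Returns a new schema with only the fields at `indices`, in that order,
    /// keeping the schema-level metadata of `self`.
    pub fn project(
        &self,
        indices: impl IntoIterator<Item = usize>,
    ) -> Result<Schema, ProjectError> {
        let len = self.fields.len();
        let fields = indices
            .into_iter()
            .map(|i| {
                self.fields
                    .get(i)
                    .cloned()
                    .ok_or(ProjectError::IndexOutOfBounds { index: i, len })
            })
            .collect::<Result<Vec<_>, _>>()?;
        // The metadata is copied over instead of being silently dropped.
        Ok(Schema { fields, metadata: self.metadata.clone() })
    }
}

impl RecordBatch {
    /// Returns a new batch with only the columns at `indices`, in that order.
    /// The schema is projected with `Schema::project`, so metadata survives.
    pub fn project(
        &self,
        indices: impl IntoIterator<Item = usize>,
    ) -> Result<RecordBatch, ProjectError> {
        // Collect once so the same indices drive both the schema and the columns.
        let indices: Vec<usize> = indices.into_iter().collect();
        let schema = self.schema.project(indices.iter().copied())?;
        let len = self.columns.len();
        let columns = indices
            .iter()
            .map(|&i| {
                self.columns
                    .get(i)
                    .cloned()
                    .ok_or(ProjectError::IndexOutOfBounds { index: i, len })
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(RecordBatch { schema, columns })
    }
}
```

In this sketch, projecting a schema with fields A, B and C by `[2, 1]` yields C followed by B with the metadata map unchanged, and an index past the end returns an error rather than panicking.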
Describe alternatives you've considered
Additional context
@hntd187 added this feature in DataFusion in apache/datafusion#1378