Closed
Labels
good first issue (Good for newcomers)
Description
Background
@hntd187 fixed #1361 via #1378, but while reviewing the code I found several other places that project `RecordBatch`es and `Schema`s and may have the same subtle issue of losing the metadata. I am not aware of any bugs related to this yet, but I fear they are lurking.
The basic idea is to add functions like the following, which handle metadata correctly, following the pattern in #1361:
fn project_schema(schema: &Schema, projection: &[usize]) -> Result<Schema> {
    ...
}
fn project_batch(batch: &RecordBatch, projection: &[usize]) -> Result<RecordBatch> {
    ...
}
and replace the duplicated code like
let projected_schema = match &projection {
    Some(columns) => {
        let fields: Result<Vec<Field>> = columns
            .iter()
            .map(|i| {
                if *i < schema.fields().len() {
                    Ok(schema.field(*i).clone())
                } else {
                    Err(DataFusionError::Internal(
                        "Projection index out of range".to_string(),
                    ))
                }
            })
            .collect();
        Arc::new(Schema::new(fields?))
    }
    None => Arc::clone(&schema),
};
and
Some(columns) => Some(RecordBatch::try_new(
    self.schema.clone(),
    columns.iter().map(|i| batch.column(*i).clone()).collect(),
)),
all over the DataFusion codebase.
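To make the intent concrete, here is a rough sketch of what the two helpers could look like. It is not the actual DataFusion or arrow-rs code. To stay self-contained it uses simplified stand-in types (`Field`, `Schema`, `RecordBatch` below are hypothetical models, and errors are plain `String`s rather than `DataFusionError`). With the real arrow types, the metadata-preserving step would presumably go through `Schema::new_with_metadata` and `schema.metadata()`.

```rust
use std::collections::HashMap;

/// Simplified stand-in for arrow's `Field` (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

/// Simplified stand-in for arrow's `Schema`: fields plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

/// Simplified stand-in for arrow's `RecordBatch`; each column is a `Vec<i64>`.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub schema: Schema,
    pub columns: Vec<Vec<i64>>,
}

/// Select the fields at `projection`, carrying the schema metadata along.
pub fn project_schema(schema: &Schema, projection: &[usize]) -> Result<Schema, String> {
    let fields = projection
        .iter()
        .map(|&i| {
            schema
                .fields
                .get(i)
                .cloned()
                .ok_or_else(|| format!("Projection index out of range: {}", i))
        })
        .collect::<Result<Vec<Field>, String>>()?;
    Ok(Schema {
        fields,
        // The step the duplicated code skips: copy the metadata over.
        metadata: schema.metadata.clone(),
    })
}

/// Select the columns at `projection` and pair them with the projected schema.
pub fn project_batch(batch: &RecordBatch, projection: &[usize]) -> Result<RecordBatch, String> {
    let schema = project_schema(&batch.schema, projection)?;
    let columns = projection
        .iter()
        .map(|&i| {
            batch
                .columns
                .get(i)
                .cloned()
                .ok_or_else(|| format!("Projection index out of range: {}", i))
        })
        .collect::<Result<Vec<Vec<i64>>, String>>()?;
    Ok(RecordBatch { schema, columns })
}
```

With helpers of this shape, each call site would shrink to something like `project_batch(&batch, &columns)?`, and the bounds check and metadata handling would live in one place.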
Additional context
Here is a corresponding arrow ticket to put the logic into arrow-rs: apache/arrow-rs#1014