@tustvold tustvold commented Jun 6, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

Follow-up to #6458. This reworks the mapping logic so that column lookups are resolved once up front rather than per batch.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jun 6, 2023

let rows_num = batch.num_rows();
let mapped_batch = mapping.map_batch(batch).unwrap();
let projected = batch.project(&projection).unwrap();
Contributor Author

This highlights the major change: the schema adapter now assumes that the projection it outputs has already been applied to the file_schema batches.
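
For readers following along, here is a minimal sketch of the idea, using plain column names in place of Arrow schemas. `build_mapping` and its return shape are hypothetical and are not this PR's actual API. The point is only that the name lookups happen once, and that the resulting indices refer to the already projected batch rather than to the full file schema.

```rust
/// Hypothetical stand-in for the adapter step: resolve names once, not per batch.
///
/// Returns `(projection, field_mappings)`:
/// - `projection`: file column indices to read, in file order
/// - `field_mappings`: for each table field, its index in the *projected*
///   batch, or `None` if the file does not contain that field
fn build_mapping(
    table_fields: &[&str],
    file_fields: &[&str],
) -> (Vec<usize>, Vec<Option<usize>>) {
    let mut projection = Vec::new();
    let mut field_mappings = vec![None; table_fields.len()];
    for (file_idx, file_field) in file_fields.iter().enumerate() {
        if let Some(table_idx) = table_fields.iter().position(|t| t == file_field) {
            // The next slot in the projected batch belongs to this table field.
            field_mappings[table_idx] = Some(projection.len());
            projection.push(file_idx);
        }
    }
    (projection, field_mappings)
}

fn main() {
    let table = ["a", "b", "c"];
    let file = ["c", "x", "a"];
    let (projection, field_mappings) = build_mapping(&table, &file);
    // File columns 0 ("c") and 2 ("a") are read; "b" is missing from the file.
    println!("{:?} {:?}", projection, field_mappings);
}
```

Because `field_mappings` indexes into the projected batch, a caller that skips the projection step would read the wrong columns, which is why the test above projects the batch before mapping it.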

/// to the table schema where possible.
///
/// Returns a [`SchemaMapping`] that can be applied to the output batch
/// along with an ordered list of columns to project from the file
Contributor Author

The ordering is important, as parquet::ProjectionMask is not order-preserving.
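
To illustrate why a mask cannot preserve order, here is a toy model with one flag per file column. `ToyMask` is a hypothetical type written for this comment, not the parquet crate's `ProjectionMask`; it only shows the general property that a set of flags forgets the order in which columns were requested.

```rust
/// Toy model of a mask-style projection: one flag per file column.
/// This is NOT the parquet crate's type, only an illustration.
struct ToyMask(Vec<bool>);

impl ToyMask {
    /// Mark the requested column indices as selected.
    fn from_indices(num_columns: usize, indices: &[usize]) -> Self {
        let mut flags = vec![false; num_columns];
        for &i in indices {
            flags[i] = true;
        }
        ToyMask(flags)
    }

    /// Selected columns come back in file order, whatever order was requested.
    fn selected(&self) -> Vec<usize> {
        self.0
            .iter()
            .enumerate()
            .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
            .collect()
    }
}

fn main() {
    // Columns requested as [2, 0], but the mask yields them as [0, 2].
    let mask = ToyMask::from_indices(4, &[2, 0]);
    println!("{:?}", mask.selected());
}
```

Returning the projection in a fixed, sorted order sidesteps this: the indices stored in the mapping then line up with whatever the reader produces.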

@alamb alamb left a comment

Thank you @tustvold -- this looks like a nice cleanup to me

.zip(&self.field_mappings)
.map(|(field, file_idx)| match file_idx {
Some(batch_idx) => cast(&batch_cols[*batch_idx], field.data_type()),
None => Ok(new_null_array(field.data_type(), batch_rows)),

👍
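
A simplified sketch of the per-batch step quoted above, for anyone reading without the full diff. It models a column as `Vec<Option<i64>>` instead of an Arrow array and leaves out the `cast` call entirely, so `map_batch` here is an illustrative assumption rather than the code in this PR. It shows the two branches: reuse the projected column when the file has it, otherwise fill with nulls.

```rust
/// Stand-in for an Arrow array: a nullable integer column.
type Column = Vec<Option<i64>>;

/// Hypothetical per-batch mapping: no name lookups, only index access.
/// `field_mappings[i]` is the index of table field `i` in the projected
/// batch, or `None` if the file did not contain that field.
fn map_batch(
    field_mappings: &[Option<usize>],
    batch_cols: &[Column],
    batch_rows: usize,
) -> Vec<Column> {
    field_mappings
        .iter()
        .map(|mapping| match mapping {
            // The real code casts to the table field's type; this sketch just copies.
            Some(batch_idx) => batch_cols[*batch_idx].clone(),
            // Field missing from the file: produce an all-null column.
            None => vec![None; batch_rows],
        })
        .collect()
}

fn main() {
    let projected: Vec<Column> = vec![vec![Some(1), Some(2)], vec![Some(10), Some(20)]];
    let mapped = map_batch(&[Some(1), None, Some(0)], &projected, 2);
    println!("{:?}", mapped);
}
```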

@tustvold tustvold merged commit 8f7f76d into apache:main Jun 7, 2023