@etseidl etseidl commented Dec 12, 2024

Which issue does this PR close?

Closes #182.

It's an old issue, so perhaps this change is not wanted, in which case this can be closed.

Rationale for this change

Allows projecting columns by name rather than index.

What changes are included in this PR?

Adds a new method ProjectionMask::columns which takes a list of column names and returns a ProjectionMask.
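
The name-to-mask idea can be illustrated with a small standalone sketch. It does not use the parquet crate and is not the PR's implementation: the function `mask_from_names`, its parameters, and the whole-component prefix rule are assumptions made for illustration. It takes the dotted path of each leaf column, in schema order, and marks a leaf as selected when a requested name equals its path or names one of its ancestors.

```rust
/// Build a per-leaf boolean mask from dotted column names.
/// `leaf_paths` holds the dotted path of every leaf column, in schema order.
/// Hypothetical helper for illustration only; not part of the parquet crate.
fn mask_from_names(leaf_paths: &[&str], names: &[&str]) -> Vec<bool> {
    // Split each leaf path into its components once, up front.
    let paths: Vec<Vec<&str>> = leaf_paths
        .iter()
        .map(|p| p.split('.').collect())
        .collect();

    let mut mask = vec![false; paths.len()];
    for name in names {
        let wanted: Vec<&str> = name.split('.').collect();
        for (idx, path) in paths.iter().enumerate() {
            // Assumed rule: a name selects a leaf when it equals the leaf's
            // path or is a whole-component prefix of it, so "b" selects
            // "b.c" but not "bc".
            if path.starts_with(wanted.as_slice()) {
                mask[idx] = true;
            }
        }
    }
    mask
}
```

Under these assumptions, a name that matches no leaf simply leaves the mask untouched rather than raising an error; a real API might choose differently.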

Are there any user-facing changes?

New API call.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 12, 2024

@alamb alamb left a comment

Looks like a useful API to me -- thank you @etseidl 🙏

message test_schema {
OPTIONAL INT32 a;
OPTIONAL INT32 b;
OPTIONAL INT32 a;
Contributor

This seems a nasty thing to do (repeat the name of a field in the parquet file), but it seems to be allowed and your code handles it.

Contributor Author

Yeah, I'm not a fan of this behavior, but I think some query engines (Spark, perhaps) will produce duplicate names when joining tables. Necessary evil, I guess.
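
To make the duplicate-name case concrete, here is a minimal sketch of what a name lookup returns when two leaves share a name. It assumes that a name matches every leaf that carries it; `leaf_indices_for_name` is a hypothetical helper written for this illustration, not a function from the parquet crate.

```rust
/// Return the index of every leaf column whose name equals `name`.
/// Hypothetical helper for illustration only.
fn leaf_indices_for_name(leaf_names: &[&str], name: &str) -> Vec<usize> {
    leaf_names
        .iter()
        .enumerate()
        // Keep every match, not just the first, so duplicates are all reported.
        .filter(|(_, n)| **n == name)
        .map(|(i, _)| i)
        .collect()
}
```

With leaves named `a`, `b`, `a`, looking up `a` yields both index 0 and index 2. Under this assumption a name alone cannot single out one of the duplicates, so selecting by leaf index remains the way to pick exactly one.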

alamb commented Dec 18, 2024

🚀 -- thanks again @etseidl

@alamb alamb merged commit cbe1765 into apache:main Dec 18, 2024
16 checks passed
CurtHagenlocher pushed a commit to CurtHagenlocher/arrow-rs that referenced this pull request Dec 28, 2024
* add function to create ProjectionMask from column names

* add some more tests
@etseidl etseidl deleted the string_column_projection branch May 28, 2025 16:42

Development

Successfully merging this pull request may close these issues.

String-based path column projection
