Skip to content

fix(parquet/pqarrow): fix definition levels with non-nullable lists#325

Merged
zeroshade merged 2 commits intoapache:mainfrom
zeroshade:fix-path-builder-lists
Mar 24, 2025
Merged

fix(parquet/pqarrow): fix definition levels with non-nullable lists#325
zeroshade merged 2 commits intoapache:mainfrom
zeroshade:fix-path-builder-lists

Conversation

@zeroshade
Copy link
Copy Markdown
Member

Rationale for this change

Related to apache/arrow#38503, it was found in apache/iceberg-go#357 that the Parquet file being written by pqarrow was incompatible with pyarrow in some testing. After some digging I determined the cause. So we should fix this to finally put #38503 to bed

What changes are included in this PR?

Fixing the computation of definition levels when writing from Arrow data with lists of non-nullable elements. Previously we were basing the nullableInParent on whether it was a list or a map, assuming that the element was always nullable in a list. This, of course, is incorrect and we need to actually check the nullability of the list element to compute the correct definition level. This way we don't produce incongruous levels that are larger than what the max definition level should be.

Are these changes tested?

Yes, I've updated the corresponding test, which wasn't actually testing a non-nullable element until now.

Are there any user-facing changes?

Only fixing writing of files.

Copy link
Copy Markdown
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the kind of thing where we should try to push out a new release ASAP to avoid "bad" files being written?

Is there a way to "fix" a bad file?

@zeroshade
Copy link
Copy Markdown
Member Author

There isn't really a way to fix a "bad" file other than re-writing it unfortunately. It might be worth a patch release to fix this

@zeroshade zeroshade merged commit 4be5561 into apache:main Mar 24, 2025
22 of 23 checks passed
@zeroshade zeroshade deleted the fix-path-builder-lists branch March 27, 2026 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants