Skip to content

Fix Invalid offset in sparse column chunk data error for multiple predicates#9509

Merged
alamb merged 1 commit intoapache:mainfrom
pydantic:fix_multi_predicate_selection_policy
Mar 11, 2026
Merged

Fix Invalid offset in sparse column chunk data error for multiple predicates#9509
alamb merged 1 commit intoapache:mainfrom
pydantic:fix_multi_predicate_selection_policy

Conversation

@cetra3
Copy link
Copy Markdown
Contributor

@cetra3 cetra3 commented Mar 5, 2026

Which issue does this PR close?

Raised an issue at #9516 for this one

Same issue as #9239 but extended to another scenario

Rationale for this change

When there are multiple predicates being evaluated, we need to reset the row selection policy before overriding the strategy.

Scenario:

  • Dense initial RowSelection (alternating select/skip) covers all pages → Auto resolves to Mask
  • Predicate 1 evaluates on column A, narrows selection to skip middle pages
  • Predicate 2's column B is fetched sparsely with the narrowed selection (missing middle pages)
  • Without the fix, the override for predicate 2 returns early (policy=Mask, not Auto), so Mask is used and tries to read missing pages → "Invalid offset" error

What changes are included in this PR?

This is a one line change to reset the selection policy in the RowGroupDecoderState::WaitingOnFilterData arm

Are these changes tested?

Yes a new test added that fails currently on main, but as you can see it's a doozy to set up.

Are there any user-facing changes?

Nope

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 5, 2026
@cetra3 cetra3 force-pushed the fix_multi_predicate_selection_policy branch from 39d2986 to 72ca82b Compare March 5, 2026 04:34
@cetra3 cetra3 mentioned this pull request Mar 5, 2026
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! The fix makes sense and the test is well designed and comprehensive. I'm not a committer but 👍🏻 from my end.

/// - Without the fix, the override for predicate 2 returns early (policy=Mask, not Auto),
/// so Mask is used and tries to read missing pages → "Invalid offset" error
#[tokio::test]
async fn test_multi_predicate_mask_policy_carryover() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this on main and can confirm that it fails on main!

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @cetra3 and @adriangb

fyi @hhhizzz

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 5, 2026

@cetra3 can you please file a bug ticket to track this issue (it makes the release notes easier)

Also do you think we should contemplate backporting this to the previous stable branch (57 I think) and making a patch set?

@cetra3
Copy link
Copy Markdown
Contributor Author

cetra3 commented Mar 5, 2026

It does make sense to possibly backport this change, given it's a one liner. We have done it for ourselves internally and seen the issue go away

Copy link
Copy Markdown
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@adriangb
Copy link
Copy Markdown
Contributor

Thanks all. Ping to a committer (@alamb or @etseidl) for merge since neither @cetra3 nor I are committers.

@alamb alamb merged commit b3e047f into apache:main Mar 11, 2026
17 checks passed
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 11, 2026

🚀

BTW I plan to pull together a 58.1.0 release next week

friendlymatthew pushed a commit to pydantic/arrow-rs that referenced this pull request Mar 21, 2026
…dicates (apache#9509)

# Which issue does this PR close?

Raised an issue at apache#9516 for
this one

Same issue as apache#9239 but
extended to another scenario

# Rationale for this change

When there are multiple predicates being evaluated, we need to reset the
row selection policy before overriding the strategy.

Scenario:
- Dense initial RowSelection (alternating select/skip) covers all pages
→ Auto resolves to Mask
- Predicate 1 evaluates on column A, narrows selection to skip middle
pages
- Predicate 2's column B is fetched sparsely with the narrowed selection
(missing middle pages)
- Without the fix, the override for predicate 2 returns early
(policy=Mask, not Auto), so Mask is used and tries to read missing pages
→ "Invalid offset" error

# What changes are included in this PR?

This is a one line change to reset the selection policy in the
`RowGroupDecoderState::WaitingOnFilterData` arm

# Are these changes tested?

Yes a new test added that fails currently on `main`, but as you can see
it's a doozy to set up.

# Are there any user-facing changes?

Nope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants