Skip to content

Remove counts from CalculateNextIterationRangeEndValues that caused problems with --panic-on-warnings#12

Merged
grodowski merged 2 commits intomasterfrom
grodowski/coding-chimp/fix-builder-chunk-order
Jun 5, 2025
Merged

Remove counts from CalculateNextIterationRangeEndValues that caused problems with --panic-on-warnings#12
grodowski merged 2 commits intomasterfrom
grodowski/coding-chimp/fix-builder-chunk-order

Conversation

@grodowski
Copy link
Member

@grodowski grodowski commented May 22, 2025

The split between Temptable and Offset query builders was originally introduced in github#471. Then, both query builders have been changed to calculate actual chunk sizes in github#1500.

The updated implementation of the Offset query introduced a bug, where a missing order by clause in select_osc_chunk can result in end values potentially surpassing the chunk_size. This wasn't detected during our initial testing where the query just returned rows in their original order, but may happen in real-world scenarios in case the db returns data in an undefined order.

An obvious fix would be to just add an order by to the Offset builder subquery, however since both builders use temptables now, it makes more sense to simplify and use only one of them.

Alternatively, the builder could only use the less performant count query variants when --panic-on-warnings is enabled and otherwise use the simpler ones from pull/471. We decided to not follow this path for now, hoping --panic-on-warnings becomes an updated and safer default in the future.

Update: following github#1557 (comment) the builders were rolled back to original and we think we can safely try a more strict PanicOnWarnings with no row count check.

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

The split between Temptable and Offset query builders was originally introduced in github#471. Then, both query builders have been changed to calculate actual chunk sizes in github#1500.

The updated implementation of the Offset query introduced a bug, where a missing `order by` clause in `select_osc_chunk` can result in end values potentially surpassing the chunk_size. This wasn't detected during our initial testing where the query just returned rows in their original order, but may happen in real-world scenarios in case the db returns data in an undefined order.

An obvious fix would be to just add an `order by` to the Offset builder subquery, however since both builders use temptables now, it makes more sense to simplify and use only one of them.

Alternatively, the builder could only use the less performant count query variants when `--panic-on-warnings` is enabled and otherwise use the simpler ones from pull/471. We decided to not follow this path for now, hoping `--panic-on-warnings` becomes an updated and safer default in the future.

Co-authored-by: Bastian Bartmann <bastian.bartmann@shopify.com>
from
%s.%s
where
%s and %s
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order by should be added here if we were to keep the offset variant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we're dropping the expectedRowCount entirely after discovering more issues with the count approach: github#1557 (comment)

@grodowski grodowski requested a review from coding-chimp May 22, 2025 13:51
@grodowski grodowski force-pushed the grodowski/coding-chimp/fix-builder-chunk-order branch 2 times, most recently from ef735d1 to 53b9faf Compare June 4, 2025 14:14
@grodowski grodowski force-pushed the grodowski/coding-chimp/fix-builder-chunk-order branch from 53b9faf to 80568cd Compare June 4, 2025 17:52
- Removed count subqueries from range builders to restore the more performant approach from PR github#471
- Modified panic-on-warnings logic to trigger errors based solely on SQL warnings, not on row count mismatches
- This addresses potential race conditions where row count comparisons could produce false positives due to concurrent table modifications
@grodowski grodowski force-pushed the grodowski/coding-chimp/fix-builder-chunk-order branch from 80568cd to d4157d7 Compare June 4, 2025 17:58
@grodowski grodowski changed the title Fix CalculateNextIterationRangeEndValues order by Remove counts from CalculateNextIterationRangeEndValues that caused problems with --panic-on-warnings Jun 5, 2025
@grodowski grodowski merged commit 3912039 into master Jun 5, 2025
6 checks passed
@grodowski grodowski deleted the grodowski/coding-chimp/fix-builder-chunk-order branch June 5, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants