Skip to content

Fixes for parquet reader v3, _row_number, and iceberg positioned deletes#87220

Merged
al13n321 merged 10 commits intomasterfrom
pqice
Sep 30, 2025
Merged

Fixes for parquet reader v3, _row_number, and iceberg positioned deletes#87220
al13n321 merged 10 commits intomasterfrom
pqice

Conversation

@al13n321
Copy link
Copy Markdown
Member

@al13n321 al13n321 commented Sep 17, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed logical errors in _row_number virtual column and iceberg positioned deletes.


  • Made parquet reader v3 report row numbers (for iceberg positioned deletes and _row_number virtual column).
  • _row_number virtual column was broken in a simple way and untested.
  • IcebergStreamingPositionDeleteTransform expected the input chunks to arrive in order of increasing row number, but parquet readers didn't guarantee this. Made iceberg reader enable input_format_parquet_preserve_order if use_roaring_bitmap_iceberg_positional_deletes is false.
  • To prevent the above from being slow, changed input_format_parquet_preserve_order implementation to not be slow. I previously planned to deprecate and delete it instead, but I guess we can't, since it's used in iceberg.
  • Fixed parquet reader getting stuck on iceberg positioned deletes because of how the thread pool worked.
  • Kind of fixed the progress bar for parquet reader v3. (It now works the same way as parquet reader v0, but that way is half broken.)
  • Added a couple missing increments of profile events in parquet reader v3.

@al13n321 al13n321 requested a review from scanhex12 September 17, 2025 04:02
@clickhouse-gh

This comment was marked as resolved.

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Sep 17, 2025
@scanhex12 scanhex12 self-assigned this Sep 17, 2025
@divanik divanik self-assigned this Sep 18, 2025
@al13n321 al13n321 added this pull request to the merge queue Sep 30, 2025
Merged via the queue into master with commit c0347e6 Sep 30, 2025
123 checks passed
@al13n321 al13n321 deleted the pqice branch September 30, 2025 03:22
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 30, 2025
@robot-clickhouse robot-clickhouse added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Sep 30, 2025
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Nov 13, 2025
Fixes for parquet reader v3, _row_number, and iceberg positioned deletes
Enmk added a commit to Altinity/ClickHouse that referenced this pull request Nov 17, 2025
Antalya 25.8 Backport of ClickHouse#87220 - Fixes for parquet reader v3, _row_number, and iceberg positioned deletes
/// worth the added complexity.
for (size_t i = 0; i < max_threads; ++i)
{
pool->scheduleOrThrowOnError([this] { threadFunction(); });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scheduleOrThrow: always throws when the task cannot be enqueued (queue full, pool stopped, etc.).

scheduleOrThrowOnError :only throws on genuine scheduling errors; if the pool is already stopping/finished, it simply skips scheduling without throwing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created-cloud deprecated label, NOOP pr-bugfix Pull request with bugfix, not backported by default pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants