Skip to content

Fix iterator locking#1497

Merged
zvonand merged 2 commits intoantalya-25.8from
bugfix/antalya-25.8/fix_iterator_wrappers
Mar 10, 2026
Merged

Fix iterator locking#1497
zvonand merged 2 commits intoantalya-25.8from
bugfix/antalya-25.8/fix_iterator_wrappers

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix issues found in #1436

Documentation entry for user-facing changes

Add has_concurrent_next method to iterator wrappers.
Add separate iterator thread in query thread group to collect ProfileEvents properly.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 10, 2026

Workflow [PR], commit [55cef3c]

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d01eb1575f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

size_t estimatedKeysCount() override { return iterator->estimatedKeysCount(); }
std::optional<UInt64> getSnapshotVersion() const override { return iterator->getSnapshotVersion(); }

bool has_concurrent_next() const override { return iterator->has_concurrent_next(); }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep bucket-splitting iterator serialized

Propagating has_concurrent_next() from the wrapped iterator here allows StorageObjectStorageStableTaskDistributor::getMatchingFileFromIterator to call next() without its mutex, but ObjectIteratorSplitByBuckets::next mutates pending_objects_info via unsynchronized push/pop in IObjectIterator.cpp. When multiple replicas pull tasks concurrently (e.g., distributed reads with cluster_table_function_split_granularity=BUCKET and a concurrent source iterator), this introduces a data race that can corrupt task assignment or crash.

Useful? React with 👍 / 👎.

[this]()
[this, thread_group = DB::CurrentThread::getGroup()]()
{
ThreadGroupSwitcher switcher(thread_group, "IcebergKeysIterator");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use a valid ThreadGroupSwitcher thread name length

ThreadGroupSwitcher forwards thread_name to setThreadName, which rejects names longer than 15 bytes; "IcebergKeysIterator" is 19 bytes, so this path deterministically throws and enters the switcher’s exception handler every time. That means the new thread-group setup always takes the error path (with noisy logs and skipped normal switcher teardown logic), so the locking/profile-events fix does not run cleanly.

Useful? React with 👍 / 👎.

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

@ianton-ru
Copy link
Copy Markdown
Author

@codex review, please!

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru
Copy link
Copy Markdown
Author

Disclaimer: This review was generated by an AI coding assistant (Cursor, Auto agent). It is not an official human review.


Audit summary for PR #1497 (Fix iterator locking) — both commits (d01eb15 + 55cef3c) reviewed.

Result: no remaining confirmed defects.

  • P1 (bucket iterator data race): Addressed in 55cef3cObjectIteratorSplitByBuckets::next() is now fully protected by a mutex; has_concurrent_next() correctly returns true while remaining thread-safe.
  • P2 (thread name length): Addressed in 55cef3c — thread name shortened from "IcebergKeysIterator" (19 bytes) to "IcebergKeys" (11 bytes), within the 15-byte limit.

Recommendation: Safe to merge from a concurrency and thread-naming perspective. Ensure any future refactors of ObjectIteratorSplitByBuckets::next() keep the mutex held for the entire method body (including the refill loop). Consider adding a TSAN test for cluster reads with cluster_table_function_split_granularity=BUCKET and an Iceberg source.

Copy link
Copy Markdown
Collaborator

@arthurpassos arthurpassos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tho it is sad we had to add yet another mutex

@zvonand zvonand merged commit b18559e into antalya-25.8 Mar 10, 2026
209 of 214 checks passed
@alsugiliazova
Copy link
Copy Markdown
Member

PR #1497 CI Verification Report

Summary

Bug fix for issues found in PR #1436 (Improvement locking Iceberg iterator). Three changes:

  1. IcebergIterator.cpp — Add the background iterator thread to the query's thread group via ThreadGroupSwitcher so that ProfileEvents (like IcebergIteratorNextMicroseconds) are properly collected in system.query_log.
  2. IObjectIterator.h — Propagate has_concurrent_next() through the ObjectIteratorWithPathAndFileFilter wrapper (was missing, causing the optimization from Improvement locking Iceberg iterator #1436 to be bypassed when filters were applied). Also mark ObjectIteratorSplitByBuckets as concurrent-safe.
  3. IObjectIterator.cpp — Add std::lock_guard to ObjectIteratorSplitByBuckets::next() to protect pending_objects_info which is not thread-safe when called without the distributor lock.

MasterCI Results

85 success, 1 failure, 2 errors, 3 skipped (91 total)

# Suite Failure Error Related to PR Notes
1 Install packages (amd_release) Run server init.d Server init.d script failure No Infrastructure issue
2 Integration tests (amd_asan, old analyzer, 1/6) CI runner / script error No "Job failed to produce Result due to a script error or CI runner issue"
3 Integration tests (amd_asan, old analyzer, 2/6) CI runner / script error No "Job failed to produce Result due to a script error or CI runner issue"

All MasterCI failures are infrastructure-level issues (server init.d failure, CI runner errors). None produced actual test failures related to code changes. The parent commit (0e0446dd) had similar infrastructure failures on the same branch (Stress test (amd_tsan) - "Cannot start clickhouse-server", Stateless tests (amd_tsan, s3 storage, sequential, 2/2) - "Start ClickHouse Server" failure).

Regression Tests (clickhouse-regression)

Iceberg-Specific Regression Tests

All 4 Iceberg regression suites PASSED:

Suite Platform Status
Iceberg (1) Release Success
Iceberg (2) Release Success
Iceberg (1) Aarch64 Success
Iceberg (2) Aarch64 Success

Conclusion

PR #1497 is verified.

@zvonand
Copy link
Copy Markdown
Collaborator

zvonand commented Mar 12, 2026

Fix issues found in #1436 is NOT a valid changelog entry. Please, next time make a proper entry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants