Fix flaky test_backup_to_s3_different_credentials#101499
Fix flaky test_backup_to_s3_different_credentials#101499groeneai wants to merge 1 commit intoClickHouse:masterfrom
Conversation
…strict S3ReadRequestsErrors assertion The assertion `assert "S3ReadRequestsErrors" not in events` fails intermittently because it counts all non-throttling errors in GET/HEAD requests, including transient network errors that are retried and succeed. Under ASAN/UBSAN overhead in CI, MinIO operations are slower and transient read errors become more likely during large (10M rows) data transfers. The test's core purpose — verifying native copy fallback with different credentials — is validated by the remaining assertions (S3CopyObject, S3WriteRequestsErrors, S3CreateMultipartUpload) and the data integrity check in check_backup_and_restore. Closes ClickHouse#90662
Pre-PR Validation GateSession: a) Deterministic repro? b) Root cause explained? c) Fix matches root cause? d) Test intent preserved?
e) Demonstrated in both directions? f) Fix is general? |
|
cc @vitlibar — could you review this? Removes the |
|
Workflow [PR], commit [5c21aaa] Summary: ❌
AI ReviewSummaryThis PR makes ClickHouse Rules
Final Verdict
|
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
CI Fix or Improvement (changelog entry is not required)
Description
Remove overly strict
S3ReadRequestsErrors/DiskS3ReadRequestsErrorsassertions fromtest_backup_to_s3_different_credentials.Root cause:
S3ReadRequestsErrorscounts all non-throttling errors in GET/HEAD requests to S3, including transient network errors that are retried and succeed. Under ASAN/UBSAN overhead in CI, MinIO operations are slower and transient read errors become more likely during the large (10M rows) data transfer. The backup/restore operation itself succeeds — the data integrity check passes.CIDB evidence: Failures concentrate on
amd_asan_ubsan, db disk, old analyzerbuilds (2 master + 20+ PR failures in 30 days excluding March 17-18 infrastructure outage), consistent with sanitizer overhead amplifying transient network errors.Why the assertions are unnecessary: The test's core purpose — verifying S3 native copy fallback with different credentials — is validated by:
assert ("S3CopyObject" in events) == (allow_s3_native_copy == True)— native copy was attemptedassert ("S3WriteRequestsErrors" in events) == (allow_s3_native_copy == True)— native copy failed (different credentials)assert ("S3CreateMultipartUpload" in events) == use_multipart_copy— multipart behaviorcheck_backup_and_restore— backup/restore correctnessThe
S3ReadRequestsErrorsassertion tests S3 transport reliability rather than credentials behavior.Closes #90662