Improve S3 Retry Strategy for backup and restore#84421
Improve S3 Retry Strategy for backup and restore#84421jkartseva merged 11 commits intoClickHouse:masterfrom
Conversation
pufit
left a comment
There was a problem hiding this comment.
Everything besides the failing test looks good.
| assert response == "OK", response | ||
|
|
||
|
|
||
| class Throttler: |
There was a problem hiding this comment.
Cool, you are using broken_s3 mock :)
There was a problem hiding this comment.
Yes, it helped emulate the 'Slow Down' error.
730cd32 to
cc7db5f
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR improves the S3 retry strategy for backup and restore operations by introducing jitter, fixing a bug where the first retry had no delay, and adding configurable parameters for the retry backoff strategy.
- Introduces jitter to the S3 retry strategy to avoid thundering herd problems
- Fixes a bug where the first retry attempt had no delay
- Adds configurable parameters for initial delay, max delay, and jitter factor in the S3 retry backoff strategy
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/IO/S3/PocoHTTPClient.h |
Defines new RetryStrategy struct with configurable parameters |
src/IO/S3/PocoHTTPClient.cpp |
Adds validation for retry strategy parameters |
src/IO/S3/Client.h |
Updates RetryStrategy class to use new configuration |
src/IO/S3/Client.cpp |
Implements improved retry delay calculation with jitter |
src/Core/Settings.cpp |
Adds new settings for retry configuration |
src/Core/SettingsChangesHistory.cpp |
Records new settings in version history |
src/Backups/BackupIO_S3.cpp |
Uses new retry strategy configuration for backups |
tests/integration/test_backup_restore_s3/test.py |
Updates test structure and adds throttling test |
tests/integration/helpers/s3_mocks/broken_s3.py |
Implements S3 throttling mock functionality |
| "enable_s3_requests_logging": "1", | ||
| } | ||
|
|
||
| node = cluster.instances["node"] |
There was a problem hiding this comment.
The variable node is assigned twice in this function (lines 936 and 946). The second assignment is redundant and should be removed.
| node = cluster.instances["node"] |
https://github.com/ClickHouse/clickhouse-private/issues/19914
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Introduce
backup_restore_s3_retry_initial_backoff_ms,backup_restore_s3_retry_max_backoff_ms,backup_restore_s3_retry_jitter_factorto configure the S3 retry backoff strategy used during backup and restore operations.Documentation entry for user-facing changes