Bug #73799
RGWBucketFullSyncCR can get stuck indefinitely syncing objects that no longer exist
Description
TLDR¶
This is a multisite issue that pops up with certain types of workloads. When a large bucket (with thousands of objects) is populated, and soon afterwards all of its objects and then the bucket itself are deleted, RGWBucketFullSyncCR can get stuck in an infinite loop trying to fetch the same set of objects (which have already been deleted) over and over again.
More details follow.
Related Codepath¶
The impacted while-loop: https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4688
int RGWBucketFullSyncCR::operate(const DoutPrefixProvider *dpp)
{
  ...
  do {
    ...
    yield call(new RGWListRemoteBucketCR(sc, bs, list_marker, &list_result));
    ...
    entries_iter = list_result.entries.begin();
    for (; entries_iter != list_result.entries.end(); ++entries_iter) {
      ...
    }
    ...
  } while (list_result.is_truncated && sync_result == 0);
}
Root-Cause¶
When the following conditions are met for the source bucket:
- in the target zone, multisite falls back to bucket full sync (e.g., because incremental sync is not possible due to missing bilog), and
- the bucket is large enough that RGWListRemoteBucketCR returns paginated/truncated listings, and
- the source bucket is deleted while the for-loop is still in progress,

then:
- RGWListRemoteBucketCR is handed the class member `list_result`, which may carry stale state from the previous iteration.
- When listing the now-deleted remote source bucket, RGWListRemoteBucketCR gets a 404 (source not found) and does not update the given `list_result`, so RGWBucketFullSyncCR keeps operating on the stale `list_result`.
- Since `list_result.is_truncated == true` (left over from the previous iteration), the outer while-loop keeps working on the same list of entries over and over again.
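The failure mode can be illustrated with a small standalone simulation (Python rather than the actual Ceph C++ coroutine code; `ListResult`, `list_remote_bucket`, and `full_sync_loop` are illustrative stand-ins for `bucket_list_result`, `RGWListRemoteBucketCR`, and the RGWBucketFullSyncCR loop):

```python
# Standalone simulation of the stale list_result reuse; all names here are
# illustrative stand-ins, not the real Ceph types.
from dataclasses import dataclass, field

@dataclass
class ListResult:
    entries: list = field(default_factory=list)
    is_truncated: bool = False

def list_remote_bucket(bucket_exists: bool, result: ListResult) -> int:
    """On 404, return an error and, crucially, leave `result` untouched."""
    if not bucket_exists:
        return -404  # bucket gone: result keeps its previous (stale) state
    result.entries = ["obj1", "obj2"]
    result.is_truncated = True  # pretend more pages follow
    return 0

def full_sync_loop(stop_on_error: bool, max_iterations: int = 100) -> int:
    """Shape of the RGWBucketFullSyncCR do/while loop; returns how many
    iterations ran before exit (capped to stand in for 'infinite')."""
    list_result = ListResult()
    bucket_exists = True
    iterations = 0
    while True:
        ret = list_remote_bucket(bucket_exists, list_result)
        if stop_on_error and ret < 0:
            break  # one possible fix: stop reusing the stale result on error
        iterations += 1
        bucket_exists = False  # the bucket is deleted mid-sync
        # Mirrors: while (list_result.is_truncated && sync_result == 0)
        if not (list_result.is_truncated and iterations < max_iterations):
            break
    return iterations

print(full_sync_loop(stop_on_error=False))  # 100: hits the cap, i.e. loops forever
print(full_sync_loop(stop_on_error=True))   # 1: exits after the single good page
```

Without handling the listing error, the stale `is_truncated == True` keeps the loop alive indefinitely; breaking (or resetting the result) on error ends it after the last successful page.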
Related Events¶
When sync of a deleted bucket falls into this state and debug_rgw is set to 20, the logs show the same coroutine looping indefinitely between:
- https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4627 -> listing the bucket for full sync, and
- https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4657 -> attempting to sync the same set of objects between two listings.
These two events repeat indefinitely until the RGW instance is restarted.
Reproducer¶
By injecting an artificial delay into the above for-loop, we can reproduce the issue deterministically. The follow-up draft PR that attempts to resolve this issue introduces such a mechanism, both to serve as a reproducer in an integration test and to validate the fix. Given such a delayer mechanism:
- create a bucket
- start a periodic bilog trimming for this bucket so that target zone's ability to incrementally sync the bucket is crippled
- initiate the delayer in the for-loop
- upload a large number of objects so that listing is paginated when full sync engages
- wait for the delayer to pause/slow-down the for-loop
- then, delete all the objects and the bucket
- remove the delayer to let bucket full-sync for-loop work at its regular pace
This reproduces the indefinite while-loop. The integration test, run without the proposed fix, exhibits the behaviour described above.
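The object-workload part of these steps (create, bulk-upload, bulk-delete, remove bucket) can be sketched with a boto3-style S3 client. The bucket name, object count, and client wiring below are illustrative; the bilog trimming and delayer steps happen outside this script (via `radosgw-admin` and the draft PR's test hook, respectively):

```python
# Sketch of the reproducer's object workload; names and counts are
# illustrative, not taken from the actual integration test.

def run_reproducer_workload(s3, bucket: str, num_objects: int = 2000) -> None:
    """Drive the workload with a boto3-style S3 client
    (e.g., s3 = boto3.client('s3', endpoint_url=...))."""
    s3.create_bucket(Bucket=bucket)
    # Upload enough objects that the full-sync listing has to paginate.
    for i in range(num_objects):
        s3.put_object(Bucket=bucket, Key=f"obj-{i:06d}", Body=b"x")
    # While the (artificially delayed) full-sync for-loop is in flight on
    # the target zone, delete every object and then the bucket itself.
    for i in range(num_objects):
        s3.delete_object(Bucket=bucket, Key=f"obj-{i:06d}")
    s3.delete_bucket(Bucket=bucket)
```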
Implications¶
Although at a high level this doesn't cause any functional issues (the source bucket is already deleted), the sync status reports that replication is lagging, which raises an alarm in our case. We rely on "sync status" and "data sync status --shard-id=X" to report replication progress; when this issue occurs, sync status reports the impacted bucket(s) as the oldest items until the RGWs holding the sync_lock for the impacted shards are restarted. For example, running the proposed integration test without the fix, sync status gets stuck at the same datapoint we observed in our production systems:
At 2025-11-11T01:50:59 ==> oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
# radosgw-admin sync status
          realm 5bbe50b0-ce66-4bde-aa3e-d8ecb9708240 (r)
      zonegroup 93a25ca9-d34e-4c01-a889-5bc9aaf835f8 (a)
           zone 5d685e9d-219a-42c4-a61a-54ff4a1ab7d6 (a2)
   current time 2025-11-11T01:50:59Z
zonegroup features enabled: notification_v2,resharding
                   disabled: compress-encrypted
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 7db4dd01-6060-4c3d-ad5f-cfe2b32b3337 (a1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [105]
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
                        10 shards are recovering
                        recovering shards: [103,104,106,107,108,109,110,111,112,113]
Bucket full sync loops indefinitely, and sync status shows the same delay until the RGWs are restarted. For example, several minutes later:
At 2025-11-11T01:58:39Z still => oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
# radosgw-admin sync status
          realm 5bbe50b0-ce66-4bde-aa3e-d8ecb9708240 (r)
      zonegroup 93a25ca9-d34e-4c01-a889-5bc9aaf835f8 (a)
           zone 5d685e9d-219a-42c4-a61a-54ff4a1ab7d6 (a2)
   current time 2025-11-11T01:58:39Z
zonegroup features enabled: notification_v2,resharding
                   disabled: compress-encrypted
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 7db4dd01-6060-4c3d-ad5f-cfe2b32b3337 (a1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [105]
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
                        10 shards are recovering
                        recovering shards: [103,104,106,107,108,109,110,111,112,113]
Another example comes from our staging system, where such a workload impacted multiple buckets: the metrics we gather from sync-status commands show the gap between current time and oldest-incremental-change-not-applied increasing, and likewise for each of the affected datalog shards.
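In our monitoring, the alarming signal is that this lag grows without bound. A minimal parser sketch for extracting it from the `radosgw-admin sync status` output shown above (the function name is ours, and the regexes assume that output layout):

```python
import re
from datetime import datetime, timezone

def sync_status_lag_seconds(status_text: str):
    """Lag in seconds between 'current time' and the 'oldest incremental
    change not applied' in `radosgw-admin sync status` output; returns
    None when no shard is reported behind."""
    now_m = re.search(r"current time\s+(\S+)", status_text)
    oldest_m = re.search(r"oldest incremental change not applied:\s+(\S+)",
                         status_text)
    if not (now_m and oldest_m):
        return None
    now = datetime.strptime(now_m.group(1), "%Y-%m-%dT%H:%M:%SZ")
    now = now.replace(tzinfo=timezone.utc)
    # Timestamps look like 2025-11-11T01:50:10.630171+0000
    oldest = datetime.strptime(oldest_m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
    return (now - oldest).total_seconds()

sample = """\
   current time 2025-11-11T01:50:59Z
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
"""
print(sync_status_lag_seconds(sample))  # 48.369829 for the sample above
```

When this issue is active, the value keeps growing with wall-clock time because the "oldest incremental change" timestamp never advances.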
Updated by Oguzhan Ozmen 4 months ago
- Pull request ID set to 66203
https://github.com/ceph/ceph/pull/66203 (RGW/multisite: fix bucket-full-sync infinite loop caused by stale bucket_list_result reuse) is added as a potential fix.
Updated by Oguzhan Ozmen 4 months ago
- Status changed from New to Fix Under Review
Updated by Casey Bodley about 2 months ago
- Priority changed from Normal to High
- Backport set to squid tentacle
Updated by Casey Bodley about 2 months ago
- Blocked by Bug #74526: qa/multisite: ModuleNotFoundError: No module named 'boto.vendored.six.moves' added
Updated by Upkeep Bot 14 days ago
- Status changed from Fix Under Review to Pending Backport
- Merge Commit set to 16c48425c07c472fd10b948a2dcffad271384521
- Fixed In set to v20.3.0-5943-g16c48425c0
- Upkeep Timestamp set to 2026-03-10T16:34:41+00:00
Updated by Upkeep Bot 14 days ago
- Copied to Backport #75438: squid: RGWBucketFullSyncCR can indefinitely stuck at syncing objects that don't exist anymore added
Updated by Upkeep Bot 14 days ago
- Copied to Backport #75439: tentacle: RGWBucketFullSyncCR can indefinitely stuck at syncing objects that don't exist anymore added