Bug #73799
RGWBucketFullSyncCR can get stuck indefinitely syncing objects that no longer exist
Description
TLDR¶
This is a multisite issue that pops up with certain types of workloads. When a large bucket (with thousands of objects) is populated, and soon afterwards all of its objects and then the bucket itself are deleted, RGWBucketFullSyncCR can get stuck in an infinite loop trying to fetch the same set of objects (which have already been deleted) over and over again.
More details follow.
Related Codepath¶
The impacted while-loop: https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4688
int RGWBucketFullSyncCR::operate(const DoutPrefixProvider *dpp)
{
  ...
  do {
    ...
    yield call(new RGWListRemoteBucketCR(sc, bs, list_marker, &list_result));
    ...
    entries_iter = list_result.entries.begin();
    for (; entries_iter != list_result.entries.end(); ++entries_iter) {
      ...
    }
    ...
  } while (list_result.is_truncated && sync_result == 0);
}
Root-Cause¶
When the following conditions are met for the source bucket:
- in the target zone, multisite falls back to bucket full sync (e.g., because incremental sync is not possible due to missing bilog), and
- the bucket is large enough that RGWListRemoteBucketCR returns paginated/truncated listings, and
- the source bucket is deleted while the for-loop is still in progress,

then:
- RGWListRemoteBucketCR is handed the class member `list_result`, which may carry stale state from the previous iteration.
- When listing the now-deleted remote source bucket, RGWListRemoteBucketCR gets a 404 (source not found) and does not update the given `list_result`, so RGWBucketFullSyncCR keeps operating on the stale `list_result`.
- Since `list_result.is_truncated == true` (left over from the previous iteration), the outer while-loop keeps working on the same list of entries over and over again.
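The failure mode can be illustrated with a small standalone simulation (Python rather than the actual Ceph C++ coroutine code; `ListResult`, `list_remote_bucket`, and `full_sync_loop` are illustrative stand-ins for `bucket_list_result`, `RGWListRemoteBucketCR`, and the RGWBucketFullSyncCR loop):

```python
# Standalone simulation of the stale list_result reuse; all names here are
# illustrative stand-ins, not the real Ceph types.
from dataclasses import dataclass, field

@dataclass
class ListResult:
    entries: list = field(default_factory=list)
    is_truncated: bool = False

def list_remote_bucket(bucket_exists: bool, result: ListResult) -> int:
    """On 404, return an error and, crucially, leave `result` untouched."""
    if not bucket_exists:
        return -404  # bucket gone: result keeps its previous (stale) state
    result.entries = ["obj1", "obj2"]
    result.is_truncated = True  # pretend more pages follow
    return 0

def full_sync_loop(stop_on_error: bool, max_iterations: int = 100) -> int:
    """Shape of the RGWBucketFullSyncCR do/while loop; returns how many
    iterations ran before exit (capped to stand in for 'infinite')."""
    list_result = ListResult()
    bucket_exists = True
    iterations = 0
    while True:
        ret = list_remote_bucket(bucket_exists, list_result)
        if stop_on_error and ret < 0:
            break  # one possible fix: stop reusing the stale result on error
        iterations += 1
        bucket_exists = False  # the bucket is deleted mid-sync
        # Mirrors: while (list_result.is_truncated && sync_result == 0)
        if not (list_result.is_truncated and iterations < max_iterations):
            break
    return iterations

print(full_sync_loop(stop_on_error=False))  # 100: hits the cap, i.e. loops forever
print(full_sync_loop(stop_on_error=True))   # 1: exits after the single good page
```

Without handling the listing error, the stale `is_truncated == True` keeps the loop alive indefinitely; breaking (or resetting the result) on error ends it after the last successful page.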
Related Events¶
When sync of a deleted bucket falls into this state and debug_rgw is set to 20, the logs show the same coroutine looping indefinitely between:
- https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4627 -> listing the bucket for full sync, and
- https://github.com/ceph/ceph/blob/3c5207386c876f54d545aa4c47096caff303036d/src/rgw/driver/rados/rgw_data_sync.cc#L4657 -> attempting to sync the same set of objects between two listings.
These two events repeat indefinitely until the RGW instance is restarted.
Reproducer¶
By injecting an artificial delay into the above for-loop, we can reproduce the issue deterministically. The follow-up draft PR that attempts to resolve this issue introduces such a mechanism, both to serve as a reproducer in an integration test and to validate the fix. Given such a delayer mechanism:
- create a bucket
- start a periodic bilog trimming for this bucket so that target zone's ability to incrementally sync the bucket is crippled
- initiate the delayer in the for-loop
- upload a large number of objects so that listing is paginated when full sync engages
- wait for the delayer to pause/slow-down the for-loop
- then, delete all the objects and the bucket
- remove the delayer to let bucket full-sync for-loop work at its regular pace
This reproduces the indefinite while-loop. The integration test, run without the proposed fix, exhibits the behaviour described above.
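The object-workload part of these steps (create, bulk-upload, bulk-delete, remove bucket) can be sketched with a boto3-style S3 client. The bucket name, object count, and client wiring below are illustrative; the bilog trimming and delayer steps happen outside this script (via `radosgw-admin` and the draft PR's test hook, respectively):

```python
# Sketch of the reproducer's object workload; names and counts are
# illustrative, not taken from the actual integration test.

def run_reproducer_workload(s3, bucket: str, num_objects: int = 2000) -> None:
    """Drive the workload with a boto3-style S3 client
    (e.g., s3 = boto3.client('s3', endpoint_url=...))."""
    s3.create_bucket(Bucket=bucket)
    # Upload enough objects that the full-sync listing has to paginate.
    for i in range(num_objects):
        s3.put_object(Bucket=bucket, Key=f"obj-{i:06d}", Body=b"x")
    # While the (artificially delayed) full-sync for-loop is in flight on
    # the target zone, delete every object and then the bucket itself.
    for i in range(num_objects):
        s3.delete_object(Bucket=bucket, Key=f"obj-{i:06d}")
    s3.delete_bucket(Bucket=bucket)
```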
Implications¶
Although at a high level this doesn't cause any functional issues (the source bucket is already deleted), the sync status reports that replication is lagging, which raises an alarm in our case. We rely on "sync status" and "data sync status --shard-id=X" to report replication progress; when this issue occurs, sync status reports the impacted bucket(s) as the oldest items until the RGWs holding the sync_lock for the impacted shards are restarted. For example, running the proposed integration test without the fix, sync status gets stuck at the same datapoint we observed in our production systems:
At 2025-11-11T01:50:59 ==> oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
# radosgw-admin sync status
          realm 5bbe50b0-ce66-4bde-aa3e-d8ecb9708240 (r)
      zonegroup 93a25ca9-d34e-4c01-a889-5bc9aaf835f8 (a)
           zone 5d685e9d-219a-42c4-a61a-54ff4a1ab7d6 (a2)
   current time 2025-11-11T01:50:59Z
zonegroup features enabled: notification_v2,resharding
                   disabled: compress-encrypted
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 7db4dd01-6060-4c3d-ad5f-cfe2b32b3337 (a1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [105]
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
                        10 shards are recovering
                        recovering shards: [103,104,106,107,108,109,110,111,112,113]
Bucket full sync loops indefinitely, and sync status shows the same delay until the RGWs are restarted. For example, several minutes later:
At 2025-11-11T01:58:39Z still => oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
# radosgw-admin sync status
          realm 5bbe50b0-ce66-4bde-aa3e-d8ecb9708240 (r)
      zonegroup 93a25ca9-d34e-4c01-a889-5bc9aaf835f8 (a)
           zone 5d685e9d-219a-42c4-a61a-54ff4a1ab7d6 (a2)
   current time 2025-11-11T01:58:39Z
zonegroup features enabled: notification_v2,resharding
                   disabled: compress-encrypted
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 7db4dd01-6060-4c3d-ad5f-cfe2b32b3337 (a1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [105]
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
                        10 shards are recovering
                        recovering shards: [103,104,106,107,108,109,110,111,112,113]
Another example comes from our staging system, where such a workload impacted multiple buckets: the metrics we gather from sync-status commands show the gap between current time and oldest-incremental-change-not-applied increasing, and likewise for each of the affected datalog shards.
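In our monitoring, the alarming signal is that this lag grows without bound. A minimal parser sketch for extracting it from the `radosgw-admin sync status` output shown above (the function name is ours, and the regexes assume that output layout):

```python
import re
from datetime import datetime, timezone

def sync_status_lag_seconds(status_text: str):
    """Lag in seconds between 'current time' and the 'oldest incremental
    change not applied' in `radosgw-admin sync status` output; returns
    None when no shard is reported behind."""
    now_m = re.search(r"current time\s+(\S+)", status_text)
    oldest_m = re.search(r"oldest incremental change not applied:\s+(\S+)",
                         status_text)
    if not (now_m and oldest_m):
        return None
    now = datetime.strptime(now_m.group(1), "%Y-%m-%dT%H:%M:%SZ")
    now = now.replace(tzinfo=timezone.utc)
    # Timestamps look like 2025-11-11T01:50:10.630171+0000
    oldest = datetime.strptime(oldest_m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
    return (now - oldest).total_seconds()

sample = """\
   current time 2025-11-11T01:50:59Z
                        oldest incremental change not applied: 2025-11-11T01:50:10.630171+0000 [105]
"""
print(sync_status_lag_seconds(sample))  # 48.369829 for the sample above
```

When this issue is active, the value keeps growing with wall-clock time because the "oldest incremental change" timestamp never advances.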
Updated by Oguzhan Ozmen 4 months ago
- Pull request ID set to 66203
https://github.com/ceph/ceph/pull/66203 (RGW/multisite: fix bucket-full-sync infinite loop caused by stale bucket_list_result reuse) is added as a potential fix.
Updated by Oguzhan Ozmen 4 months ago
- Status changed from New to Fix Under Review
Updated by Casey Bodley about 2 months ago
- Priority changed from Normal to High
- Backport set to squid tentacle
Updated by Casey Bodley about 2 months ago
- Blocked by Bug #74526: qa/multisite: ModuleNotFoundError: No module named 'boto.vendored.six.moves' added
Updated by Upkeep Bot 14 days ago
- Status changed from Fix Under Review to Pending Backport
- Merge Commit set to 16c48425c07c472fd10b948a2dcffad271384521
- Fixed In set to v20.3.0-5943-g16c48425c0
- Upkeep Timestamp set to 2026-03-10T16:34:41+00:00
Updated by Upkeep Bot 14 days ago
- Copied to Backport #75438: squid: RGWBucketFullSyncCR can indefinitely stuck at syncing objects that don't exist anymore added
Updated by Upkeep Bot 14 days ago
- Copied to Backport #75439: tentacle: RGWBucketFullSyncCR can indefinitely stuck at syncing objects that don't exist anymore added