Bug #73348
Data corruption when using server-side copy and bypass-gc.
Status: Closed
Description
We observed a bug in Ceph where around 60k objects in S3 were corrupted. The objects are still listed by S3 clients (the metadata seems to be fine), but when you try to download an object you either get a NoSuchKey error or the download starts but never finishes (missing parts).
After some research we were able to reproduce it (in v17.2.6, v17.2.8 and v19.2.2).
Steps to reproduce it:
- Create 2 buckets with the same credential.
- Upload a file that's larger than 4MB to the first bucket (I used a 10MB file). This makes sure we get a shadow object in RADOS. (I have not tested with multipart uploads.)
- Use rclone to server-side copy the file to the second bucket:
  rclone copyto -v ceph-remote:test-corruption-source-bucket/myfile_small ceph-remote:test-corruption-destination-bucket/myfile_small
- When you inspect the rados location of the file in the destination bucket, you will see that it refers to the same location on rados as the file in the source bucket:
  sudo radosgw-admin object stat --bucket test-corruption-source-bucket --object myfile_small
  sudo radosgw-admin object stat --bucket test-corruption-destination-bucket --object myfile_small
- Delete the destination bucket, but use the bypass-gc flag:
  sudo radosgw-admin bucket rm --bucket=test-corruption-destination-bucket --purge-objects --bypass-gc
- The source file gets corrupted because bypass-gc does not check whether the tail objects are still referenced somewhere else.
I tested a normal delete, and this does not corrupt the source. I also tested the bucket removal without the bypass-gc flag; this also does not corrupt the source. In both cases I could see the shadow objects being scheduled for GC:
sudo radosgw-admin gc list --include-all
But once GC has run, the list is empty and the source file is still intact. So it looks like GC does check the references in other indexes, and the bug only happens when GC is bypassed.
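The behaviour described above can be illustrated with a small model. This is a simplified sketch, not Ceph code: it only assumes what the report states, namely that a server-side copy makes both head objects reference the same tail (shadow) object, that the normal GC path deletes a tail object only when it is no longer referenced, and that the bypass-gc path removes the raw object unconditionally. All names below are hypothetical.

```python
class RadosPool:
    """Toy model of a RADOS data pool holding shared tail objects."""

    def __init__(self):
        self.tail_objects = {}  # oid -> [data, refcount]

    def write_tail(self, oid, data):
        # Initial upload creates the tail object with one reference.
        self.tail_objects[oid] = [data, 1]

    def refcount_get(self, oid):
        # Server-side copy adds a reference instead of copying data.
        self.tail_objects[oid][1] += 1

    def gc_delete(self, oid):
        # Normal GC path: drop one reference, delete only when unreferenced.
        obj = self.tail_objects[oid]
        obj[1] -= 1
        if obj[1] == 0:
            del self.tail_objects[oid]

    def bypass_gc_delete(self, oid):
        # --bypass-gc path: removes the raw object unconditionally,
        # ignoring references held by other bucket indexes.
        del self.tail_objects[oid]


pool = RadosPool()
pool.write_tail("shadow.1", b"payload")  # upload to source bucket
pool.refcount_get("shadow.1")            # server-side copy to dest bucket

pool.bypass_gc_delete("shadow.1")        # delete dest bucket with --bypass-gc
print("shadow.1" in pool.tail_objects)   # False: source object is now corrupted
```

With `gc_delete` instead of `bypass_gc_delete`, the tail object survives because one reference (the source bucket's) remains, matching the observed behaviour without the flag.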
If you need more information please let me know.
Updated by Casey Bodley 6 months ago
tracing through the code for --bypass-gc:
RadosBucket::remove_bypass_gc() -> RGWRados::delete_raw_obj_aio() -> cls_rgw_remove_obj()
cls_rgw_remove_obj() is for head objects, not tail objects. we should be calling cls_refcount_put() instead
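A toy sketch of the distinction described above, assuming refcount semantics where each referencing bucket holds a tag on the tail object: a put-style call drops one tag and deletes the object only when no tags remain, whereas an unconditional remove (what the bypass-gc path effectively did) deletes it outright. The function names echo the real cls operations but the bodies are illustrative only, not the actual cls API.

```python
store = {}  # oid -> set of reference tags


def refcount_get(oid, tag):
    # A bucket taking a reference on a shared tail object.
    store.setdefault(oid, set()).add(tag)


def refcount_put(oid, tag):
    # Drop one reference; delete only when the last reference is gone.
    refs = store[oid]
    refs.discard(tag)
    if not refs:
        del store[oid]


def remove_obj(oid):
    # Unconditional remove: deletes even while other references exist.
    del store[oid]


refcount_get("tail.1", "src-bucket")   # original upload
refcount_get("tail.1", "dst-bucket")   # server-side copy
refcount_put("tail.1", "dst-bucket")   # fixed bypass-gc path
print("tail.1" in store)               # True: the source's copy survives
```

Calling `remove_obj("tail.1")` in place of `refcount_put` would delete the tail object while `src-bucket` still references it, which is the corruption reported here.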
Updated by Casey Bodley 6 months ago
- Status changed from New to Fix Under Review
- Assignee set to Casey Bodley
- Backport set to squid tentacle
- Pull request ID set to 65772
Updated by Casey Bodley 6 months ago
- Priority changed from Normal to High
- Backport changed from squid tentacle to reef squid tentacle
Updated by J. Eric Ivancich 5 months ago
- Related to Bug #73138: Ceph bucket object corruption added
Updated by Casey Bodley 5 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 5 months ago
- Merge Commit set to 3e25d8c3d703b138034d1ffa45cf10f9085c2964
- Fixed In set to v20.3.0-3714-g3e25d8c3d7
- Upkeep Timestamp set to 2025-10-21T13:04:02+00:00
Updated by Upkeep Bot 5 months ago
- Copied to Backport #73596: tentacle: Data corruption when using server-side copy and bypass-gc. added
Updated by Upkeep Bot 5 months ago
- Copied to Backport #73597: reef: Data corruption when using server-side copy and bypass-gc. added
Updated by Upkeep Bot 5 months ago
- Copied to Backport #73598: squid: Data corruption when using server-side copy and bypass-gc. added
Updated by J. Eric Ivancich about 2 months ago
- Tags (freeform) changed from backport_processed to backport_processed dataloss
Updated by Konstantin Shalygin about 2 months ago
- Status changed from Pending Backport to Resolved