Bug #66286
open
Tail Objects are not removed when both SRC and Dest of a server-side copy are removed
50%
Description
An S3 server-side copy copies only the head object. Instead of copying the tail objects, it increments the ref_count on the tail objects of the Copy-SRC.
It then copies the manifest from SRC to Dest, which gives Dest access to the full object data.
When the SRC object is removed, we decrement the ref_count on the tail objects (correctly). When the Dest object is removed, however, we pass the wrong ref_tag to cls_refcount_put(), so the tail objects are never deleted.
Recreation steps:
0) start with an empty ceph install and no objects in it
1) create bucket1 -> s3cmd mb s3://bucket1
2) choose a source file larger than 4MB (e.g. ceph/build/bin/osdmaptool)
3) put the file in a bucket -> s3cmd put ceph/build/bin/osdmaptool s3://bucket1/osm
4) server-side copy -> s3cmd cp s3://bucket1/osm s3://bucket1/osm_copy
5) list the objects in rados -> bin/rados -p default.rgw.buckets.data ls
The output should look something like this, with 2 head objects and 3 shared tail objects:
[gbenhano@o09 build]$ bin/rados -p default.rgw.buckets.data ls
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_2
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_1
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_3
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1_osm_copy
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1_osm
6) remove both objects ->
s3cmd del s3://bucket1/osm
s3cmd del s3://bucket1/osm_copy
7) list again and you will see that the tail objects are not removed
[gbenhano@o09 build]$ bin/rados -p default.rgw.buckets.data ls
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_2
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_1
6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_3
8) force gc to work -> radosgw-admin gc process --include-all --bucket=bucket1
9) list rados again and you will still see the 3 tail-objects
10) list from radosgw-admin -> radosgw-admin bucket stats
You will see that usage is reported as zero
11) rados df will still report the tail objects
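The check in steps 5 and 7 can be automated with a small script. The following is only a sketch: the helper names `split_listing` and `leaked_tails` are made up for illustration, and it assumes (as in step 0) that the pool held no other objects, so any `__shadow_` entry that remains once every head object is gone counts as leaked. It takes the text output of `rados ls` as a string.

```python
def split_listing(listing):
    """Split `rados ls` output into (head_objects, tail_objects).

    Assumption: tail objects are the entries whose name contains
    "__shadow_", as in the listings above.
    """
    heads, tails = [], []
    for line in listing.splitlines():
        name = line.strip()
        if not name:
            continue
        if "__shadow_" in name:
            tails.append(name)
        else:
            heads.append(name)
    return heads, tails


def leaked_tails(listing):
    """Return the tail objects that remain once no head object is left."""
    heads, tails = split_listing(listing)
    return tails if not heads else []


PREFIX = "6a468ac1-d575-4ff9-9409-8b9d5bec2abd.4182.1"
SHADOW = "__shadow_.oikVkne8F5shYFl84PAS0fvbViMiee4_"

# Listing from step 5: two head objects share three tail objects.
after_copy = "\n".join(
    [PREFIX + SHADOW + str(i) for i in (2, 1, 3)]
    + [PREFIX + "_osm_copy", PREFIX + "_osm"]
)
# Listing from step 7: both head objects deleted, tails still present.
after_delete = "\n".join(PREFIX + SHADOW + str(i) for i in (2, 1, 3))

print(len(leaked_tails(after_copy)))    # 0
print(len(leaked_tails(after_delete)))  # 3
```

On a pool that holds other objects, the head/tail matching would need to be per object rather than per pool, so this check is only meaningful for the clean reproduction above.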
The problem is that when we increment the ref_count we pass a "ref_tag" to cls_refcount_get(op, ref_tag, true), but when we remove the object and call cls_refcount_put(op, ref_tag, true) we pass in a different ref_tag.
The delete passes in the value of the "tag" attribute, but we used another value for cls_refcount_get(op, ref_tag, true).
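To illustrate why a mismatched tag leaks the tail objects, here is a toy model of tag-based reference counting. It is not the cls_refcount implementation: the class `TailObject`, the function `simulate`, and the tag strings are all invented for this sketch, and it assumes only that a put with a tag that was never registered leaves the existing references untouched.

```python
class TailObject:
    """Toy stand-in for a tail object with tag-based reference counting."""

    def __init__(self, owner_tag):
        # The source object holds the first reference.
        self.refs = {owner_tag}
        self.removed = False

    def get(self, tag):
        """Register a new reference under `tag`."""
        self.refs.add(tag)

    def put(self, tag):
        """Drop the reference registered under `tag`, if there is one."""
        if tag not in self.refs:
            return  # assumption: an unknown tag changes nothing
        self.refs.remove(tag)
        if not self.refs:
            self.removed = True  # last reference gone, object is deleted


def simulate(dest_delete_tag):
    """Copy SRC to Dest, delete both, report whether the tail was removed."""
    tail = TailObject(owner_tag="src-tag")
    tail.get("copy-ref-tag")   # server-side copy takes a reference
    tail.put("src-tag")        # deleting SRC drops its reference
    tail.put(dest_delete_tag)  # deleting Dest drops (or fails to drop) its own
    return tail.removed


print(simulate("dest-tag-attr"))  # False: wrong tag, tail object is leaked
print(simulate("copy-ref-tag"))   # True: matching tag, tail object is removed
```

In this model the leak is silent: the mismatched put raises no error, which would be consistent with bucket stats showing zero usage while the tail objects stay in the pool.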
Updated by Casey Bodley almost 2 years ago · Edited
there is some test coverage for orphans that i would have expected to catch this issue. specifically # copy multipart objects and delete original:
https://github.com/ceph/ceph/blob/main/qa/workunits/rgw/test_rgw_orphan_list.sh#L354-L355
maybe the problem is that it isn't deleting the destination bucket/objects? cc @J. Eric Ivancich
I added two new test cases deleting both the destination and the original buckets (in different order), both tests lead to reported orphans
Updated by Gabriel BenHanokh over 1 year ago
- Status changed from New to Fix Under Review
- Assignee set to Gabriel BenHanokh
- % Done changed from 0 to 50
- Pull request ID set to 58213
Updated by Casey Bodley over 1 year ago
- Backport set to reef squid
we believe this is a regression from https://github.com/ceph/ceph/pull/44616, which was introduced for reef. tagging for backports
Updated by J. Eric Ivancich over 1 year ago
Casey Bodley wrote in #note-1:
there is some test coverage for orphans that i would have expected to catch this issue. specifically # copy multipart objects and delete original:
https://github.com/ceph/ceph/blob/main/qa/workunits/rgw/test_rgw_orphan_list.sh#L354-L355
maybe the problem is that it isn't deleting the destination bucket/objects? cc @J. Eric Ivancich
I added two new test cases deleting both the destination and the original buckets (in different order), both tests lead to reported orphans
I've looked things over and it sounds like we have a test that catches this and a good theory as to when the regression was introduced. Anything I should be doing, @Casey Bodley ?
Updated by Casey Bodley over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Assignee changed from Gabriel BenHanokh to Casey Bodley
Updated by Casey Bodley over 1 year ago
- Copied to Backport #67268: reef: Tail Objects are not removed when both SRC and Dest of a server-side copy are removed added
Updated by Casey Bodley over 1 year ago
- Copied to Backport #67269: squid: Tail Objects are not removed when both SRC and Dest of a server-side copy are removed added
Updated by Casey Bodley over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Casey Bodley about 1 year ago
- Has duplicate Bug #69457: RGW: CopyObject cannot delete objects in the pool added
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 42ecd4c8d767bd66a8ea331ddb0089de19e0ef95
- Fixed In set to v19.3.0-3853-g42ecd4c8d76
- Upkeep Timestamp set to 2025-07-08T22:38:21+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3853-g42ecd4c8d76 to v19.3.0-3853-g42ecd4c8d767
- Upkeep Timestamp changed from 2025-07-08T22:38:21+00:00 to 2025-07-14T15:46:53+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3853-g42ecd4c8d767 to v19.3.0-3853-g42ecd4c8d7
- Upkeep Timestamp changed from 2025-07-14T15:46:53+00:00 to 2025-07-14T21:37:56+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2345
- Upkeep Timestamp changed from 2025-07-14T21:37:56+00:00 to 2025-11-01T01:03:22+00:00