rgw/rgw_rados: fix server side-copy orphans tail-objects#58213
rgw/rgw_rados: fix server side-copy orphans tail-objects#58213
Conversation
|
jenkins test make check |
|
@benhanokh I posted #58320 that should fix the obj_ctx issue for copy_object(). Can you review and test it please? |
|
thanks @dang, i tested #58320 in a vstart cluster with: before these changes, the second after deleting both copies and running gc, 'rados ls' shows no orphaned objects 👍 the test_rgw_orphan_list.sh changes in this pr look like a great way to validate the fix. @benhanokh if you're satisfied with Dan's approach, would you mind pulling that commit into this pr so we can get it through qa? |
|
@cbodley, assuming you need the TAIL-TAG in the OBJ-COPY then Dan's approach will work better. In any case Dan's PR needs to add test cases for deleting all copies and making sure no orphans are left behind (can take the test cases from my PR) |
i think this would need further investigation. a hash based on bucket+object name might not do what we want in the case of overwrites:
in that case, the tail objects from bucket/A would get two refs with the same bucket/B tag. but i think the intent is for each ref to have a different tag so that garbage collection is idempotent - if gc crashes in the middle of derefs, we'll restart and deref the same tail objects again. if we use a separate tag for each ref, then the second deref will be ignored (its ref count already went to 0) |
@benhanokh i was thinking that you could take Dan's commit into this pr since we want your test cases and |
done |
|
jenkins test make check arm64 |
|
jenkins test make check |
|
@benhanokh can you please remove your own changes from the branch (ie squash "rollback changes" into the ealier commits) and cherry-pick Dan's commit to preserve his commit title, authorship, and signoff? |
|
in the meantime, i ran this through the rgw suite: https://pulpito.ceph.com/cbodley-2024-07-10_21:34:12-rgw-wip-66286-distro-default-smithi/ in https://qa-proxy.ceph.com/teuthology/cbodley-2024-07-10_21:34:12-rgw-wip-66286-distro-default-smithi/7795944/teuthology.log, test_rgw_orphan_list.sh failed after an osd crashed with earlier in the log there were a bunch of s3cmd processes killed, i'm not sure whether that's expected or related: |
|
@cbodley, Maybe we should call GC after each test to limit the used space? |
sounds good |
Improve display of ref_count in the rados commandline utility New test cases were added to detect behavior after server side copy in the following cases: 1) delete original only 2) delete destination only 3) delete original then delete destination (this will lead to orphaned tail-objects without the changes made in this PR) d) delete destination then delete original (this will lead to orphaned tail-objects without the changes made in this PR) Add call to GC between tests to help control the used disk space since we keep writing huge files of 5GB each Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
Fixes: https://tracker.ceph.com/issues/66286 (Line added by Gabriel) In RadosStore, the source and dest objects in the copy_object() call used to share an obj_ctx. When obj_ctx was removed from the SAL API, they each got their own, but RGWRados::copy_obj() still assumed they shared one. Pass in each one separately, and use the correct one for further calls. Signed-off-by: Daniel Gryniewicz <dang@fprintf.net> Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
|
jenkins test make check |
|
https://pulpito.ceph.com/cbodley-2024-07-16_17:46:19-rgw-wip-66286-distro-default-smithi/ shows lots of s3test failures because the branch is missing some fixes from main: those should go away with a rebase but test_rgw_orphan_list.sh failed with what looks like ENOSPC again: http://qa-proxy.ceph.com/teuthology/cbodley-2024-07-16_17:46:19-rgw-wip-66286-distro-default-smithi/7804039/teuthology.log |
|
@ivancich regarding your question in https://tracker.ceph.com/issues/66286#note-6, we're trying to get these orphan test cases passing. they keep causing osds to fill up and crash with ENOSPC, despite the addition of can you think of other ways to reduce the storage requirements of this testing? does it need to operate at this scale to provide sufficient regression test coverage? separately, i've noticed a lot of odd errors from the teuthology logs - any idea what's happening here? (this error is repeated ~1000 times) |
|
My machine can't test latest code because it is centos 8. Maybe I can also nreduce the huge-obj size from 5GB down to 1GB? |
is there a reason to use 'huge' objects, other than the fact that they contain at least one tail object? maybe 5MB would suffice there |
|
This is how the original tests were used |
…rom 5100MB -> 51MB Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
|
My machine is still waiting for an upgrade from centos 8 so I was unable to test anything. I hope it will solve the out-of-space problem. |
| obj_refcount oref; | ||
| auto p = bl.cbegin(); | ||
| decode(oref, p); | ||
| for (auto itr = oref.refs.begin(); itr != oref.refs.end(); itr++) { |
|
yay, latest job succeeded with
better still, the total runtime was reduced to 0:26:55 |
|
also scheduled the updated test case against packages from main to verify that it catches the refcount issue:
pending in https://pulpito.ceph.com/cbodley-2024-07-23_16:43:49-rgw:tools-main-distro-default-smithi/ |
that did indeed detect the refcount orphans: |
|
jenkins test api |
|
passed the rgw suite in https://pulpito.ceph.com/cbodley-2024-07-30_11:48:11-rgw-wip-cbodley-testing-distro-default-smithi/ we want this on squid by the next rc (which branches tomorrow) so i'm not blocking this on @ronen-fr's review |
| string s(bl.c_str(), bl.length()); | ||
| cout << s << std::endl; |
There was a problem hiding this comment.
The last std::endl is superfluous and is breaking cephfs tests.
The std::endl did not exist earlier.
Is there any specific requirement to have an EOL here or could we change it to std::flush ?
There was a problem hiding this comment.
sorry! whatever it takes to fix the cephfs test. reverting to cout << s; should be fine
There was a problem hiding this comment.
looks like the rados suite caught this too, see #59025
Server-side copy copies the manifest from SRC->DST after incrementing the ref_count on the SRC tail-objects. The problem was that it uses the task-id as ref_tag and it got lost after the ref increment.When deleting the tail-objects from the COPY it passed in the obj_tag as ref_tag and since it was different from the one used during ref_inc it was ignored and the tail objects were never removed. This PR changes the ref_tag used for ref increment passing an MD5 of the bucket-name and the oid which is recreated when the COPY object is deleted
Fixes: https://tracker.ceph.com/issues/66286
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e