
RGW | fixed enqueueing the overwritten object for gc #64997

Merged

AliMasarweh merged 1 commit into ceph:main from AliMasarweh:wip-alimasa-72398 on Sep 4, 2025


AliMasarweh (Member) commented Aug 12, 2025:

Fixed enqueueing the overwritten object for gc in RGWRados::Object::complete_atomic_modification().
https://tracker.ceph.com/issues/72398
https://tracker.ceph.com/issues/72517


AliMasarweh requested a review from a team as a code owner on August 12, 2025, 14:40.
The github-actions bot added the rgw label on Aug 12, 2025.
Review comment on lines +7255 to +7257:

    int r = get_state(dpp, &state, &manifest, false, y);
    if (r < 0)
      return r;
Contributor:

Why do we need this? Maybe this is why you still need to duplicate the get_state() call in complete_atomic_modification()?

Member Author:

Before my change, prepare_atomic_modification() called get_state(); I moved that call out of it when I changed prepare_atomic_modification() in "RGW | fix conditional MultiWrite".
If I don't call it here, we get a segmentation fault.
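The workaround's shape (refresh state and manifest before using them) can be sketched in simplified C++. This assumes, as the PR title and the review questions suggest, that a null manifest makes the gc step skip the overwritten object, so refreshing state first restores the enqueue. Every type, member, and helper below (`Object`, `ObjState`, `Manifest`, `tail_oid`, the reduced `get_state()` signature) is hypothetical; `dpp`, `y`, and the real gc chain handling are omitted, and this is not the RGW implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified stand-ins. These are NOT the real
// RGWObjState / RGWObjManifest / RGWRados::Object types.
struct Manifest {
  std::string tail_oid;  // stand-in for the old object's tail to garbage-collect
};

struct ObjState {
  bool exists = false;
  Manifest manifest;
};

class Object {
 public:
  explicit Object(bool exists) {
    backing_.exists = exists;
    backing_.manifest.tail_oid = "tail.1";
  }

  // Hypothetical get_state(): points *pstate at the cached state and
  // *pmanifest at its manifest (nullptr when there is no existing object).
  // Returns a negative value on error, 0 on success.
  int get_state(ObjState** pstate, Manifest** pmanifest) {
    *pstate = &backing_;
    *pmanifest = backing_.exists ? &backing_.manifest : nullptr;
    return 0;
  }

  // Sketch of the workaround: refresh state/manifest before using them,
  // instead of trusting members that an earlier step may have left null.
  int complete_atomic_modification(std::string* enqueued) {
    int r = get_state(&state, &manifest);
    if (r < 0)
      return r;
    assert(state != nullptr);
    if (manifest == nullptr)  // no previous object, so nothing to enqueue
      return 0;
    *enqueued = manifest->tail_oid;  // stand-in for sending the old tail to gc
    return 0;
  }

  ObjState* state = nullptr;
  Manifest* manifest = nullptr;

 private:
  ObjState backing_;
};
```

With an existing (overwritten) object, the call picks up the old tail; with no prior object, it returns 0 without enqueueing anything.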

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
Review comment on lines +6030 to +6032:

    int r = get_state(dpp, &state, &manifest, false, y);
    if (r < 0)
      return r;
Contributor:

Can you help me understand why manifest would be null here? I would have assumed that _do_write_meta() initializes it with either version of:

-   r = target->get_state(rctx.dpp, &target->state, &target->manifest, false, rctx.y);
-  if (r < 0)
-    return r;
+  target->manifest = manifest;
+  target->state = state;

Is there some code path that calls complete_atomic_modification() but not _do_write_meta()?

Member Author:

Yes, it is also called by RGWRados::Object::Delete::delete_obj(), but the issue is in the _do_write_meta() flow: somewhere in that code, manifest is set to nullptr (both the local manifest and target->manifest). I still don't know where that happens.

Contributor:

So, I looked at the history of this a bit. Before zipper changed manifests in commit 88bcacc23ab ("RGW - Zipper - remove RGWObjectCtx from SAL API"), _do_write_meta() explicitly used a different state and manifest (since manifest was inside of state) than the ones stored in RGWRados::Object. This was presumably the reason for the get_state() call at the start. By design, then, complete_atomic_modification() was using a different copy of state (and manifest) than _do_write_meta() was using, and my changes in the above commit preserved that. The "conditional delete fix" commit added a get_state() call for the state in RGWRados::Object there, so that check_preconditions() could work.

So that's a change from how it was. If we want to use the RGWRados::Object version, we should probably use it everywhere rather than keeping copies at all, but I don't know what issues that might cause.
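The risk of keeping two copies can be illustrated with a toy model: a write path that fetches the manifest into a local pointer, while the gc step reads the object-level pointer. All names here (`Object`, `Manifest`, `gc_step_enqueues`, `write_with_local_copy`, `write_with_shared_copy`) are hypothetical and only mirror the local-versus-`RGWRados::Object` split described above; the sketch shows why the copies can disagree, not which copy RGW should keep or what else changing it would affect.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; NOT the real RGW types.
struct Manifest {
  std::string tail_oid;
};

struct Object {
  Manifest* manifest = nullptr;  // object-level copy, read by the gc step
};

// Stand-in for the gc step: it only enqueues when the object's copy is set.
bool gc_step_enqueues(const Object& obj) {
  return obj.manifest != nullptr;
}

// Two copies: the write path fetches into its own local pointer and uses it,
// but never stores it on the object, so the gc step sees a null copy.
bool write_with_local_copy(Manifest* fetched) {
  Object obj;
  Manifest* local = fetched;  // e.g. a get_state() into a local variable
  assert(local != nullptr);   // the write path itself sees a valid manifest
  return gc_step_enqueues(obj);
}

// One copy: the write path populates the object's own member, so the
// gc step observes the same manifest the write path used.
bool write_with_shared_copy(Manifest* fetched) {
  Object obj;
  obj.manifest = fetched;  // e.g. a get_state() into &obj.manifest
  return gc_step_enqueues(obj);
}
```

In the two-copy version the write path works correctly while the gc step silently skips the object, which matches the symptom discussed here; the single-copy version removes that mismatch at the cost of every caller sharing (and possibly clobbering) one pointer.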

dang (Contributor) commented Aug 27, 2025:

jenkins test make check

dang (Contributor) commented Aug 27, 2025:

jenkins test make check arm64

dang (Contributor) left a review comment:

I think we should merge the workaround, since it's passing, and then audit how manifest is handled to find the root cause.

AliMasarweh (Member Author):

jenkins test make check arm64


AliMasarweh merged commit 7da2ef3 into ceph:main on Sep 4, 2025 (13 checks passed).
github-actions bot commented Sep 4, 2025:

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker ticket and reset it to the Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/17465158803
