Skip to content

rgw/logging: fix source bucket cleanup process#66110

Merged
yuvalif merged 3 commits intoceph:mainfrom
yuvalif:wip-yuval-73675
Nov 11, 2025
Merged

rgw/logging: fix source bucket cleanup process#66110
yuvalif merged 3 commits intoceph:mainfrom
yuvalif:wip-yuval-73675

Conversation

@yuvalif
Copy link
Copy Markdown
Contributor

@yuvalif yuvalif commented Nov 3, 2025

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
if a target log bucket is deleted, and then recreated with the same name,
the attribute holding the sources will not exist.
this is fixing it, but updating the list anytime we try to flush
or add a record.
if the source is already in the list, there will be no update to the attribute.

Fixes: https://tracker.ceph.com/issues/73675

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
ldpp_dout(dpp, 5) << "WARNING: failed to update logging sources attribute '" << RGW_ATTR_BUCKET_LOGGING_SOURCES
<< "' in logging bucket '" << target_bucket_id << "'. error: " << ret << dendl;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the function is called multiple times to add the same source bucket (every time a record is written), setting debug_rgw=20 could flood the logs because of :

ldpp_dout(dpp, 20) << "INFO: logging source '" << src_bucket_id << "' already " <<
      (add ? "added to" : "removed from") << " bucket '" << bucket->get_key() << "'" << dendl;

Comment thread src/rgw/rgw_bucket_logging.cc Outdated
}
ldpp_dout(dpp, 20) << "INFO: successfully updated bucket logging source '" <<
bucket->get_key() << "' during cleanup"<< dendl;
if (const int ret = target_bucket->commit_logging_object(obj_name, y, dpp, conf->target_prefix, last_committed); ret < 0) {
Copy link
Copy Markdown
Contributor

@nbalacha nbalacha Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could end up in data loss if a diff source bucket sharing the same prefix commits the same object just before the logging object name is removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both bucket1 and bucket2 share the same prefix, the logging object name will be the same, say log-obj-1

  1. bucket1 is deleted
  2. bucket1 source bucket cleanup gets temp object name log-obj-1
  3. bucket2 commits the log object which reads log object name log-obj-1, sets a new name log-obj-2 and commits log-obj-1. Writes the latest log record to log-obj-2.
  4. bucket1 cleanup commits log-obj-1 leading to the GC issue and eventual data loss.
  5. bucket1 deletes the logging object name object.
  6. The next op on bucket2 will not find the logging object name object and create it with a new name. The record written to log-obj-2 will be lost.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, how about changing the order?

  • read name
  • delete name object
  • commit

in the scenario above:

  1. bucket1 is deleted
  2. bucket1 source bucket cleanup gets temp object name log-obj-1
  3. bucket2 commits the log object which reads log object name log-obj-1, sets a new name log-obj-2 and commits log-obj-1. Writes the latest log record to log-obj-2.
  4. bucket 1 will try to delete logging object name object, and would fail (version mismatch), and will not continue to do the commit

in a different race scenario. if bucket2 try to commit, after the logging object name object was deleted. it will see that the object is missing, will create it and put a name in it, then will try to commit the empty temp object.
at the same time, bucket1 will commit the realm temp object, but there should be no data loss.

what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess that for the 2nd race, we will not commit the old temp object, since we don't know if the -EBUSY was because someone else is comitting the same object

// since the object holding the name was deleted, we cannot fetch the last commited name from it
if (const int ret = target_bucket->commit_logging_object(obj_name, y, dpp, conf->target_prefix, nullptr); ret < 0) {
ldpp_dout(dpp, 5) << "WARNING: could not commit pending logging object of bucket '" <<
bucket->get_key() << "' during source cleanup. ret = " << ret << dendl;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return here? Otherwise last_committed will be set to obj_name even if the commit failed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since most paths return 0 whether or not they succeeded, there doesn't seem to be a way to detect if last_committed is valid.

…eanup

* in case of prefix per source this would prevent leaking this object
* in case of share prefix, it would prevent data loss when other source
buckets will try to commit an already comitted temporary object
* when updatign the "last committed" attribute, the object must exist.
  this is so that commit without rollover (in case of cleanup) won't
  recreate the deleted object
* some refactoring of try-catch code to have less nesting

Fixes: https://tracker.ceph.com/issues/73675

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
@yuvalif
Copy link
Copy Markdown
Contributor Author

yuvalif commented Nov 11, 2025

@yuvalif
Copy link
Copy Markdown
Contributor Author

yuvalif commented Nov 11, 2025

jenkins test make check arm64

@yuvalif
Copy link
Copy Markdown
Contributor Author

yuvalif commented Nov 11, 2025

regression test: https://pulpito.ceph.com/yuvalif-2025-11-11_07:32:55-rgw-wip-yuval-73675-test-distro-default-smithi/
has 19 failures:

  • 9 valgrind issue
  • 2 multisite
  • 4 keystone clone
  • 1 d4n
  • 3 radosgw_admin.py issue

all seen in other test runs.

@yuvalif yuvalif merged commit 28307aa into ceph:main Nov 11, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants