rgw/logging: fix source bucket cleanup process#66110
Conversation
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
if a target log bucket is deleted, and then recreated with the same name, the attribute holding the sources will not exist. this is fixing it, but updating the list anytime we try to flush or add a record. if the source is already in the list, there will be no update to the attribute. Fixes: https://tracker.ceph.com/issues/73675 Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
| ldpp_dout(dpp, 5) << "WARNING: failed to update logging sources attribute '" << RGW_ATTR_BUCKET_LOGGING_SOURCES | ||
| << "' in logging bucket '" << target_bucket_id << "'. error: " << ret << dendl; | ||
| } | ||
|
|
There was a problem hiding this comment.
Is the function is called multiple times to add the same source bucket (every time a record is written), setting debug_rgw=20 could flood the logs because of :
ldpp_dout(dpp, 20) << "INFO: logging source '" << src_bucket_id << "' already " <<
(add ? "added to" : "removed from") << " bucket '" << bucket->get_key() << "'" << dendl;
| } | ||
| ldpp_dout(dpp, 20) << "INFO: successfully updated bucket logging source '" << | ||
| bucket->get_key() << "' during cleanup"<< dendl; | ||
| if (const int ret = target_bucket->commit_logging_object(obj_name, y, dpp, conf->target_prefix, last_committed); ret < 0) { |
There was a problem hiding this comment.
This could end up in data loss if a diff source bucket sharing the same prefix commits the same object just before the logging object name is removed.
There was a problem hiding this comment.
If both bucket1 and bucket2 share the same prefix, the logging object name will be the same, say log-obj-1
- bucket1 is deleted
- bucket1 source bucket cleanup gets temp object name log-obj-1
- bucket2 commits the log object which reads log object name log-obj-1, sets a new name log-obj-2 and commits log-obj-1. Writes the latest log record to log-obj-2.
- bucket1 cleanup commits log-obj-1 leading to the GC issue and eventual data loss.
- bucket1 deletes the logging object name object.
- The next op on bucket2 will not find the logging object name object and create it with a new name. The record written to log-obj-2 will be lost.
There was a problem hiding this comment.
so, how about changing the order?
- read name
- delete name object
- commit
in the scenario above:
- bucket1 is deleted
- bucket1 source bucket cleanup gets temp object name log-obj-1
- bucket2 commits the log object which reads log object name log-obj-1, sets a new name log-obj-2 and commits log-obj-1. Writes the latest log record to log-obj-2.
- bucket 1 will try to delete logging object name object, and would fail (version mismatch), and will not continue to do the commit
in a different race scenario. if bucket2 try to commit, after the logging object name object was deleted. it will see that the object is missing, will create it and put a name in it, then will try to commit the empty temp object.
at the same time, bucket1 will commit the realm temp object, but there should be no data loss.
what do you think?
There was a problem hiding this comment.
i guess that for the 2nd race, we will not commit the old temp object, since we don't know if the -EBUSY was because someone else is comitting the same object
cf64280 to
a6fd1db
Compare
| // since the object holding the name was deleted, we cannot fetch the last commited name from it | ||
| if (const int ret = target_bucket->commit_logging_object(obj_name, y, dpp, conf->target_prefix, nullptr); ret < 0) { | ||
| ldpp_dout(dpp, 5) << "WARNING: could not commit pending logging object of bucket '" << | ||
| bucket->get_key() << "' during source cleanup. ret = " << ret << dendl; |
There was a problem hiding this comment.
Return here? Otherwise last_committed will be set to obj_name even if the commit failed.
There was a problem hiding this comment.
Since most paths return 0 whether or not they succeeded, there doesn't seem to be a way to detect if last_committed is valid.
a6fd1db to
a16b89c
Compare
…eanup * in case of prefix per source this would prevent leaking this object * in case of share prefix, it would prevent data loss when other source buckets will try to commit an already comitted temporary object * when updatign the "last committed" attribute, the object must exist. this is so that commit without rollover (in case of cleanup) won't recreate the deleted object * some refactoring of try-catch code to have less nesting Fixes: https://tracker.ceph.com/issues/73675 Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
a16b89c to
494fb20
Compare
|
jenkins test make check arm64 |
|
regression test: https://pulpito.ceph.com/yuvalif-2025-11-11_07:32:55-rgw-wip-yuval-73675-test-distro-default-smithi/
all seen in other test runs. |
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.