cephfs_mirror: update peer status for invalid metadata in remote snapshot#56698
cephfs_mirror: update peer status for invalid metadata in remote snapshot#56698
Conversation
e8ef142 to
8de5970
Compare
There was a problem hiding this comment.
It concerns me that we have failure_reason and failed as separate fields set at different sites and conditions because it creates opportunities for inconsistent state.
Would it be possible to design an interface for updating these fields in unison? Or even, have an std::optional<string> failure_reason where we could then say bool failed() { return failure_reason.has_value(); }?
Maybe we could deduce the info from the error code - another reason to make sure that we return meaningful codes. If that were the case, we could save space because the string could be constructed from the error code at the recipient side.
If we have to share an error code for different internal reasons, we could still do something about it, like passing around a pair of the rc and the error string, or the rc and the detail code representing some error string. In either case, we could then treat the pair as an atomic result and status that would be passed into the _inc_failed_count function to unambiguously encode the state into the PeerReplayer object
UPD:
Looking into this further, I realize that failed could be false while some failures could have been seen, as long as they are fewer than max_failures. That explains how we could have the fields out of sync, so failure_reason.has_value() is not a valid suggestion.
However, there is still a point about having the int error code stored instead of a formatted string, and a new point about naming: the reason field should probably be prefixed with last_ just like the boost::optional<monotime> last_failed. That would not allow for the misinterpretation of how the fields were related I had above.
I could make it a Regarding the error code: there's no error here, maybe I'll rename the variable |
There's
It should be Note: union won't work. |
…shot Fixes: https://tracker.ceph.com/issues/65317 Signed-off-by: Jos Collin <jcollin@redhat.com>
8de5970 to
f87d84b
Compare
How does that help? The
There's no error, just a sync failure (please check the code, understand the difference). I've updated them to
_inc_failed_count is not specific to 'invalid metadata' errors. The failures are incremented for what ever error comes out of do_sync_snaps(). Also EINVAL is not a unique error code for 'invalid metadata', it's returned for other errors too. The suggested change should make do_sync_snaps() return unique error codes for every error it encounters, so that _inc_failed_count() could set the values. Otherwise, it would display/log |
|
The last edit looks better with the The whole point of my comment was to keep related fields coherent: we update the reason in one place, and the timestamp in another, and we use the error code to control the flow. With the new naming, it's easier to see related fields, that's an improvement. A function to set both the timestamp and the reason together could be another step forward, as would be checking for Having that said, I'm happy to close the thread about the failure reason and how we store it, thanks for discussing it with me. More on the functional side, shouldn't we now update that failure reason in other error flows? Like the failure to fetch the snapinfo, or the failure to find |
|
Closing in favour of #56816 |
Fixes: https://tracker.ceph.com/issues/65317
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e