rbd-mirror: clean up stale pool replayers and callouts better#57082
Merged
The code in Mirror::update_pool_replayers() responsible for shutting down and removing stale pool replayers kicks in only if the peer is removed, but not if the peer changes. However, the code responsible for (re)starting pool replayers in the same method _does_ create and start a new pool replayer in that case. As a result, we can end up with nearly identical pool replayers running at the same time, hogging OS resources and confusing instance_id tracking logic and mirror status reporting at the very least.

The root cause is that PeerSpec is matched normally (i.e. based on all fields) when it comes to m_pool_replayers, and based only on UUID when it comes to pool_peers. This was missed in commit 5463e1a ("rbd-mirror: extract optional peer mon_host/key values from MON").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
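The mismatch described above can be illustrated with a small standalone sketch. This is not the rbd-mirror code: the `PeerSpec` struct here is a simplified stand-in with illustrative fields, and `same_uuid`, `same_all_fields` and `is_stale` are hypothetical helpers. It only shows how a UUID-only match treats a changed peer as still current, while an all-fields match flags the running replayer as stale.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Simplified, hypothetical stand-in for a peer description.
struct PeerSpec {
  std::string uuid;
  std::string cluster_name;
  std::string mon_host;
};

// UUID-only match: a peer whose other fields changed still "matches".
inline bool same_uuid(const PeerSpec& a, const PeerSpec& b) {
  return a.uuid == b.uuid;
}

// Full match: every field has to agree.
inline bool same_all_fields(const PeerSpec& a, const PeerSpec& b) {
  return std::tie(a.uuid, a.cluster_name, a.mon_host) ==
         std::tie(b.uuid, b.cluster_name, b.mon_host);
}

// A running replayer is stale if no configured peer matches it
// under the given matching rule.
template <typename Match>
bool is_stale(const PeerSpec& running,
              const std::vector<PeerSpec>& configured, Match match) {
  assert(!running.uuid.empty());  // assumption: every peer carries a UUID
  return std::none_of(configured.begin(), configured.end(),
                      [&](const PeerSpec& p) { return match(running, p); });
}
```

With a running replayer for a peer whose `mon_host` has since changed, `is_stale(..., same_uuid)` returns false, so the old replayer is kept, while `is_stale(..., same_all_fields)` returns true. Using the same rule for both the cleanup and the (re)start decisions avoids ending up with two replayers for one peer.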
jenkins test windows
nbalacha approved these changes on Apr 29, 2024
If a pool replayer is removed in an error state (e.g. after failing to connect to the remote cluster), its callout should be removed as well. Otherwise, the error would persist, causing "daemon health: ERROR" status to be reported even after a new pool replayer is created and started successfully.

Fixes: https://tracker.ceph.com/issues/65487
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
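One way to picture the callout cleanup is to tie the callout's lifetime to the replayer that raised it. The sketch below is an assumption-laden illustration, not the rbd-mirror implementation: `HealthRegistry` and `Replayer` are hypothetical classes, and the "OK"/"ERROR" strings merely mirror the health status mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical, simplified health registry: any outstanding callout
// turns the reported health into "ERROR".
class HealthRegistry {
public:
  uint64_t add_callout(const std::string& description) {
    uint64_t id = ++m_next_id;
    m_callouts[id] = description;
    return id;
  }
  void remove_callout(uint64_t id) { m_callouts.erase(id); }
  std::string health() const { return m_callouts.empty() ? "OK" : "ERROR"; }

private:
  uint64_t m_next_id = 0;
  std::map<uint64_t, std::string> m_callouts;
};

// Hypothetical replayer that owns at most one callout and always
// removes it when it is destroyed, even if it died in an error state.
class Replayer {
public:
  explicit Replayer(HealthRegistry& registry) : m_registry(registry) {}
  Replayer(const Replayer&) = delete;
  Replayer& operator=(const Replayer&) = delete;
  ~Replayer() { clear_callout(); }

  void fail(const std::string& why) {
    clear_callout();  // replace any previous callout
    m_callout_id = m_registry.add_callout(why);
  }

  void clear_callout() {
    if (m_callout_id != 0) {
      m_registry.remove_callout(m_callout_id);
      m_callout_id = 0;
    }
  }

private:
  HealthRegistry& m_registry;
  uint64_t m_callout_id = 0;  // 0 means "no callout registered"
};
```

In this sketch, destroying a failed `Replayer` returns the registry to "OK", so a replacement replayer that starts successfully is not shadowed by its predecessor's error.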
wait_for_replay_complete() doesn't wait for image status to get updated. This didn't matter previously because these tests are run on two different pools and nothing else was following.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Had to tighten preceding tests to fix a sporadic failure on a pre-condition assert in the new integration test.
https://pulpito.ceph.com/dis-2024-05-06_11:17:03-rbd-wip-dis-testing-distro-default-smithi/