cephadm: Detect stale and then recreate connections #35281
Merged
sebastian-philipp merged 1 commit into ceph:master on Jun 2, 2020
Conversation
Currently we create and cache connections to nodes during a check_host. If a cached connection is disconnected from the other end, the remoto connection object doesn't track this, so further checks to the host fail.

I have pushed up a PR[0] to remoto to add a `has_connection` method to their `BaseConnection` class, which we now use in this patch to check whether the connection is stale. If it is, the connection is recreated.

There is some monkey patching happening so we can add the required `has_connection` to remoto in this patch. It can be removed as soon as the other PR has landed and a new version of remoto is released.

[0] alfredodeza/remoto#56

Fixes: https://tracker.ceph.com/issues/45627
Fixes: https://tracker.ceph.com/issues/45032
Signed-off-by: Matthew Oliver <moliver@suse.com>
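For readers who want the shape of the idea without opening the diff, here is a minimal sketch of the stale-check-then-recreate pattern described above. It is not the code from this patch: `FakeConnection`, its `alive` flag, and `ConnectionCache` are hypothetical stand-ins so the example runs without remoto installed. Only the method name `has_connection` is taken from the description.

```python
from typing import Callable, Dict


class FakeConnection:
    """Hypothetical stand-in for a remoto connection object (not remoto's class)."""

    def __init__(self, host: str) -> None:
        self.host = host
        # Flipped to False below to simulate the remote end dropping the link.
        self.alive = True


def _has_connection(self: FakeConnection) -> bool:
    """Report whether the underlying link still looks usable."""
    return self.alive


# Monkey patch: add the method only if the class does not already provide it,
# so the patch becomes a no-op once the library ships its own version.
if not hasattr(FakeConnection, "has_connection"):
    setattr(FakeConnection, "has_connection", _has_connection)


class ConnectionCache:
    """Hypothetical per-host cache that replaces stale connections on lookup."""

    def __init__(self, factory: Callable[[str], FakeConnection]) -> None:
        self._factory = factory
        self._conns: Dict[str, FakeConnection] = {}

    def get(self, host: str) -> FakeConnection:
        conn = self._conns.get(host)
        if conn is not None and conn.has_connection():
            return conn  # cached connection is still live, reuse it
        # Either nothing is cached or the cached connection is stale: recreate.
        conn = self._factory(host)
        self._conns[host] = conn
        return conn


cache = ConnectionCache(FakeConnection)
first = cache.get("host1")
first.alive = False           # simulate a disconnect from the other end
second = cache.get("host1")   # stale entry is detected and replaced
```

Guarding the patch with `hasattr` is one way to keep the workaround harmless after an upstream release adds the method; whether the actual patch does this is not stated in the description.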
Contributor
Perfect! Maybe it would make sense to wait until we get feedback on the upstream PR. Otherwise we don't know whether the upstream PR will be merged as is, and the monkey patch may need to be updated.
Contributor
Author
The remoto PR already has an approval from the main author, so hopefully it'll land soon.
Contributor
It will take some time to get a new remoto package into CentOS 8 and other distributions. So yes, we should monkey patch it anyway.
Contributor
Feels bad to merge this before alfredodeza/remoto#56 is merged. Anyway, we need this now.
sebastian-philipp
approved these changes
Jun 2, 2020