Skip to content

cephadm: Detect stale and then recreate connections#35281

Merged
sebastian-philipp merged 1 commit intoceph:masterfrom
matthewoliver:cephadm_reset_stale_connections
Jun 2, 2020
Merged

cephadm: Detect stale and then recreate connections#35281
sebastian-philipp merged 1 commit intoceph:masterfrom
matthewoliver:cephadm_reset_stale_connections

Conversation

@matthewoliver
Copy link
Contributor

Currently we make and cache connections to nodes during a check_host.
If a cached connection is disconnected from the other end the remoto
connection object doesn't track this, so further checks to the host
fail.

I have pushed up a PR[0] to remoto to add a has_connection method to
their BaseConnection class, which we now use in this patch to check to
see if the connection is stale. If it is it is then recreated.

There is some monkey patching happening so we can add the required
has_connection to remoto in this patch which can be removed as soon as
the other PR has landed and a new version of remoto is released.

[0] alfredodeza/remoto#56
Fixes: https://tracker.ceph.com/issues/45627
Fixes: https://tracker.ceph.com/issues/45032
Signed-off-by: Matthew Oliver moliver@suse.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Currently we make and cache connections to nodes during a check_host.
If a cached connection is disconnect from the other end the remoto
connection object doesn't track this, so further checks to the host
fail.

I have pushed up a PR[0] to remoto to add a `has_connection` method to
their `BaseConnection` class, which we now use in this patch to check to
see if the connection is stale. If it is it is then recreated.

There is some monkey patching happening so we can add the required
`has_connection` to remoto in this patch which we can remove as soon as
the other PR have landed and a new version of remoto is released.

[0] alfredodeza/remoto#56
Fixes: https://tracker.ceph.com/issues/45627
Fixes: https://tracker.ceph.com/issues/45032
Signed-off-by: Matthew Oliver <moliver@suse.com>
@sebastian-philipp
Copy link
Contributor

perfect! Maybe it would make sense to wait, if we get feedback for the upstream PR. Otherwise we don't know if the upstream PR will get merged as it is and the monkey patch will needs to be updated.

@matthewoliver
Copy link
Contributor Author

The remoto PR already has an approval from the main author, so hopefully it'll land soon.
Do we need the monkey patch in place for anyone running old versions of remoto (when a new one with the patch is released).

@sebastian-philipp
Copy link
Contributor

it will take some time to get a new remoto package into centos8, and other distributions. so yes. we should monkey patch it anyway.

@sebastian-philipp
Copy link
Contributor

@sebastian-philipp
Copy link
Contributor

@sebastian-philipp
Copy link
Contributor

Feels bad to merge this before alfredodeza/remoto#56 is merged. Anyway we need this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants