mgr/cephadm: still check agent deps if it is marked down #44489

Merged
sebastian-philipp merged 1 commit into ceph:master from adk3798:agent-down-alerts on Jan 18, 2022
Conversation

@adk3798 adk3798 commented Jan 6, 2022

Fixes: https://tracker.ceph.com/issues/53723

Signed-off-by: Adam King <adking@redhat.com>

Additionally, it's important to make sure the agent has actually received the new
config before its deps are updated, so the deps update has been moved into the thread that sends the config to the agent.
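
The ordering described above (record the new deps only once the agent has confirmed receipt of the config) could be sketched roughly as follows. This is a minimal illustration, not the cephadm code: `SendConfigThread`, `push_config`, the `recorded_deps` dict, and the `send` callable are all hypothetical names invented for the example.

```python
import threading
from typing import Callable, Dict, List


class SendConfigThread(threading.Thread):
    """Hypothetical sender thread: delivers a config, then records the deps."""

    def __init__(self, host: str, deps: List[str],
                 recorded_deps: Dict[str, List[str]],
                 send: Callable[[str, List[str]], bool]) -> None:
        super().__init__()
        self.host = host
        self.deps = deps
        self.recorded_deps = recorded_deps  # assumed shared "last known deps" store
        self.send = send                    # assumed transport; True means delivered

    def run(self) -> None:
        try:
            delivered = self.send(self.host, self.deps)
        except OSError:
            # Covers ConnectionError and similar transport failures.
            delivered = False
        # Only record the deps after a confirmed delivery, so that a failed
        # send leaves the old deps in place and the reconfig is retried later.
        if delivered:
            self.recorded_deps[self.host] = list(self.deps)


def push_config(host: str, deps: List[str],
                recorded_deps: Dict[str, List[str]],
                send: Callable[[str, List[str]], bool]) -> None:
    """Run the sender thread to completion (joined here only for the demo)."""
    t = SendConfigThread(host, deps, recorded_deps, send)
    t.start()
    t.join()
```

If the deps were recorded before the send instead, a failed delivery would leave the mgr believing the agent is up to date, and the mismatch that triggers a reconfig would never be seen again.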

Most of the line changes here are just un-indenting a bunch of code that was formerly part of an else block.

I think this will fix the teuthology issues in the linked tracker. On whatever QA runs this is put through, I will have to make sure that cephadm/workunits/{agent/on mon_election/connectivity task/test_nfs} and cephadm/workunits/{agent/on mon_election/connectivity task/test_orch_cli} don't fail with "wait until healthy" timeouts.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@adk3798 adk3798 requested a review from a team as a code owner January 6, 2022 22:49
@sebastian-philipp
Could you add a more meaningful description to your commit message?

@sebastian-philipp sebastian-philipp left a comment

commit message is empty

Right now if an agent is down, the way _check_agent works
it will return without ever going on to check the deps or
scheduled actions for that agent. This causes a few issues.
For one, if an agent is marked down and then a mgr failover
happens, even if reconfiguring the agent would put it in a working
state (e.g. changing the target ip if the active mgr has moved)
we never try it because _check_agent just returns as soon as it
sees the agent is down. Additionally, if someone purposely tried
to schedule a redeploy of a down agent for whatever reason, we
would never make good on this action.

This change allows us to still carry out the normal checks/
scheduled actions even on down agents.

Fixes: https://tracker.ceph.com/issues/53723

Signed-off-by: Adam King <adking@redhat.com>
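
The behavior change described in the commit message above can be illustrated with a simplified model. This is only a sketch under stated assumptions: `AgentState`, `check_agent_old`, `check_agent_new`, and the helper below are hypothetical stand-ins, and the real `_check_agent` in cephadm does considerably more than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentState:
    """Hypothetical stand-in for what the mgr tracks per agent."""
    host: str
    down: bool
    last_deps: List[str]
    scheduled_action: Optional[str] = None
    log: List[str] = field(default_factory=list)  # actions taken, for the demo


def _check_deps_and_actions(agent: AgentState, current_deps: List[str]) -> None:
    """Honor a scheduled redeploy, else reconfig if the deps have changed."""
    if agent.scheduled_action == "redeploy":
        agent.log.append("redeploy")
        agent.scheduled_action = None
    elif sorted(agent.last_deps) != sorted(current_deps):
        agent.log.append("reconfig")


def check_agent_old(agent: AgentState, current_deps: List[str]) -> bool:
    """Old flow: return as soon as the agent is seen to be down."""
    if agent.down:
        return True  # deps and scheduled actions are never examined
    _check_deps_and_actions(agent, current_deps)
    return False


def check_agent_new(agent: AgentState, current_deps: List[str]) -> bool:
    """New flow: note that the agent is down, but still run the checks."""
    _check_deps_and_actions(agent, current_deps)
    return agent.down
```

With a down agent whose recorded deps are stale (for example, after a mgr failover changed the target IP), the old flow takes no action, while the new flow still attempts the reconfig that might bring the agent back.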
@adk3798 adk3798 force-pushed the agent-down-alerts branch from b64704c to 09a593c Compare January 7, 2022 15:31
@ljflores
jenkins test api

@ljflores
jenkins test dashboard cephadm

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jan 17, 2022