Bug #53448

closed

cephadm: agent failures double reported by two health checks

Added by Adam King over 4 years ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In:
v17.0.0-9913-gdc8f3bedbb
Released In:
v17.2.0~198
Upkeep Timestamp:
2025-07-14T15:50:32+00:00

Description

When agents are down, they are reported in both the agent down and the failed daemon health checks.
It's only really necessary to report them in one, and the double reporting can be confusing because the two checks use different criteria (agent down means the agent has not reported in time, while failed daemon is based on systemd status), yet being put in the former automatically puts an agent in the latter.

For example, almost all of the "failed cephadm daemon(s)" reported here are just repeat reports of the agents marked as not reporting:

cluster:
    id:     f148c330-47c9-11ec-9f19-1dfe2cdc6a6d
    health: HEALTH_ERR
            126 Cephadm Agent(s) are not reporting. Hosts may be offline
            Kernel Security Module (SELinux/AppArmor) is inconsistent for 19 hosts
            131 failed cephadm daemon(s)
            failed to probe daemons or devices
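The de-duplication this report asks for can be sketched in Python, the language of the cephadm mgr module. This is a minimal illustration under stated assumptions, not the change merged for this bug: `Daemon`, `compute_health_checks`, `agent_last_seen`, and `agent_timeout` are hypothetical names, and the IDs `CEPHADM_AGENT_DOWN` and `CEPHADM_FAILED_DAEMON` are assumed to correspond to the "Cephadm Agent(s) are not reporting" and "failed cephadm daemon(s)" warnings in the output above. Each criterion is evaluated separately, and an agent is left out of the failed daemon list when its host is already reported as agent down.

```python
# Hypothetical sketch: names and health check IDs are assumptions,
# not cephadm's actual implementation.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Daemon:
    """Hypothetical, simplified view of a daemon as the orchestrator sees it."""
    daemon_type: str  # e.g. "agent", "osd", "mon"
    name: str         # e.g. "agent.host1", "osd.3"
    hostname: str
    status: str       # simplified systemd-derived state: "running" or "error"


def compute_health_checks(daemons: List[Daemon],
                          agent_last_seen: Dict[str, float],
                          now: float,
                          agent_timeout: float) -> Dict[str, List[str]]:
    """Map health check ID -> detail lines, reporting each stale agent only once."""
    # Agent-down criterion: the agent on a host has not reported within the timeout.
    down_agent_hosts = {
        host for host, last in agent_last_seen.items()
        if now - last > agent_timeout
    }

    # Failed-daemon criterion: systemd-derived status is "error". Agents on hosts
    # already flagged as agent down are skipped so they are not double reported.
    failed = [
        f"daemon {d.name} on {d.hostname} is in error state"
        for d in daemons
        if d.status == "error"
        and not (d.daemon_type == "agent" and d.hostname in down_agent_hosts)
    ]

    checks: Dict[str, List[str]] = {}
    if down_agent_hosts:
        checks["CEPHADM_AGENT_DOWN"] = [
            f"Cephadm agent on host {h} has not reported in time"
            for h in sorted(down_agent_hosts)
        ]
    if failed:
        checks["CEPHADM_FAILED_DAEMON"] = failed
    return checks


if __name__ == "__main__":
    daemons = [
        Daemon("agent", "agent.host1", "host1", "error"),    # stale agent
        Daemon("agent", "agent.host2", "host2", "running"),  # healthy agent
        Daemon("osd", "osd.3", "host1", "error"),            # genuinely failed daemon
    ]
    last_seen = {"host1": 0.0, "host2": 95.0}
    print(compute_health_checks(daemons, last_seen, now=100.0, agent_timeout=30.0))
```

With the sample inputs, the stale agent on host1 is listed only under `CEPHADM_AGENT_DOWN`, while `osd.3` on the same host still appears under `CEPHADM_FAILED_DAEMON`, so a genuinely failed non-agent daemon is not hidden by the de-duplication.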


Related issues: 1 (0 open, 1 closed)

Related to Orchestrator - Bug #53723: Cephadm agent fails to report and causes a health timeout (Resolved, Adam King)

#1

Updated by Adam King over 4 years ago

  • Pull request ID set to 44158
#2

Updated by Laura Flores about 4 years ago

@Adam King, would you say that https://tracker.ceph.com/issues/53723 is related to this tracker issue?

#3

Updated by Sebastian Wagner about 4 years ago

  • Related to Bug #53723: Cephadm agent fails to report and causes a health timeout added
#4

Updated by Sebastian Wagner about 4 years ago

  • Status changed from In Progress to Resolved
#5

Updated by Laura Flores about 4 years ago

  • Related to deleted (Bug #53723: Cephadm agent fails to report and causes a health timeout)
#6

Updated by Laura Flores about 4 years ago

  • Related to Bug #53723: Cephadm agent fails to report and causes a health timeout added
#7

Updated by Laura Flores about 4 years ago

Accidentally deleted the related issue; ignore.

#8

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to dc8f3bedbb1b946716242ecd888c90430ab3bec6
  • Fixed In set to v17.0.0-9913-gdc8f3bedbb
  • Released In set to v17.2.0~198
  • Upkeep Timestamp set to 2025-07-14T15:50:32+00:00