Bug #72011

closed

mgr/prometheus module at ceph_cluster is unreachable

Added by Shraddha Agrawal 9 months ago. Updated 16 days ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In:
v20.3.0-1609-ge1c6a7d524
Released In:
Upkeep Timestamp:
2025-07-15T13:36:57+00:00

Description

/a/skanta-2025-07-04_23:32:34-rados-wip-bharath13-t[…]2025-07-04-0559-distro-default-smithi/8370622

Error in teuthology.log:

2025-07-05T03:25:49.688 INFO:teuthology.orchestra.run.smithi062.stderr:+ curl -s http://172.21.15.142:9095/api/v1/alerts
2025-07-05T03:25:49.692 INFO:teuthology.orchestra.run.smithi062.stderr:+ curl -s http://172.21.15.142:9095/api/v1/alerts
2025-07-05T03:25:49.692 INFO:teuthology.orchestra.run.smithi062.stderr:+ jq -e '.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"'
2025-07-05T03:25:50.273 DEBUG:teuthology.orchestra.run:got remote process result: 4
2025-07-05T03:25:50.274 INFO:teuthology.orchestra.run.smithi062.stdout:{"status":"success","data":{"alerts":[{"labels":{"alertname":"CephMgrPrometheusModuleInactive","cluster":"fa9941a0-594d-11f0-8720-adfe0268badd","instance":"ceph_cluster","job":"ceph","oid":"1.3.6.1.4.1.50495.1.2.1.6.2","severity":"critical","type":"ceph_default"},"annotations":{"description":"The mgr/prometheus module at ceph_cluster is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'.","summary":"The mgr/prometheus module is not available"},"state":"firing","activeAt":"2025-07-05T03:22:03.245200013Z","value":"0e+00"}]}}
2025-07-05T03:25:50.275 ERROR:teuthology.run_tasks:Saw exception from tasks.
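
One possible reading of the result code 4 above: with -e, jq exits with status 4 when the filter never produces a valid result. The response contains only a CephMgrPrometheusModuleInactive alert, so the select for CephMonDown matches nothing. The sketch below mimics that filter in Python on a trimmed copy of the response. The function name jq_exit_status and the trimmed payload are illustrative assumptions, not part of the test suite.

```python
import json


def jq_exit_status(payload: str, alertname: str) -> int:
    """Approximate the exit status of `jq -e` for the filter in the log.

    Hypothetical helper for illustration only. It mirrors:
    .data | .alerts | .[] | select(.labels | .alertname == NAME) | .state == "firing"
    """
    alerts = json.loads(payload)["data"]["alerts"]
    # One boolean output per alert whose label matches, as the jq filter would emit.
    outputs = [
        alert.get("state") == "firing"
        for alert in alerts
        if alert.get("labels", {}).get("alertname") == alertname
    ]
    if not outputs:
        return 4  # jq -e: no valid result was ever produced
    # jq -e: 0 if the last output is neither false nor null, otherwise 1
    return 0 if outputs[-1] else 1


# Trimmed copy of the response from the log (annotations and most labels omitted).
payload = json.dumps({
    "status": "success",
    "data": {
        "alerts": [
            {
                "labels": {
                    "alertname": "CephMgrPrometheusModuleInactive",
                    "instance": "ceph_cluster",
                    "severity": "critical",
                },
                "state": "firing",
            }
        ]
    },
})

print(jq_exit_status(payload, "CephMonDown"))
print(jq_exit_status(payload, "CephMgrPrometheusModuleInactive"))
```

Under these assumptions the check fails because the expected alert is absent from the response, not because an alert is present in the wrong state.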

This error was also observed in this nightly run: https://pulpito.ceph.com/teuthology-2025-07-06_20:00:21-rados-main-distro-default-smithi/, but not in the one before it: https://pulpito.ceph.com/teuthology-2025-06-29_20:00:18-rados-main-distro-default-smithi/. Of the PRs merged between these two runs, https://github.com/ceph/ceph/pull/61468 looks like the most likely origin of the error.


Related issues 1 (0 open, 1 closed)

Is duplicate of mgr - Bug #72012: Test failure: test_standby (tasks.mgr.test_prometheus.TestPrometheus) (Resolved, Nizamudeen A)

Actions #1

Updated by Shraddha Agrawal 9 months ago

  • Related to Bug #72012: Test failure: test_standby (tasks.mgr.test_prometheus.TestPrometheus) added
Actions #2

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-07-04_23:32:34-rados-wip-bharath13-testing-2025-07-04-0559-distro-default-smithi/8370622

Actions #3

Updated by Nizamudeen A 9 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Nizamudeen A
  • Pull request ID set to 64385
Actions #4

Updated by Laura Flores 9 months ago

  • Status changed from Fix Under Review to Duplicate
Actions #5

Updated by Laura Flores 9 months ago

  • Related to deleted (Bug #72012: Test failure: test_standby (tasks.mgr.test_prometheus.TestPrometheus))
Actions #6

Updated by Laura Flores 9 months ago

  • Is duplicate of Bug #72012: Test failure: test_standby (tasks.mgr.test_prometheus.TestPrometheus) added
Actions #7

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to e1c6a7d5243069e841a41eee5f23181dade125a7
  • Fixed In set to v20.3.0-1609-ge1c6a7d524
  • Upkeep Timestamp set to 2025-07-15T13:36:57+00:00
Actions #8

Updated by Naveen Naidu 16 days ago

/a/yuriw-2026-03-02_18:34:01-rados-wip-yuri3-testing-2026-03-02-1622-distro-default-trial/76688
