Bug #74148
Prometheus module experiences connection issues related to cherrypy
Description
/a/teuthology-2025-12-07_20:00:23-rados-main-distro-default-smithi/8644597
2025-12-07T21:26:39.859 INFO:teuthology.orchestra.run.smithi060.stderr:+ jq -e '.status == "success"'
2025-12-07T21:26:39.863 INFO:teuthology.orchestra.run.smithi060.stdout:{"status":"success","data":{"yaml":"global:\n scrape_interval: 10s\n scrape_timeout: 10s\n scrape_protocols:\n - OpenMetricsText1.0.0\n - OpenMetricsText0.0.1\n - PrometheusText1.0.0\n - PrometheusText0.0.4\n evaluation_interval: 10s\n external_labels:\n cluster: e8525cbe-d3b1-11f0-87af-adfe0268badd\nruntime:\n gogc: 75\nalerting:\n alertmanagers:\n - follow_redirects: true\n enable_http2: true\n scheme: http\n timeout: 10s\n api_version: v2\n http_sd_configs:\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: http://172.21.15.60:8765/sd/prometheus/sd-config?service=alertmanager\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: http://172.21.15.99:8765/sd/prometheus/sd-config?service=alertmanager\nrule_files:\n- /etc/prometheus/alerting/*\nscrape_configs:\n- job_name: ceph\n honor_labels: true\n honor_timestamps: true\n track_timestamps_staleness: false\n scrape_interval: 10s\n scrape_timeout: 10s\n scrape_protocols:\n - OpenMetricsText1.0.0\n - OpenMetricsText0.0.1\n - PrometheusText1.0.0\n - PrometheusText0.0.4\n always_scrape_classic_histograms: false\n convert_classic_histograms_to_nhcb: false\n metrics_path: /metrics\n scheme: http\n enable_compression: true\n metric_name_validation_scheme: utf8\n metric_name_escaping_scheme: allow-utf-8\n follow_redirects: true\n enable_http2: true\n relabel_configs:\n - source_labels: [__address__]\n separator: ;\n target_label: cluster\n replacement: e8525cbe-d3b1-11f0-87af-adfe0268badd\n action: replace\n - source_labels: [instance]\n separator: ;\n target_label: instance\n replacement: ceph_cluster\n action: replace\n http_sd_configs:\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: http://172.21.15.60:8765/sd/prometheus/sd-config?service=ceph\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: 
http://172.21.15.99:8765/sd/prometheus/sd-config?service=ceph\n- job_name: node-exporter\n honor_labels: true\n honor_timestamps: true\n track_timestamps_staleness: false\n scrape_interval: 10s\n scrape_timeout: 10s\n scrape_protocols:\n - OpenMetricsText1.0.0\n - OpenMetricsText0.0.1\n - PrometheusText1.0.0\n - PrometheusText0.0.4\n always_scrape_classic_histograms: false\n convert_classic_histograms_to_nhcb: false\n metrics_path: /metrics\n scheme: http\n enable_compression: true\n metric_name_validation_scheme: utf8\n metric_name_escaping_scheme: allow-utf-8\n follow_redirects: true\n enable_http2: true\n relabel_configs:\n - source_labels: [__address__]\n separator: ;\n target_label: cluster\n replacement: e8525cbe-d3b1-11f0-87af-adfe0268badd\n action: replace\n http_sd_configs:\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: http://172.21.15.60:8765/sd/prometheus/sd-config?service=node-exporter\n - follow_redirects: true\n enable_http2: true\n refresh_interval: 1m\n url: http://172.21.15.99:8765/sd/prometheus/sd-config?service=node-exporter\notlp:\n translation_strategy: UnderscoreEscapingWithSuffixes\n"}}true
2025-12-07T21:26:39.863 INFO:teuthology.orchestra.run.smithi060.stderr:+ curl -s http://172.21.15.99:9095/api/v1/alerts
2025-12-07T21:26:39.868 INFO:teuthology.orchestra.run.smithi060.stderr:+ curl -s http://172.21.15.99:9095/api/v1/alerts
2025-12-07T21:26:39.868 INFO:teuthology.orchestra.run.smithi060.stderr:+ jq -e '.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"'
2025-12-07T21:26:40.488 DEBUG:teuthology.orchestra.run:got remote process result: 4
2025-12-07T21:26:40.488 INFO:teuthology.orchestra.run.smithi060.stdout:{"status":"success","data":{"alerts":[{"labels":{"alertname":"CephMgrPrometheusModuleInactive","cluster":"e8525cbe-d3b1-11f0-87af-adfe0268badd","instance":"ceph_cluster","job":"ceph","oid":"1.3.6.1.4.1.50495.1.2.1.6.2","severity":"critical","type":"ceph_default"},"annotations":{"description":"The mgr/prometheus module at ceph_cluster is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'.","summary":"The mgr/prometheus module is not available"},"state":"firing","activeAt":"2025-12-07T21:21:33.245200013Z","value":"0e+00"}]}}
2025-12-07T21:26:40.490 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_ceph_6ce249e0e13e12a74d5c855ed12d6b50671977c9/qa/tasks/cephadm.py", line 1467, in shell
_shell(
File "/home/teuthworker/src/git.ceph.com_ceph_6ce249e0e13e12a74d5c855ed12d6b50671977c9/qa/tasks/cephadm.py", line 41, in _shell
return remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/remote.py", line 575, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi060 with status 4: 'sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:6ce249e0e13e12a74d5c855ed12d6b50671977c9 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid e8525cbe-d3b1-11f0-87af-adfe0268badd -- bash -c \'set -e\nset -x\nceph orch apply node-exporter\nceph orch apply grafana\nceph orch apply alertmanager\nceph orch apply prometheus\nsleep 240\nceph orch ls\nceph orch ps\nceph orch host ls\nMON_DAEMON=$(ceph orch ps --daemon-type mon -f json | jq -r \'"\'"\'last | .daemon_name\'"\'"\')\nGRAFANA_HOST=$(ceph orch ps --daemon-type grafana -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nPROM_HOST=$(ceph orch ps --daemon-type prometheus -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nALERTM_HOST=$(ceph orch ps --daemon-type alertmanager -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nGRAFANA_IP=$(ceph orch host ls -f json | jq -r --arg GRAFANA_HOST "$GRAFANA_HOST" \'"\'"\'.[] | select(.hostname==$GRAFANA_HOST) | .addr\'"\'"\')\nPROM_IP=$(ceph orch host ls -f json | jq -r --arg PROM_HOST "$PROM_HOST" \'"\'"\'.[] | select(.hostname==$PROM_HOST) | .addr\'"\'"\')\nALERTM_IP=$(ceph orch host ls -f json | jq -r --arg ALERTM_HOST "$ALERTM_HOST" \'"\'"\'.[] | select(.hostname==$ALERTM_HOST) | .addr\'"\'"\')\n# check each host node-exporter metrics endpoint is responsive\nALL_HOST_IPS=$(ceph orch host ls -f json | jq -r \'"\'"\'.[] | .addr\'"\'"\')\nfor ip in $ALL_HOST_IPS; do\n curl -s http://${ip}:9100/metric\ndone\n# check grafana endpoints are responsive and database health is okay\ncurl -k -s https://${GRAFANA_IP}:3000/api/health\ncurl -k -s https://${GRAFANA_IP}:3000/api/health | jq -e \'"\'"\'.database == "ok"\'"\'"\'\n# stop mon daemon in order to trigger an alert\nceph orch daemon stop $MON_DAEMON\nsleep 120\n# check prometheus endpoints are responsive and mon 
down alert is firing\ncurl -s http://${PROM_IP}:9095/api/v1/status/config\ncurl -s http://${PROM_IP}:9095/api/v1/status/config | jq -e \'"\'"\'.status == "success"\'"\'"\'\ncurl -s http://${PROM_IP}:9095/api/v1/alerts\ncurl -s http://${PROM_IP}:9095/api/v1/alerts | jq -e \'"\'"\'.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"\'"\'"\'\n# check alertmanager endpoints are responsive and mon down alert is active\ncurl -s http://${ALERTM_IP}:9093/api/v2/status\ncurl -s http://${ALERTM_IP}:9093/api/v2/alerts\ncurl -s http://${ALERTM_IP}:9093/api/v2/alerts | jq -e \'"\'"\'.[] | select(.labels | .alertname == "CephMonDown") | .status | .state == "active"\'"\'"\'\n# check prometheus metrics endpoint is not empty and make sure we can get metrics\nMETRICS_URL=$(
/a/teuthology-2025-12-07_20:00:23-rados-main-distro-default-smithi/8644597/remote/smithi060/log/e8525cbe-d3b1-11f0-87af-adfe0268badd/ceph-mgr.a.log.gz
2025-12-07T21:26:41.895+0000 7ff8f7017640 0 [prometheus INFO cherrypy.error] [07/Dec/2025:21:26:41] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('::', 9283)) shut down
2025-12-07T21:26:41.895+0000 7ff8f7017640 0 [prometheus INFO cherrypy.error] [07/Dec/2025:21:26:41] ENGINE Bus STOPPED
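The `CephMgrPrometheusModuleInactive` annotation's remediation advice boils down to: is the mgr up, and is the prometheus module enabled? A minimal sketch of the second check, run against a mocked `ceph mgr module ls -f json` payload (the key names are an assumption based on recent Ceph releases; on a live cluster you would pipe the real command instead):

```shell
# Mocked output of `ceph mgr module ls -f json` (key names are an assumption;
# on a live cluster, pipe the real command instead of this literal).
modules='{"always_on_modules":["balancer","crash"],"enabled_modules":["iostat","prometheus"],"disabled_modules":[]}'

# jq's index() returns the array position (truthy) when the module is
# present and null otherwise, so the comparison yields "true"/"false".
enabled=$(echo "$modules" | jq -r '.enabled_modules | index("prometheus") != null')
echo "prometheus enabled: $enabled"

if [ "$enabled" != "true" ]; then
  echo "would run: ceph mgr module enable prometheus"
fi
```

Note that in this failure the module was enabled but its cherrypy server had shut down (per the mgr log above), so the enable check alone would not have caught it; curling the module's metrics endpoint (default port 9283) is the complementary check.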
I suspect this PR: https://github.com/ceph/ceph/pull/65245
Updated by Laura Flores 3 months ago
- Related to Bug #74149: Prometheus module fails when trying to load security configuration JSON added
Updated by Aishwarya Mathuria 3 months ago
/a/yuriw-2025-12-03_15:44:36-rados-wip-yuri5-testing-2025-12-02-1256-distro-default-smithi/8639563
2025-12-03T16:55:03.646 INFO:teuthology.orchestra.run.smithi062.stderr:+ curl -s http://172.21.15.78:9095/api/v1/alerts
2025-12-03T16:55:03.652 INFO:teuthology.orchestra.run.smithi062.stderr:+ curl -s http://172.21.15.78:9095/api/v1/alerts
2025-12-03T16:55:03.652 INFO:teuthology.orchestra.run.smithi062.stderr:+ jq -e '.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"'
2025-12-03T16:55:04.255 DEBUG:teuthology.orchestra.run:got remote process result: 4
2025-12-03T16:55:04.256 INFO:teuthology.orchestra.run.smithi062.stdout:{"status":"success","data":{"alerts":[{"labels":{"alertname":"CephMgrPrometheusModuleInactive","cluster":"1b3bd86a-d067-11f0-87ab-adfe0268badd","instance":"ceph_cluster","job":"ceph","oid":"1.3.6.1.4.1.50495.1.2.1.6.2","severity":"critical","type":"ceph_default"},"annotations":{"description":"The mgr/prometheus module at ceph_cluster is unreachable. This could mean that the module has been disabled or the mgr daemon itself is down. Without the mgr/prometheus module metrics and alerts will no longer function. Open a shell to an admin node or toolbox pod and use 'ceph -s' to to determine whether the mgr is active. If the mgr is not active, restart it, otherwise you can determine module status with 'ceph mgr module ls'. If it is not listed as enabled, enable it with 'ceph mgr module enable prometheus'.","summary":"The mgr/prometheus module is not available"},"state":"firing","activeAt":"2025-12-03T16:51:13.245200013Z","value":"0e+00"}]}}
2025-12-03T16:55:04.257 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_151fc19e8957de33a9ab329f5cd67d0d2eab7212/qa/tasks/cephadm.py", line 1467, in shell
_shell(
File "/home/teuthworker/src/github.com_ceph_ceph-c_151fc19e8957de33a9ab329f5cd67d0d2eab7212/qa/tasks/cephadm.py", line 41, in _shell
return remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/remote.py", line 575, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_258eb6279f4d7fcd4b45c82e521f2a2e799d7f33/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
Updated by Nitzan Mordechai 3 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 66570
Updated by Nitzan Mordechai 3 months ago · Edited
- Pull request ID changed from 66570 to 66571
Updated by Nitzan Mordechai 3 months ago
- Related to Backport #74056: tentacle: ceph-mgr memory leak in prometheus module added
- Related to Backport #74057: squid: ceph-mgr memory leak in prometheus module added
Updated by Nitzan Mordechai 3 months ago
I'm not adding new backport trackers since we are using the backports from https://tracker.ceph.com/issues/68989. The issue was found on the main branch; the tentacle and squid backports are on hold until that tracker is resolved.
Updated by Laura Flores about 2 months ago
/a/lflores-2026-01-21_20:56:39-rados-main-distro-default-trial/11956
Updated by Nitzan Mordechai about 2 months ago
- Related to Bug #74564: Rocky10 - prometheus not active added
Updated by Laura Flores about 2 months ago
/a/lflores-2026-01-26_23:21:06-rados-wip-yuri12-testing-2026-01-22-2045-distro-default-trial/19097
Updated by Sridhar Seshasayee about 2 months ago
/a/skanta-2026-01-27_05:35:03-rados-wip-bharath1-testing-2026-01-26-1242-distro-default-trial/19767
Updated by Nitzan Mordechai about 2 months ago
/a/yuriw-2026-01-29_18:33:05-rados-wip-yuri2-testing-2026-01-28-1643-tentacle-distro-default-trial/26512
Updated by Nitzan Mordechai about 2 months ago
- Related to deleted (Bug #74564: Rocky10 - prometheus not active)
Updated by Nitzan Mordechai about 2 months ago
- Has duplicate Bug #74564: Rocky10 - prometheus not active added
Updated by Aishwarya Mathuria about 1 month ago
/a/skanta-2026-01-30_23:46:16-rados-wip-bharath7-testing-2026-01-29-2016-distro-default-trial/28574
Updated by Connor Fawcett about 1 month ago
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19866
Updated by Laura Flores about 1 month ago
/a/yuriw-2026-02-03_16:00:06-rados-wip-yuri4-testing-2026-02-02-2122-distro-default-trial/31737
Updated by Aishwarya Mathuria about 1 month ago
/a/skanta-2026-02-07_00:02:26-rados-wip-bharath7-testing-2026-02-06-0906-distro-default-trial/39119
Updated by Laura Flores about 1 month ago
- Has duplicate Bug #74784: rados/cephadm/test_monitoring_stack_basic - failed to jq -e "CephMonDown" added
Updated by Aishwarya Mathuria about 1 month ago
/a/skanta-2026-02-05_03:38:32-rados-wip-bharath2-testing-2026-02-03-0542-distro-default-trial/35658
Updated by Nitzan Mordechai about 1 month ago
- Backport changed from tentacle to tentacle, squid
Updated by Aishwarya Mathuria about 1 month ago
Seen in squid, possibly because https://github.com/ceph/ceph/pull/66483 was included in the QA batch by mistake.
https://pulpito.ceph.com/yuriw-2026-02-17_20:43:43-rados-wip-yuri6-testing-2026-02-17-1732-squid-distro-default-trial/53883/
2026-02-17T21:15:44.683 INFO:teuthology.orchestra.run.trial127.stderr:+ jq -e '.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"'
2026-02-17T21:15:44.729 DEBUG:teuthology.orchestra.run:got remote process result: 4
2026-02-17T21:15:44.729 INFO:teuthology.orchestra.run.trial127.stdout:{"status":"success","data":{"alerts":[]}}
2026-02-17T21:15:44.729 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
^^^^^^^^^^^^^^
File "/home/teuthworker/src/github.com_ceph_ceph-c_d855f53b89fdcec760fd9232a5fb55ed4fb111a1/qa/tasks/cephadm.py", line 1492, in shell
_shell(
File "/home/teuthworker/src/github.com_ceph_ceph-c_d855f53b89fdcec760fd9232a5fb55ed4fb111a1/qa/tasks/cephadm.py", line 110, in _shell
return remote.run(
^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/orchestra/remote.py", line 575, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_5f66ecfb34c0370410e78b3ee641753d19da653b/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on trial127 with status 4: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:d855f53b89fdcec760fd9232a5fb55ed4fb111a1 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 6955d983-0c44-11f1-b9a6-d404e6e7d460 -- bash -c \'set -e\nset -x\nceph orch apply node-exporter\nceph orch apply grafana\nceph orch apply alertmanager\nceph orch apply prometheus\nsleep 240\nceph orch ls\nceph orch ps\nceph orch host ls\nMON_DAEMON=$(ceph orch ps --daemon-type mon -f json | jq -r \'"\'"\'last | .daemon_name\'"\'"\')\nGRAFANA_HOST=$(ceph orch ps --daemon-type grafana -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nPROM_HOST=$(ceph orch ps --daemon-type prometheus -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nALERTM_HOST=$(ceph orch ps --daemon-type alertmanager -f json | jq -e \'"\'"\'.[]\'"\'"\' | jq -r \'"\'"\'.hostname\'"\'"\')\nGRAFANA_IP=$(ceph orch host ls -f json | jq -r --arg GRAFANA_HOST "$GRAFANA_HOST" \'"\'"\'.[] | select(.hostname==$GRAFANA_HOST) | .addr\'"\'"\')\nPROM_IP=$(ceph orch host ls -f json | jq -r --arg PROM_HOST "$PROM_HOST" \'"\'"\'.[] | select(.hostname==$PROM_HOST) | .addr\'"\'"\')\nALERTM_IP=$(ceph orch host ls -f json | jq -r --arg ALERTM_HOST "$ALERTM_HOST" \'"\'"\'.[] | select(.hostname==$ALERTM_HOST) | .addr\'"\'"\')\n# check each host node-exporter metrics endpoint is responsive\nALL_HOST_IPS=$(ceph orch host ls -f json | jq -r \'"\'"\'.[] | .addr\'"\'"\')\nfor ip in $ALL_HOST_IPS; do\n curl -s http://${ip}:9100/metric\ndone\n# check grafana endpoints are responsive and database health is okay\ncurl -k -s https://${GRAFANA_IP}:3000/api/health\ncurl -k -s https://${GRAFANA_IP}:3000/api/health | jq -e \'"\'"\'.database == "ok"\'"\'"\'\n# stop mon daemon in order to trigger an alert\nceph orch daemon stop $MON_DAEMON\nsleep 120\n# check prometheus endpoints are responsive and mon down alert is firing\ncurl 
-s http://${PROM_IP}:9095/api/v1/status/config\ncurl -s http://${PROM_IP}:9095/api/v1/status/config | jq -e \'"\'"\'.status == "success"\'"\'"\'\ncurl -s http://${PROM_IP}:9095/api/v1/alerts\ncurl -s http://${PROM_IP}:9095/api/v1/alerts | jq -e \'"\'"\'.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"\'"\'"\'\n# check alertmanager endpoints are responsive and mon down alert is active\ncurl -s http://${ALERTM_IP}:9093/api/v2/status\ncurl -s http://${ALERTM_IP}:9093/api/v2/alerts\ncurl -s http://${ALERTM_IP}:9093/api/v2/alerts | jq -e \'"\'"\'.[] | select(.labels | .alertname == "CephMonDown") | .status | .state == "active"\'"\'"\'\n\''
2026-02-17T21:15:44.731 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
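The repeated `status 4` in these runs is jq's own exit code, not curl's: with `-e`, jq exits 4 when the filter produces no output at all, which is exactly what happens when the alerts array has no CephMonDown entry (or, as in the squid run above, is empty). A minimal reproduction against a payload shaped like the log output:

```shell
# Alerts payload with no CephMonDown entry, shaped like the API response in
# the log above (trimmed to the fields the filter touches).
payload='{"status":"success","data":{"alerts":[]}}'

# With -e, jq exits 0 if the last output was truthy, 1 if it was false or
# null, and 4 if no output was produced at all; select() matches nothing
# here, so no output is produced. The || guard keeps `set -e` scripts alive.
status=0
result=$(echo "$payload" | jq -e '.data | .alerts | .[] | select(.labels | .alertname == "CephMonDown") | .state == "firing"') || status=$?
echo "jq exit status: $status"
```

This also means the check cannot distinguish "alert not firing" from "no alerts at all"; both surface as the same exit 4.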
Updated by Sridhar Seshasayee 26 days ago
/a/skanta-2026-02-22_05:18:48-rados-wip-bharath21-testing-2026-02-20-1039-distro-default-trial/63350
Updated by Upkeep Bot 16 days ago
- Status changed from Fix Under Review to Pending Backport
- Merge Commit set to 7d9f8f3b5f2112299079105c5582c6208348002d
- Fixed In set to v20.3.0-5831-g7d9f8f3b5f
- Upkeep Timestamp set to 2026-03-05T06:50:59+00:00
Updated by Upkeep Bot 16 days ago
- Copied to Backport #75344: tentacle: Prometheus module experiences connection issues related to cherrypy added
Updated by Upkeep Bot 16 days ago
- Copied to Backport #75345: squid: Prometheus module experiences connection issues related to cherrypy added
Updated by Sridhar Seshasayee 8 days ago
It looks like the fix is merged. The following run did not include the fix:
/a/skanta-2026-03-04_23:53:38-rados-wip-bharath1-testing-2026-03-04-1011-distro-default-trial/85634