Bug #74641

open

Rocky10 - Port not free causes Dashboard module to fail

Added by Nitzan Mordechai about 2 months ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:
Tags:

Description

/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23558/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23550/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23562/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23548/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23566/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23552/teuthology.log
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23563/teuthology.log

2026-01-28T18:17:47.892 DEBUG:teuthology.orchestra.run.trial156:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:b62a951ffc48a50e41d23e63ed6b312afb1c1621-rockylinux-10 shell --fsid ace7839e-fc74-11f0-a5ba-d404e6e7d460 -- ceph health --format=json
2026-01-28T18:17:47.977 INFO:teuthology.orchestra.run.trial156.stderr:Inferring config /var/lib/ceph/ace7839e-fc74-11f0-a5ba-d404e6e7d460/mon.trial156/config
2026-01-28T18:17:48.313 INFO:teuthology.orchestra.run.trial156.stdout:
2026-01-28T18:17:48.313 INFO:teuthology.orchestra.run.trial156.stdout:{"status":"HEALTH_ERR","checks":{"MGR_MODULE_ERROR":{"severity":"HEALTH_ERR","summary":{"message":"Module 'dashboard' has failed: Timeout('Port 7150 not free on 10.20.193.156.')","count":1},"muted":false}},"mutes":[]}
2026-01-28T18:17:48.362 INFO:tasks.cephadm:Teardown begin
2026-01-28T18:17:48.363 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_c433f1062990a0488dc29a553589bc609a460691/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_660eda5fa6898da366b8ea2702a37a8ae8b19d19/qa/tasks/cephadm.py", line 2021, in task
    healthy(ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_660eda5fa6898da366b8ea2702a37a8ae8b19d19/qa/tasks/ceph.py", line 1557, in healthy
    manager.wait_until_healthy(timeout=300)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_660eda5fa6898da366b8ea2702a37a8ae8b19d19/qa/tasks/ceph_manager.py", line 3382, in wait_until_healthy
    assert time.time() - start < timeout, \
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: timeout expired in wait_until_healthy
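
For triage, the `ceph health --format=json` output in the log above can be reduced to the names and summary messages of the unmuted checks. This is a minimal sketch that assumes the JSON shape shown in that log line. The helper name `failed_checks` is hypothetical and is not part of teuthology or Ceph.

```python
import json

# Health JSON copied verbatim from the teuthology log line above.
HEALTH_JSON = '''{"status":"HEALTH_ERR","checks":{"MGR_MODULE_ERROR":{"severity":"HEALTH_ERR","summary":{"message":"Module 'dashboard' has failed: Timeout('Port 7150 not free on 10.20.193.156.')","count":1},"muted":false}},"mutes":[]}'''


def failed_checks(health_json: str) -> dict[str, str]:
    """Map each unmuted health check name to its summary message.

    Hypothetical helper: assumes the top-level "checks" mapping and the
    per-check "summary" -> "message" layout seen in the log above.
    """
    health = json.loads(health_json)
    return {
        name: check["summary"]["message"]
        for name, check in health.get("checks", {}).items()
        if not check.get("muted", False)
    }


if __name__ == "__main__":
    for name, message in failed_checks(HEALTH_JSON).items():
        print(f"{name}: {message}")
```

Filtering on `muted` keeps the result in line with what `wait_until_healthy` would presumably still treat as unhealthy, though that behavior is an assumption here rather than something confirmed by the traceback.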

Related issues 1 (1 open, 0 closed)

Related to Ceph QA - QA Run #74752: wip-rocky10-branch-of-the-day-2026-02-03-1770151121 (QA Testing)
Actions #1

Updated by Nitzan Mordechai about 2 months ago

  • Description updated (diff)
Actions #2

Updated by Nizamudeen A about 2 months ago

7150 is not a port the dashboard uses. It's a port opened by cephadm for its agent: https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/agent.py#L55

This is another issue that is similar to what I mentioned here: https://tracker.ceph.com/issues/74643#note-2.

cc: @Ernesto Puerta @Adam King
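
To confirm on an affected host whether something is still holding the port, a bind test is one quick option. This is a minimal sketch using only the Python standard library. The function name `port_is_free` is hypothetical, and the check is an illustration rather than the logic cephadm or the mgr uses.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port).

    Hypothetical helper. SO_REUSEADDR is deliberately not set, so a port
    still held by another socket is reported as not free.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE, or EADDRNOTAVAIL if host is not local.
            return False
        return True


if __name__ == "__main__":
    # Address and port taken from the health message in this ticket;
    # the address must belong to the machine running the check.
    print(port_is_free("10.20.193.156", 7150))
```

The bind test only says whether the port is taken. Identifying the process that holds it needs a separate tool on the host, for example `ss -ltnp`.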

Actions #3

Updated by Laura Flores about 1 month ago

/a/yaarit-2026-02-05_17:05:15-rados:cephadm-wip-rocky10-branch-of-the-day-2026-02-03-1770151121-distro-default-trial/
6 jobs: ['37008', '37012', '36996', '37010', '37004', '36998']

Actions #4

Updated by Laura Flores about 1 month ago

  • Tags set to rocky10
  • Subject changed from Port 7150 not free on .. to Rocky10 - Port not free causes Dashboard module to fail
Actions #5

Updated by Laura Flores about 1 month ago

  • Related to QA Run #74752: wip-rocky10-branch-of-the-day-2026-02-03-1770151121 added
Actions #6

Updated by Laura Flores about 1 month ago

Same issue, slightly different symptom:

/a/yaarit-2026-02-05_17:05:15-rados:cephadm-wip-rocky10-branch-of-the-day-2026-02-03-1770151121-distro-default-trial/36995

2026-02-05T17:23:43.095 INFO:teuthology.orchestra.run.trial069.stderr:tar: Error is not recoverable: exiting now
2026-02-05T17:23:43.096 INFO:tasks.cephadm:Checking cluster log for badness...
2026-02-05T17:23:43.097 DEBUG:teuthology.orchestra.run.trial002:> sudo grep -E '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/2d5d14c6-02b6-11f1-abb8-d404e6e7d460/ceph.log | grep -E CEPHADM_ | grep -E -v '\(MDS_ALL_DOWN\)' | grep -E -v '\(MDS_UP_LESS_THAN_MAX\)' | grep -E -v MON_DOWN | grep -E -v 'mons down' | grep -E -v 'mon down' | grep -E -v 'out of quorum' | grep -E -v CEPHADM_STRAY_DAEMON | grep -E -v CEPHADM_FAILED_DAEMON | grep -E -v CEPHADM_AGENT_DOWN | grep -E -v PG_DEGRADED | head -n 1
2026-02-05T17:23:43.132 INFO:teuthology.orchestra.run.trial002.stdout:2026-02-05T17:17:29.520291+0000 mon.a (mon.0) 903 : cluster [WRN] Health check failed: Failed to place 1 daemon(s) (CEPHADM_DAEMON_PLACE_FAIL)

/a/yaarit-2026-02-05_17:05:15-rados:cephadm-wip-rocky10-branch-of-the-day-2026-02-03-1770151121-distro-default-trial/36995/remote/trial002/log/2d5d14c6-02b6-11f1-abb8-d404e6e7d460/ceph-mon.a.log.gz

2026-02-05T17:22:30.662+0000 7f93f8a116c0 20 mon.a@0(leader).mgrstat health checks:
{
    "CEPHADM_DAEMON_PLACE_FAIL": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Failed to place 1 daemon(s)",
            "count": 1
        },
        "detail": [
            {
                "message": "Failed while placing grafana.a on trial069: dashboard set-grafana-api-ssl-verify failed: Module 'dashboard' has experienced an error and cannot handle commands: Timeout('Port 7150 not free on 10.20.193.2.') retval: -5" 
            }
        ]
    },
    "MGR_MODULE_ERROR": {
        "severity": "HEALTH_ERR",
        "summary": {
            "message": "Module 'dashboard' has failed: Timeout('Port 7150 not free on 10.20.193.2.')",
            "count": 1
        },
        "detail": [
            {
                "message": "Module 'dashboard' has failed: Timeout('Port 7150 not free on 10.20.193.2.')" 
            }
        ]
    }
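
When scanning many jobs, the port and address can be pulled out of messages like the ones above with a regular expression. This is a sketch that assumes the message keeps the `Timeout('Port N not free on HOST.')` form seen in this ticket. `parse_port_timeout` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the Timeout(...) fragment as it appears in the health messages
# quoted in this ticket. The host is captured without its trailing period.
_PORT_TIMEOUT = re.compile(r"Timeout\('Port (\d+) not free on ([^'\s]+)\.'\)")


def parse_port_timeout(message: str) -> Optional[tuple[int, str]]:
    """Return (port, host) from a 'Port N not free on HOST.' message, or None."""
    match = _PORT_TIMEOUT.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    msg = ("Module 'dashboard' has failed: "
           "Timeout('Port 7150 not free on 10.20.193.2.')")
    print(parse_port_timeout(msg))
```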

Actions #7

Updated by Nitzan Mordechai about 1 month ago

/a/yaarit-2026-02-10_02:34:33-rados-wip-rocky10-branch-of-the-day-2026-02-09-1770676549-distro-default-trial/42184
