Bug #74543

closed

Rocky10 - AttributeError in dashboard module

Added by Laura Flores about 2 months ago. Updated about 1 month ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:
Tags:

Description

I suspect there is a problem related to the updated python module CLI commands (from https://github.com/ceph/ceph/pull/66467):

/a/lflores-2026-01-23_19:07:45-rados-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/15277/teuthology.log

2026-01-23T19:25:30.106 INFO:teuthology.orchestra.run.trial037.stderr:Inferring config /var/lib/ceph/4e37bd6d-f890-11f0-a001-d404e6e7d460/mon.trial037/config
2026-01-23T19:25:30.332 INFO:journalctl@ceph.mon.trial037.trial037.stdout:Jan 23 19:25:30 trial037 ceph-mon[16423]: pgmap v194: 1 pgs: 1 active+clean; 577 KiB data, 214 MiB used, 5.5 TiB / 5.5 TiB avail
2026-01-23T19:25:30.379 INFO:teuthology.orchestra.run.trial037.stdout:
2026-01-23T19:25:30.379 INFO:teuthology.orchestra.run.trial037.stdout:{"status":"HEALTH_ERR","checks":{"MGR_MODULE_ERROR":{"severity":"HEALTH_ERR","summary":{"message":"Module 'dashboard' has failed: AttributeError(\"'NoneType' object has no attribute 'fileno'\")","count":1},"muted":false}},"mutes":[]}
2026-01-23T19:25:30.523 INFO:tasks.cephadm:Teardown begin
2026-01-23T19:25:30.523 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_f3df5a3a3153828dd8c0051ee630419067f719ee/teuthology/contextutil.py", line 32, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_b62a951ffc48a50e41d23e63ed6b312afb1c1621/qa/tasks/cephadm.py", line 2021, in task
    healthy(ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_b62a951ffc48a50e41d23e63ed6b312afb1c1621/qa/tasks/ceph.py", line 1557, in healthy
    manager.wait_until_healthy(timeout=300)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_b62a951ffc48a50e41d23e63ed6b312afb1c1621/qa/tasks/ceph_manager.py", line 3382, in wait_until_healthy
    assert time.time() - start < timeout, \
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: timeout expired in wait_until_healthy
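The assertion at the bottom of the traceback is teuthology's generic poll-with-deadline pattern. A minimal sketch of that pattern (hypothetical names, not the actual ceph_manager code):

```python
import time

def wait_until(predicate, timeout=300.0, interval=1.0):
    """Poll predicate() until it returns True or the deadline passes.
    Hypothetical simplification of ceph_manager's wait_until_healthy."""
    start = time.time()
    while not predicate():
        # Same shape as the failing assertion in the log above.
        assert time.time() - start < timeout, \
            "timeout expired in wait_until"
        time.sleep(interval)

# Demo: a condition that becomes true on the third poll.
state = {"n": 0}

def healthy():
    state["n"] += 1
    return state["n"] >= 3

wait_until(healthy, timeout=10.0, interval=0.01)
print(state["n"])   # 3
```

Since the cluster never leaves HEALTH_ERR (the dashboard module stays failed), the predicate never becomes true and the 300-second deadline expires.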

/a/lflores-2026-01-23_19:07:45-rados-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/15277/remote/trial037/log/4e37bd6d-f890-11f0-a001-d404e6e7d460/ceph-mgr.trial037.bxobyz.log.gz

2026-01-23T19:19:33.710+0000 7fe7e8cd7640  0 [volumes DEBUG cephadm.ssh] Running command: /usr/bin/python3 /var/lib/ceph/4e37bd6d-f890-11f0-a001-d404e6e7d460/cephadm.28c3f35c2df4dce642fc3b20d4655cff605ec3edf9b9faf9d9baeebc4d477ac5 --timeout 895 gather-facts
2026-01-23T19:19:33.789+0000 7fe7eece3640  0 [volumes ERROR root] Failed to run cephadm http server: AttributeError("'NoneType' object has no attribute 'fileno'")
2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.trial037.bxobyz: AttributeError("'NoneType' object has no attribute 'fileno'")
2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 dashboard.serve:
2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/module.py", line 367, in serve
    cherrypy.engine.start()
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
    raise e_info
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: AttributeError("'NoneType' object has no attribute 'fileno'")

It looks like the Dashboard may not have been initialized properly.

  • Note that this comes from work that is not yet merged to main. This ticket is only meant to track issues related to Rocky10 development.

Related issues 5 (4 open, 1 closed)

Related to Ceph QA - QA Run #74540: wip-rocky10-branch-of-the-day-2026-01-23-1769128778 (QA Needs Approval, Laura Flores)
Related to mgr - Bug #74042: ceph-mgr: modules need independent CLICommand types (Resolved, Samuel Just)
Related to mgr - Bug #74605: Rocky10 - ERROR: test_sql_autocommit1 (tasks.mgr.test_devicehealth.TestDeviceHealth.test_sql_autocommit1) (Pending Backport, Patrick Donnelly)
Is duplicate of Dashboard - Bug #74643: cherrypy.process.wspbus.ChannelFailures: TypeError('certfile should be a valid filesystem path') (Pending Backport, Nizamudeen A)
Blocks mgr - Bug #73930: ceph-mgr modules rely on deprecated python subinterpreters (New)
Actions #1

Updated by Laura Flores about 2 months ago

  • Assignee set to Samuel Just
Actions #2

Updated by Laura Flores about 2 months ago

  • Related to QA Run #74540: wip-rocky10-branch-of-the-day-2026-01-23-1769128778 added
Actions #3

Updated by Laura Flores about 2 months ago

  • Related to Bug #74042: ceph-mgr: modules need independent CLICommand types added
Actions #4

Updated by Yaarit Hatuka about 2 months ago

  • Blocks Bug #73930: ceph-mgr modules rely on deprecated python subinterpreters added
Actions #5

Updated by Nitzan Mordechai about 2 months ago · Edited

I think there is something wrong in the lab: all cephadm tests are failing, and most of the time there are firewall messages:

2026-01-23 19:18:54,699 7f3f705bbe00 INFO Verifying port 0.0.0.0:8765 ...
2026-01-23 19:18:54,699 7f3f705bbe00 INFO Verifying port 0.0.0.0:8443 ...
2026-01-23 19:18:54,834 7f3f705bbe00 DEBUG Non-zero exit code 1 from systemctl reset-failed ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460@mgr.trial037.bxobyz
2026-01-23 19:18:54,834 7f3f705bbe00 DEBUG systemctl: stderr Failed to reset failed state of unit ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460@mgr.trial037.bxobyz.service: Unit ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460@mgr.trial037.bxobyz.service not loaded.
2026-01-23 19:18:54,946 7f3f705bbe00 DEBUG systemctl: stderr Created symlink /etc/systemd/system/ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460.target.wants/ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460@mgr.trial037.bxobyz.service → /etc/systemd/system/ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460@.service.
2026-01-23 19:18:55,107 7f3f705bbe00 QUIET Non-zero exit code 1 from systemctl is-enabled firewalld.service
2026-01-23 19:18:55,107 7f3f705bbe00 QUIET systemctl: stdout disabled
2026-01-23 19:18:55,113 7f3f705bbe00 QUIET Non-zero exit code 3 from systemctl is-active firewalld.service
2026-01-23 19:18:55,113 7f3f705bbe00 QUIET systemctl: stdout inactive
2026-01-23 19:18:55,113 7f3f705bbe00 DEBUG firewalld.service is not enabled
2026-01-23 19:18:55,113 7f3f705bbe00 DEBUG Not possible to enable service <ceph>. firewalld.service is not available
2026-01-23 19:18:55,120 7f3f705bbe00 QUIET Non-zero exit code 1 from systemctl is-enabled firewalld.service
2026-01-23 19:18:55,120 7f3f705bbe00 QUIET systemctl: stdout disabled
2026-01-23 19:18:55,126 7f3f705bbe00 QUIET Non-zero exit code 3 from systemctl is-active firewalld.service
2026-01-23 19:18:55,126 7f3f705bbe00 QUIET systemctl: stdout inactive
2026-01-23 19:18:55,126 7f3f705bbe00 DEBUG firewalld.service is not enabled
2026-01-23 19:18:55,126 7f3f705bbe00 DEBUG Not possible to open ports <[9283, 8765, 8443]>. firewalld.service is not available

and sometimes SSH errors show up as well; I couldn't find other OS tests that have the same errors.

Actions #6

Updated by Yaarit Hatuka about 2 months ago

  • Subject changed from Rocky10 - AttributeError in dashboard module to AttributeError in dashboard module
Actions #7

Updated by Laura Flores about 2 months ago

  • Description updated (diff)
Actions #8

Updated by Laura Flores about 2 months ago

  • Description updated (diff)
Actions #9

Updated by Samuel Just about 2 months ago

This is a failure in cherrypy, presumably when it's trying to open a port for the dashboard API endpoint.

I think @Nitzan Mordechai is probably right: failing to open the API firewall port is likely preventing cherrypy from creating the endpoint. This is probably a lab problem.

Actions #10

Updated by Samuel Just about 2 months ago

  • Assignee deleted (Samuel Just)
Actions #11

Updated by David Galloway about 2 months ago

firewalld is installed on both CentOS 9 and Rocky 10 FOG images. If the test needs it, the test should enable it.

Actions #12

Updated by Laura Flores about 2 months ago · Edited

  • Tags deleted (rocky10)
Actions #13

Updated by Laura Flores about 2 months ago · Edited

I'm still not 100% convinced this is a lab problem, as this issue seems exclusive to the Rocky10 runs we've scheduled.

The issue specifically affects `rados/cephadm` tests.

I found the same test from different runs, both with the same test description - rados:cephadm/osds/{0-distro/ubuntu_22.04 0-nvme-loop 1-start 2-ops/rm-zap-flag}:

1. https://pulpito.ceph.com/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23558/ (from Nitzan's latest Rocky10 run; failed)
2. https://pulpito.ceph.com/skanta-2026-01-26_08:54:40-rados-wip-bharath4-testing-2026-01-26-1300-distro-default-trial/17654/ (from a main test batch from two days ago, based on the tip of main; passed)

The mgr log in the failed test shows:
/a/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23558/remote/trial156/log/ace7839e-fc74-11f0-a5ba-d404e6e7d460/ceph-mgr.trial156.bgeath.log.gz

2026-01-28T18:11:50.965+0000 7fd0680d66c0  0 [volumes DEBUG cephadm.ssh] Running command: which python3
2026-01-28T18:11:50.969+0000 7fd0680d66c0  0 [volumes DEBUG cephadm.ssh] Running command: /usr/bin/python3 /var/lib/ceph/ace7839e-fc74-11f0-a5ba-d404e6e7d460/cephadm.48cb751085350688b020692af8ee56776d51bbfef4b32972490a0c04b10667b3 --image quay.ceph.io/ceph-ci/ceph@sha256:91cae0a98ff138bf404f0c27e753f19d6d7ad1d9e0e36d77100ce5dd29a11eb3 --timeout 895 ceph-volume --fsid ace7839e-fc74-11f0-a5ba-d404e6e7d460 -- inventory --format=json-pretty --filter-for-batch
2026-01-28T18:11:51.405+0000 7fd18775b6c0 10 mgr tick tick
2026-01-28T18:11:51.405+0000 7fd18775b6c0 20 mgr send_beacon active
2026-01-28T18:11:51.405+0000 7fd18775b6c0 15 mgr send_beacon noting RADOS client for blocklist: libcephsqlite,v2:10.20.193.156:0/914997891
2026-01-28T18:11:51.405+0000 7fd18775b6c0 15 mgr send_beacon noting RADOS client for blocklist: rbd_support,v2:10.20.193.156:0/4102514893
2026-01-28T18:11:51.405+0000 7fd18775b6c0 15 mgr send_beacon noting RADOS client for blocklist: volumes,v2:10.20.193.156:0/1257920429
2026-01-28T18:11:51.405+0000 7fd18775b6c0 10 mgr send_beacon sending beacon as gid 14219
2026-01-28T18:11:51.409+0000 7fd07813a6c0 10 mgr.server tick
2026-01-28T18:11:51.409+0000 7fd07813a6c0  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs
2026-01-28T18:11:51.409+0000 7fd07813a6c0 20 mgr.server adjust_pgs
2026-01-28T18:11:51.409+0000 7fd07813a6c0 10 mgr.server operator() creating_or_unknown 0 max_creating 1024 left 1024
2026-01-28T18:11:51.409+0000 7fd07813a6c0 20 mgr.server operator() misplaced_ratio 0 degraded_ratio 0 inactive_pgs_ratio 0 unknown_pgs_ratio 0; target_max_misplaced_ratio 0.05
2026-01-28T18:11:51.425+0000 7fd18775b6c0  1 -- 10.20.193.156:0/1164199203 --> [v2:10.20.193.156:3300/0,v1:10.20.193.156:6789/0] -- mgrbeacon mgr.trial156.bgeath(ace7839e-fc74-11f0-a5ba-d404e6e7d460,14219, [v2:10.20.193.156:6800/2864471361,v1:10.20.193.156:6801/2864471361], 1) -- 0x55d542d32000 con 0x55d5425b2000
2026-01-28T18:11:51.425+0000 7fd05a86f6c0  0 [volumes ERROR root] Failed to start engine: Timeout('Port 7150 not free on 10.20.193.156.')
2026-01-28T18:11:51.425+0000 7fd05a86f6c0 20 mgr ~Gil Destroying new thread state 0x55d54299ad80
2026-01-28T18:11:51.529+0000 7fd0658b96c0 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.trial156.bgeath: Timeout('Port 7150 not free on 10.20.193.156.')
2026-01-28T18:11:51.529+0000 7fd0658b96c0 -1 dashboard.serve:
2026-01-28T18:11:51.529+0000 7fd0658b96c0 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/module.py", line 367, in serve
    cherrypy.engine.start()
  File "/lib/python3.12/site-packages/cherrypy/process/wspbus.py", line 282, in start
    raise e_info
  File "/lib/python3.12/site-packages/cherrypy/process/wspbus.py", line 267, in start
    self.publish('start')
  File "/lib/python3.12/site-packages/cherrypy/process/wspbus.py", line 247, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: Timeout('Port 7150 not free on 10.20.193.156.')

Given the `volumes ERROR root` prefix, we can guess that this issue originates in the volumes module. There are some cephadm commands being run from within the volumes module (not sure if I'm explaining that exactly right, but I mean that the commands are prefixed by "volumes DEBUG cephadm.ssh"). This then leads to the volumes error "Failed to start engine".

In the passing test, a similar sequence of commands is run, but they come from cephadm, not volumes (note the prefix "cephadm DEBUG cephadm.ssh"), and there is no "Failed to start engine" error coming from volumes.
/a/skanta-2026-01-26_08:54:40-rados-wip-bharath4-testing-2026-01-26-1300-distro-default-trial/17654/remote/trial152/log/bda12de9-faa1-11f0-b4a3-d404e6e7d460/ceph-mgr.trial152.ijinex.log.gz

2026-01-26T10:29:30.474+0000 7f06fe2a4640  0 [cephadm DEBUG cephadm.ssh] Running command: which python3
2026-01-26T10:29:30.478+0000 7f06fe2a4640  0 [cephadm DEBUG cephadm.ssh] Running command: /usr/bin/python3 /var/lib/ceph/bda12de9-faa1-11f0-b4a3-d404e6e7d460/cephadm.0781e12d68101ffaf42cd6756aef2e5c5b3a0a7dd5b61244e45f0793e8aa2f2d --image quay.ceph.io/ceph-ci/ceph@sha256:b83d371843358d00d9ece513e2b367ce1c27b276bf6cfc9ec5b1cc75d4947cf5 --timeout 895 ceph-volume --fsid bda12de9-faa1-11f0-b4a3-d404e6e7d460 -- inventory --format=json-pretty --filter-for-batch
2026-01-26T10:29:30.686+0000 7f07f24ee640  1 --2- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >>  conn(0x1384c800 0x13657b80 unknown :-1 s=NONE pgs=0 gs=0 cs=0 l=0 c_cookie=0 s_cookie=0 reconnecting=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).accept
2026-01-26T10:29:30.686+0000 7f07f24ee640  1 --2- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >>  conn(0x1384c800 0x13657b80 unknown :-1 s=BANNER_ACCEPTING pgs=0 gs=0 cs=0 l=0 c_cookie=0 s_cookie=0 reconnecting=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2026-01-26T10:29:30.686+0000 7f07f24ee640 10 mgr.server ms_handle_fast_authentication ms_handle_fast_authentication new session 0x13685200 con 0x1384c800 entity client.admin addr
2026-01-26T10:29:30.686+0000 7f07f24ee640 10 mgr.server ms_handle_fast_authentication  session 0x13685200 client.admin has caps allow * 'allow *'
2026-01-26T10:29:30.686+0000 7f07f24ee640  1 --2- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >> 10.20.193.192:0/1920175680 conn(0x1384c800 0x13657b80 secure :-1 s=READY pgs=2 gs=6 cs=0 l=1 c_cookie=0 s_cookie=0 reconnecting=0 rev1=1 crypto rx=0x13085950 tx=0x1324f980 comp rx=0 tx=0).ready entity=client.14235 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2026-01-26T10:29:30.846+0000 7f07f24ee640  1 -- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >> 10.20.193.192:0/1920175680 conn(0x1384c800 msgr2=0x13657b80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 62
2026-01-26T10:29:30.846+0000 7f07f24ee640  1 -- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >> 10.20.193.192:0/1920175680 conn(0x1384c800 msgr2=0x13657b80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2026-01-26T10:29:30.846+0000 7f07f24ee640  1 --2- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >> 10.20.193.192:0/1920175680 conn(0x1384c800 0x13657b80 secure :-1 s=READY pgs=2 gs=6 cs=0 l=1 c_cookie=0 s_cookie=0 reconnecting=0 rev1=1 crypto rx=0x13085950 tx=0x1324f980 comp rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1
2026-01-26T10:29:30.846+0000 7f07f24ee640  1 --2- [v2:10.20.193.152:6800/2778966694,v1:10.20.193.152:6801/2778966694] >> 10.20.193.192:0/1920175680 conn(0x1384c800 0x13657b80 secure :-1 s=READY pgs=2 gs=6 cs=0 l=1 c_cookie=0 s_cookie=0 reconnecting=0 rev1=1 crypto rx=0x13085950 tx=0x1324f980 comp rx=0 tx=0).stop
2026-01-26T10:29:30.890+0000 7f07ebce1640 10 mgr tick tick
2026-01-26T10:29:30.890+0000 7f07ebce1640 20 mgr send_beacon active
2026-01-26T10:29:30.890+0000 7f07ebce1640 15 mgr send_beacon noting RADOS client for blocklist: libcephsqlite,v2:10.20.193.152:0/1183411494
2026-01-26T10:29:30.890+0000 7f07ebce1640 15 mgr send_beacon noting RADOS client for blocklist: rbd_support,v2:10.20.193.152:0/52349559
2026-01-26T10:29:30.890+0000 7f07ebce1640 15 mgr send_beacon noting RADOS client for blocklist: volumes,v2:10.20.193.152:0/1867992189
2026-01-26T10:29:30.890+0000 7f07ebce1640 10 mgr send_beacon sending beacon as gid 14221
2026-01-26T10:29:30.894+0000 7f070db43640 10 mgr.server tick
2026-01-26T10:29:30.894+0000 7f070db43640  1 mgr.server send_report Not sending PG status to monitor yet, waiting for OSDs

Back to the failed test: the first mention of `cephadm.ssh` occurs after ActivePyModule::dispatch_remote is called, where the "opening connection" task is started in volumes:

2026-01-28T18:11:20.009+0000 7f1b1c8926c0 20 mgr dispatch_remote Calling cephadm.apply...
2026-01-28T18:11:20.009+0000 7f1b240bd6c0  0 [volumes DEBUG cephadm.ssh] Opening connection to root@10.20.193.156 with ssh options '-F /tmp/cephadm-conf-y1737fqx -i /tmp/cephadm-identity-ijoyqnb1'

In the passing test, the call to ActivePyModule::dispatch_remote leads to the "opening connection" task starting in cephadm:

2026-01-26T10:28:54.410+0000 7f431d57d640 20 mgr dispatch_remote Calling cephadm.apply...
2026-01-26T10:28:54.410+0000 7f43265cf640  0 [cephadm DEBUG cephadm.ssh] Opening connection to root@10.20.193.152 with ssh options '-F /tmp/cephadm-conf-w4pbdf2m -i /tmp/cephadm-identity-3pcnrxgs'

This is not a complete RCA, but I think there is something here, especially since one of the PRs that we are testing in the Rocky10 work modifies ActivePyModule::dispatch_remote: https://github.com/ceph/ceph/pull/66244

Before we dismiss this as a lab problem, we should investigate these patterns and rule out any of the Python changes we've introduced.

Actions #14

Updated by Laura Flores about 2 months ago

  • Tags set to rocky10
Actions #15

Updated by Laura Flores about 2 months ago

@Nitzan Mordechai @Samuel Just What do you think? ^

Actions #16

Updated by Samuel Just about 2 months ago

[volumes ERROR root] Failed to start engine: Timeout('Port 7150 not free on 10.20.193.156.') really looks like cherrypy is trying to bind to a port that isn't available.

I'd guess that something in the rocky10 image on the lab machine is preventing this port from binding. The next step would be to log into such a rocky10 node and see what's going on. I'll try to find time to do that tomorrow.
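The Timeout('Port 7150 not free on ...') message comes from cherrypy's pre-bind port check (the exact implementation lives in cherrypy/portend and is an assumption here). A rough Python analogue of such a probe, for poking at a node by hand:

```python
import socket

def port_free(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if nothing accepts connections on (host, port).

    Rough analogue of the check behind the
    Timeout('Port 7150 not free on ...') error; the real check in
    cherrypy/portend differs in detail (assumption).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False          # something is listening there
    except OSError:
        return True               # refused / unreachable: looks free

# Demo: occupy an ephemeral port, then probe it.
holder = socket.socket()
holder.bind(("127.0.0.1", 0))
holder.listen(1)
port = holder.getsockname()[1]
print(port_free("127.0.0.1", port))   # False: the holder owns it
holder.close()
```

Probing the mgr host for 7150/8443/8765 this way should show whether something (e.g. a stale daemon) is still accepting connections on those ports.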

Actions #17

Updated by Samuel Just about 2 months ago

  • Assignee set to Samuel Just
Actions #18

Updated by Nitzan Mordechai about 2 months ago

@Laura Flores I took another look, and it looks like port binding is also related to all of those issues. (We have a few more trackers for port binding.)

The mgr shows AttributeError("'NoneType' object has no attribute 'fileno'") at 19:19:33.790, and in the cephadm logs we can see that we are getting the SSL certificate at 19:19:13.

But the journalctl shows:

Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: Exception in thread HTTPServer Thread-62:
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: Traceback (most recent call last):
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.run()
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib64/python3.9/threading.py", line 917, in run
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self._target(*self._args, **self._kwargs)
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.httpserver.start()
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cheroot/server.py", line 1844, in start
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.prepare()
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cheroot/server.py", line 1806, in prepare
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self._connections = connections.ConnectionManager(self)
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cheroot/connections.py", line 131, in __init__
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     server.socket.fileno(),
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: AttributeError: 'NoneType' object has no attribute 'fileno'
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: 2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.trial037.bxobyz: AttributeError("'NoneType' object has no attribute 'fileno'")
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: 2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 dashboard.serve:
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: 2026-01-23T19:19:33.790+0000 7fe7e74d4640 -1 Traceback (most recent call last):
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/usr/share/ceph/mgr/dashboard/module.py", line 367, in serve
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     cherrypy.engine.start()
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     raise e_info
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.publish('start')
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     raise exc
Jan 23 19:19:33 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: cherrypy.process.wspbus.ChannelFailures: AttributeError("'NoneType' object has no attribute 'fileno'")

I checked before that and found that the port is already in use:

Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: Exception in thread HTTPServer Thread-63:
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: Traceback (most recent call last):
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.run()
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib64/python3.9/threading.py", line 917, in run
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self._target(*self._args, **self._kwargs)
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cherrypy/process/servers.py", line 225, in _start_http_thread
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.httpserver.start()
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cheroot/server.py", line 1844, in start
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     self.prepare()
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:   File "/lib/python3.9/site-packages/cheroot/server.py", line 1799, in prepare
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]:     raise socket.error(msg)
Jan 23 19:19:34 trial037 ceph-4e37bd6d-f890-11f0-a001-d404e6e7d460-mgr-trial037-bxobyz[16700]: OSError: No socket could be created -- (('10.20.193.37', 8765): [Errno 98] Address already in use)
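These two tracebacks fit together: when the bind fails with EADDRINUSE, the server object is left with its socket attribute still None, and the next start attempt dies on `.fileno()`. A minimal sketch of that failure mode (a simplified stand-in, not cheroot's actual code):

```python
import socket

class MiniServer:
    """Simplified stand-in for an HTTP server (assumption: like
    cheroot, `self.socket` stays None until bind() succeeds)."""
    def __init__(self, addr):
        self.addr = addr
        self.socket = None

    def bind(self):
        s = socket.socket()
        s.bind(self.addr)        # raises OSError (EADDRINUSE) if taken
        self.socket = s

# Occupy a port, then try to bind a second server to it.
holder = socket.socket()
holder.bind(("127.0.0.1", 0))
holder.listen(1)

srv = MiniServer(holder.getsockname())
try:
    srv.bind()                   # fails: Address already in use
except OSError:
    pass

# srv.socket is still None, so a later fileno() call fails exactly
# like the cheroot/connections.py line in the log above.
msg = ""
try:
    srv.socket.fileno()
except AttributeError as e:
    msg = str(e)
print(msg)                       # 'NoneType' object has no attribute 'fileno'
holder.close()
```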

Actions #19

Updated by Samuel Just about 2 months ago

I'm focusing on two cases:
- sjust-2026-01-29_23:19:32-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/26892 -- rerun of https://pulpito.ceph.com/nmordech-2026-01-28_16:21:31-rados:cephadm-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/23558/ above, failure
- https://pulpito.ceph.com/skanta-2026-01-26_08:54:40-rados-wip-bharath4-testing-2026-01-26-1300-distro-default-trial/17654/ -- Laura's example above

I think Laura is right that the behavior is related to one of my branches. Specifically, I suspect it's the one that forces all modules into the same interpreter.

First interesting difference:

In the failing variant:

2026-01-29T23:29:34.557+0000 7f21e59cf640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150

In the passing variant:

2026-01-26T10:28:52.294+0000 7f4329dd6640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150

The logger name is wrong: MgrModuleLoggingMixin globally clears and resets the root logger, and volumes ends up winning because it seems to be the last module to initialize. These log lines, however, actually come from the cephadm module. This'll need to be fixed.
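A minimal sketch of the suspected mechanism (assuming a simplified model of MgrModuleLoggingMixin; the real code differs): every module resets the shared root logger on init, so once all modules run in one interpreter, the last module to initialize stamps its own name on everyone's log lines:

```python
import logging

def init_module_logging(module_name: str) -> None:
    # Simplified model (assumption): wipe the root logger and install a
    # handler whose formatter hard-codes this module's name.
    root = logging.getLogger()
    root.handlers.clear()
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "[" + module_name + " %(levelname)s %(name)s] %(message)s"))
    root.addHandler(handler)

init_module_logging("cephadm")
init_module_logging("volumes")      # last module to initialize wins

# A record emitted by cephadm code now gets the "volumes" prefix.
record = logging.LogRecord("cephadm.ssh", logging.DEBUG, __file__, 0,
                           "Running command: which python3", None, None)
line = logging.getLogger().handlers[0].format(record)
print(line)   # [volumes DEBUG cephadm.ssh] Running command: which python3
```

With one interpreter per module (the old subinterpreter model), each module had its own root logger, so the prefix always matched.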

The second problem is more complicated -- here's the sequence from the passing test on main:

grep 'Cephadm agent\|0 ceph version' ceph-mgr.trial152.ijinex.log
2026-01-26T10:28:42.782+0000 7f242d7eefc0  0 ceph version 20.3.0-4930-g25c8ee62 (25c8ee62b04466881bb7a6a5cd9d1226cad7776a) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-26T10:28:48.514+0000 7f441b872fc0  0 ceph version 20.3.0-4930-g25c8ee62 (25c8ee62b04466881bb7a6a5cd9d1226cad7776a) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-26T10:28:52.294+0000 7f4329dd6640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150
2026-01-26T10:28:59.722+0000 7ff15f476fc0  0 ceph version 20.3.0-4930-g25c8ee62 (25c8ee62b04466881bb7a6a5cd9d1226cad7776a) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-26T10:29:02.798+0000 7ff06d9da640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150
2026-01-26T10:29:26.178+0000 7f07f3547fc0  0 ceph version 20.3.0-4930-g25c8ee62 (25c8ee62b04466881bb7a6a5cd9d1226cad7776a) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-26T10:29:29.282+0000 7f0701aab640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150

This test seems to start this manager instance 4 times and to start the cephadm http server on port 7150 3 times. Failing run:

2026-01-29T23:29:28.793+0000 7feec3bc5fc0  0 ceph version 20.3.0-4942-gb62a951f (b62a951ffc48a50e41d23e63ed6b312afb1c1621) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-29T23:29:32.601+0000 7f22c8634fc0  0 ceph version 20.3.0-4942-gb62a951f (b62a951ffc48a50e41d23e63ed6b312afb1c1621) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-29T23:29:34.557+0000 7f21e59cf640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150
2026-01-29T23:29:42.081+0000 7fced891dfc0  0 ceph version 20.3.0-4942-gb62a951f (b62a951ffc48a50e41d23e63ed6b312afb1c1621) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-29T23:29:43.321+0000 7fcdf5cb8640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150
2026-01-29T23:30:04.169+0000 7fa26b111fc0  0 ceph version 20.3.0-4942-gb62a951f (b62a951ffc48a50e41d23e63ed6b312afb1c1621) tentacle (dev - RelWithDebInfo), process ceph-mgr, pid 7
2026-01-29T23:30:05.497+0000 7fa1884ac640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150

Same pattern (though with the incorrect logger name). First cherrypy start for each. Failing:

2026-01-29T23:29:34.557+0000 7f21e59cf640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150
2026-01-29T23:29:34.561+0000 7f21e59cf640  0 [volumes INFO cherrypy.error] [29/Jan/2026:23:29:34] ENGINE Bus STARTING
2026-01-29T23:29:34.661+0000 7f21e59cf640  0 [volumes INFO cherrypy.error] [29/Jan/2026:23:29:34] ENGINE Serving on http://10.20.193.46:8765
2026-01-29T23:29:34.769+0000 7f21e59cf640  0 [volumes INFO cherrypy.error] [29/Jan/2026:23:29:34] ENGINE Serving on https://10.20.193.46:7150
2026-01-29T23:29:34.769+0000 7f21e59cf640  0 [volumes INFO cherrypy.error] [29/Jan/2026:23:29:34] ENGINE Bus STARTED
2026-01-29T23:29:34.769+0000 7f21e59cf640  0 [volumes DEBUG root] Cherrypy engine started.
2026-01-29T23:29:34.769+0000 7f21e59cf640  0 [volumes DEBUG root] _kick_serve_loop

Passing:

2026-01-26T10:28:52.294+0000 7f4329dd6640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150
2026-01-26T10:28:52.294+0000 7f4329dd6640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:28:52] ENGINE Bus STARTING
2026-01-26T10:28:52.294+0000 7f4329dd6640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:28:52] ENGINE Bus STARTING
2026-01-26T10:28:52.402+0000 7f4329dd6640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:28:52] ENGINE Serving on https://10.20.193.152:7150
2026-01-26T10:28:52.402+0000 7f4329dd6640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:28:52] ENGINE Serving on https://10.20.193.152:7150
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:28:52] ENGINE Serving on http://10.20.193.152:8765
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:28:52] ENGINE Serving on http://10.20.193.152:8765
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:28:52] ENGINE Bus STARTED
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:28:52] ENGINE Bus STARTED
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 [cephadm DEBUG root] Cherrypy engine started.
2026-01-26T10:28:52.506+0000 7f4329dd6640  0 [cephadm DEBUG root] _kick_serve_loop

Basically similar. Final one, passing:

2026-01-26T10:29:29.282+0000 7f0701aab640  0 [cephadm DEBUG root] Cephadm agent endpoint using 7150
2026-01-26T10:29:29.282+0000 7f0701aab640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:29:29] ENGINE Bus STARTING
2026-01-26T10:29:29.282+0000 7f0701aab640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:29:29] ENGINE Bus STARTING
2026-01-26T10:29:29.386+0000 7f0701aab640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:29:29] ENGINE Serving on http://10.20.193.152:8765
2026-01-26T10:29:29.386+0000 7f0701aab640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:29:29] ENGINE Serving on http://10.20.193.152:8765
2026-01-26T10:29:29.494+0000 7f0701aab640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:29:29] ENGINE Serving on https://10.20.193.152:7150
2026-01-26T10:29:29.494+0000 7f0701aab640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:29:29] ENGINE Serving on https://10.20.193.152:7150
2026-01-26T10:29:29.494+0000 7f0701aab640  0 [cephadm INFO cherrypy.error] [26/Jan/2026:10:29:29] ENGINE Bus STARTED
2026-01-26T10:29:29.494+0000 7f0701aab640  0 log_channel(cephadm) log [INF] : [26/Jan/2026:10:29:29] ENGINE Bus STARTED
2026-01-26T10:29:29.494+0000 7f0701aab640  0 [cephadm DEBUG root] Cherrypy engine started.
2026-01-26T10:29:29.494+0000 7f0701aab640  0 [cephadm DEBUG root] _kick_serve_loop

failing:

2026-01-29T23:30:05.497+0000 7fa1884ac640  0 [volumes DEBUG root] Cephadm agent endpoint using 7150
[no further lines for thread 7fa1884ac640]
...
2026-01-29T23:30:06.961+0000 7fa1763fa640  0 [volumes ERROR root] Failed to start engine: Timeout('Port 7150 not free on 10.20.193.46.')
...
2026-01-29T23:30:07.129+0000 7fa182460640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.trial046.uzjxgw: Timeout('Port 7150 not free on 10.20.193.46.')
...
2026-01-29T23:30:07.129+0000 7fa182460640 -1 dashboard.serve:
2026-01-29T23:30:07.129+0000 7fa182460640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/module.py", line 367, in serve
    cherrypy.engine.start()
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
    raise e_info
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: Timeout('Port 7150 not free on 10.20.193.46.')

It seems to have worked fine the first two times, but failed to bind the final time. Perhaps the prior instance of ceph-mgr was actually still alive? Not sure yet.
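The "not free" report is worth pinning down: before serving, CherryPy probes the port by repeatedly trying to bind it and raises the Timeout seen above if the bind never succeeds. A minimal stdlib sketch of that kind of probe (the `port_is_free` helper is hypothetical, not CherryPy's actual code) shows how a still-alive prior listener, e.g. a lingering ceph-mgr, would keep the port busy:

```python
import socket

def port_is_free(host: str, port: int) -> bool:
    """Try to bind (host, port); roughly the probe CherryPy's
    wait-for-free-port loop performs before giving up with Timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:            # EADDRINUSE while another socket holds it
        return False
    finally:
        s.close()

# Simulate a prior daemon still holding the port:
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))            # let the OS pick an unused port
port = holder.getsockname()[1]
holder.listen(1)

busy_while_held = port_is_free("127.0.0.1", port)    # False
holder.close()
free_after_close = port_is_free("127.0.0.1", port)   # True

print(busy_while_held, free_after_close)
```

Under that model, a previous ceph-mgr (or another module's server thread) that never released 7150 would make every subsequent bind attempt fail until the timeout fires.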

Actions #20

Updated by Nizamudeen A about 2 months ago

Based on the comments I made here: https://tracker.ceph.com/issues/74643#note-2, I think it starts to fail as soon as the agent and the dashboard start running together, from what I can see. So they might be interfering with each other because of CherryPy's global shared config?
https://docs.cherrypy.dev/en/stable/config.html#global-config
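For context, `cherrypy.config` (and the engine bus) is process-global, so two mgr modules that both drive CherryPy inside the same ceph-mgr process can clobber each other's settings. A schematic stand-in (a plain dict; `start_agent`/`start_dashboard` are hypothetical, not actual Ceph code) of that failure mode:

```python
# Schematic only: GLOBAL_CONFIG stands in for the process-wide
# cherrypy.config shared by every module loaded into one ceph-mgr.
GLOBAL_CONFIG = {}

def start_agent():
    # cephadm agent endpoint writes the shared server settings
    GLOBAL_CONFIG["server.socket_port"] = 7150
    GLOBAL_CONFIG["server.ssl_certificate"] = "/tmp/agent-cert.pem"

def start_dashboard():
    # dashboard writes the same process-global keys; whichever module
    # (re)starts the engine last wins, and the other's settings are gone
    GLOBAL_CONFIG["server.socket_port"] = 7150       # same port as the agent
    GLOBAL_CONFIG["server.ssl_certificate"] = None

start_agent()
start_dashboard()
print(GLOBAL_CONFIG["server.ssl_certificate"])   # agent's cert was clobbered
```

A clash like this would be consistent with both symptoms in the two trackers: a port-7150 bind conflict here, and a bad `certfile` value in #74643.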

Actions #21

Updated by Samuel Just about 2 months ago

Yeah, I agree that https://tracker.ceph.com/issues/74643 could be the same problem.

Actions #22

Updated by Laura Flores about 2 months ago

  • Related to Bug #74605: Rocky10 - ERROR: test_sql_autocommit1 (tasks.mgr.test_devicehealth.TestDeviceHealth.test_sql_autocommit1) added
Actions #23

Updated by Laura Flores about 2 months ago

  • Subject changed from AttributeError in dashboard module to Rocky10 - AttributeError in dashboard module
Actions #24

Updated by Nizamudeen A about 1 month ago

  • Status changed from New to Duplicate

I am closing this as a duplicate of https://tracker.ceph.com/issues/74643.

Actions #26

Updated by Yaarit Hatuka about 1 month ago

  • Is duplicate of Bug #74643: cherrypy.process.wspbus.ChannelFailures: TypeError('certfile should be a valid filesystem path') added