Bug #70669

open

ERROR: test_list_enabled_module: cephfs resource temporarily unavailable

Added by Laura Flores 12 months ago. Updated 5 days ago.

Status: Fix Under Review
Priority: High
Assignee:
Category: Build, CI, Dependencies & Tools
Target version:
% Done: 0%
Source: Q/A
Backport: tentacle
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

/a/yuriw-2025-03-20_14:48:30-rados-wip-yuri3-testing-2025-03-18-0732-distro-default-smithi/8199834

teuthology.log

2025-03-20T19:28:08.012 INFO:tasks.mgr.dashboard.test_mgr_module:Trying to reach the REST API endpoint
2025-03-20T19:28:08.012 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.115:7791/api/mgr/module
2025-03-20T19:28:08.014 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2025-03-20T19:28:08.645 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:08.641+0000 7fed45d67640 -1 client.0 error registering admin socket command: (17) File exists
2025-03-20T19:28:08.645 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:08.641+0000 7fed45d67640 -1 client.0 error registering admin socket command: (17) File exists
2025-03-20T19:28:08.645 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:08.641+0000 7fed45d67640 -1 client.0 error registering admin socket command: (17) File exists
2025-03-20T19:28:08.645 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:08.641+0000 7fed45d67640 -1 client.0 error registering admin socket command: (17) File exists
2025-03-20T19:28:08.645 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:08.641+0000 7fed45d67640 -1 client.0 error registering admin socket command: (17) File exists
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.z: Timeout('Port 8443 not free on ::.')
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 dashboard.serve:
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 Traceback (most recent call last):
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:  File "/usr/share/ceph/mgr/dashboard/module.py", line 359, in serve
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:    cherrypy.engine.start()
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
2025-03-20T19:28:09.892 INFO:tasks.ceph.mgr.z.smithi115.stderr:    raise e_info
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:    self.publish('start')
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:    raise exc
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:cherrypy.process.wspbus.ChannelFailures: Timeout('Port 8443 not free on ::.')
2025-03-20T19:28:09.893 INFO:tasks.ceph.mgr.z.smithi115.stderr:
2025-03-20T19:28:11.558 INFO:tasks.ceph.mon.a.smithi037.stderr:2025-03-20T19:28:11.555+0000 7f34b7849640 -1 log_channel(cluster) log [ERR] : Health check failed: Module 'dashboard' has failed: Timeout('Port 8443 not free on ::.') (MGR_MODULE_ERROR)
...
2025-03-20T19:29:10.896 INFO:tasks.cephfs_test_runner:test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) ... ERROR
2025-03-20T19:29:10.897 INFO:tasks.cephfs_test_runner:
2025-03-20T19:29:10.897 INFO:tasks.cephfs_test_runner:======================================================================
2025-03-20T19:29:10.897 INFO:tasks.cephfs_test_runner:ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
2025-03-20T19:29:10.897 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8e60109c2ce9ac81275a6501eacf5f84a082ec68/virtualenv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:    conn = connection.create_connection(
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8e60109c2ce9ac81275a6501eacf5f84a082ec68/virtualenv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:    raise err
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8e60109c2ce9ac81275a6501eacf5f84a082ec68/virtualenv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:    sock.connect(sa)
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8e60109c2ce9ac81275a6501eacf5f84a082ec68/virtualenv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 590, in connect
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:    self._internal_connect(address)
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8e60109c2ce9ac81275a6501eacf5f84a082ec68/virtualenv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 634, in _internal_connect
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:    raise _SocketError(err, strerror(err))
2025-03-20T19:29:10.898 INFO:tasks.cephfs_test_runner:ConnectionRefusedError: [Errno 111] Connection refused

mgr.y.log

2025-03-20T19:11:04.700+0000 7fe3caea2640  0 [dashboard ERROR viewcache] Error while calling fn=<function CephFSClients.get at 0x7fe452b47670> ex=error in mds_command2: Resource temporarily unavailable [Errno 11]
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/cephfs.py", line 659, in get
    ret = CephService.send_command('mds', 'session ls', srv_spec='{0}:0'.format(self.fscid))
  File "/usr/share/ceph/mgr/dashboard/services/ceph_service.py", line 334, in send_command
    mgr.send_command(result, srv_type, srv_spec, json.dumps(argdict), "", inbuf=inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1905, in send_command
    self.cephfs.mds_command2(wrapped_result, svc_id, command, inbuf, one_shot=one_shot)
  File "cephfs.pyx", line 2336, in cephfs.LibCephFS.mds_command2
cephfs.WouldBlock: error in mds_command2: Resource temporarily unavailable [Errno 11]
2025-03-20T19:11:04.700+0000 7fe3caea2640  0 [dashboard DEBUG viewcache] execution of <function CephFSClients.get at 0x7fe452b47670> finished in: -1742497864.6996083
2025-03-20T19:11:04.700+0000 7fe3d41f4640  0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
    return func(*vpath, **params)
  File "/usr/share/ceph/mgr/dashboard/controllers/cephfs.py", line 139, in clients
    return self._clients(fs_id, suppress_client_ls_errors=flag)
  File "/usr/share/ceph/mgr/dashboard/controllers/cephfs.py", line 390, in _clients
    status, clients = cephfs_clients.get(suppress_get_errors)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 245, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 227, in run
    raise self.exception
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/cephfs.py", line 659, in get
    ret = CephService.send_command('mds', 'session ls', srv_spec='{0}:0'.format(self.fscid))
  File "/usr/share/ceph/mgr/dashboard/services/ceph_service.py", line 334, in send_command
    mgr.send_command(result, srv_type, srv_spec, json.dumps(argdict), "", inbuf=inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1905, in send_command
    self.cephfs.mds_command2(wrapped_result, svc_id, command, inbuf, one_shot=one_shot)
  File "cephfs.pyx", line 2336, in cephfs.LibCephFS.mds_command2
cephfs.WouldBlock: error in mds_command2: Resource temporarily unavailable [Errno 11]
...
2025-03-20T19:12:27.125+0000 7fe3d39f3640  0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
    ret = func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
    return func(*vpath, **params)
  File "/usr/share/ceph/mgr/dashboard/controllers/cephfs.py", line 530, in rm_tree
    cfs.rm_dir(path)
  File "/usr/share/ceph/mgr/dashboard/services/cephfs.py", line 157, in rm_dir
    self.cfs.rmdir(path)
  File "cephfs.pyx", line 1375, in cephfs.LibCephFS.rmdir
cephfs.ObjectNotEmpty: error in rmdir /pictures: Directory not empty [Errno 39]

Recent attempted fix: https://tracker.ceph.com/issues/62972


Related issues 3 (2 open, 1 closed)

Related to Dashboard - Bug #62972: ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) | Pending Backport | Pedro González Gómez
Related to Dashboard - Bug #75358: Health check failed: Module 'dashboard' has failed: Port 8443 not free | Fix Under Review | Afreen Misbah
Has duplicate Ceph - Bug #72272: test_list_enabled_module: Socket error - Address already in use | Duplicate

Actions #1

Updated by Laura Flores 12 months ago

  • Related to Bug #62972: ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) added
Actions #2

Updated by Afreen Misbah 12 months ago

  • Category set to Build, CI, Dependencies & Tools
  • Target version set to v20.0.0
Actions #3

Updated by Laura Flores 12 months ago

/a/yuriw-2025-03-21_20:26:29-rados-wip-yuri7-testing-2025-03-21-0821-distro-default-smithi/8202153

Actions #4

Updated by Bill Scales 12 months ago

Looking at /a/yuriw-2025-03-20_14:48:30-rados-wip-yuri3-testing-2025-03-18-0732-distro-default-smithi/8199834

The teuthology test times out on an API call. Here are the start and end of the retries:

2025-03-20T19:28:08.014 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
..
2025-03-20T19:29:10.908 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))

Mgr Z was active at the time:

/a/yuriw-2025-03-20_14:48:30-rados-wip-yuri3-testing-2025-03-18-0732-distro-default-smithi/8199834/remote/smithi115/log/ceph-mgr.z.log.gz

It decides to respawn here:

2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr handle_mgr_map respawning because set of enabled modules changed!
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  e: 'ceph-mgr'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  0: 'ceph-mgr'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  1: '-f'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  2: '--cluster'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  3: 'ceph'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  4: '-i'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  5: 'z'
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn respawning with exe /usr/bin/ceph-mgr
2025-03-20T19:28:04.989+0000 7f01f1ecb640  1 mgr respawn  exe_path /proc/self/exe

So expect it to be unavailable for a few seconds while it reloads.

The restart appears to run into problems starting the dashboard:

2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.z: Timeout('Port 8443 not free on ::.')
2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 dashboard.serve:
2025-03-20T19:28:09.888+0000 7fed58f0d640 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/module.py", line 359, in serve
    cherrypy.engine.start()
  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
    raise e_info
  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
    self.publish('start')
  File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
    raise exc
cherrypy.process.wspbus.ChannelFailures: Timeout('Port 8443 not free on ::.')

Which results in a health check error:

2025-03-20T19:28:10.569+0000 7fed5f71a640 20 mgr.server operator() health checks:
{
    "MGR_MODULE_ERROR": {
        "severity": "HEALTH_ERR",
        "summary": {
            "message": "Module 'dashboard' has failed: Timeout('Port 8443 not free on ::.')",
            "count": 1
        },
        "detail": [
            {
                "message": "Module 'dashboard' has failed: Timeout('Port 8443 not free on ::.')" 
            }
        ]
    }
}

I'm guessing that failing to start the dashboard is why the API isn't running, even though the error above is for port 8443 rather than the API port 7791.

Neither of the other mgrs has tried to become active, so they can't be the ones using port 8443. Presumably it is the prior instance of mgr.z still holding port 8443 that stops the new instance from binding to it. Is the dashboard setting SO_REUSEADDR / SO_REUSEPORT (https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ) on its sockets?
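
For reference, a minimal sketch of what setting and checking SO_REUSEADDR looks like with Python's standard socket module. This is not the dashboard's or cherrypy's code; the helper names make_listener and reuse_addr_enabled are made up for illustration, and the loopback address and ephemeral port are assumptions so the snippet runs anywhere.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0,
                  reuse_addr: bool = True) -> socket.socket:
    """Create a listening TCP socket, optionally with SO_REUSEADDR set.

    Hypothetical helper for illustration only. port=0 asks the OS for
    any free ephemeral port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # The option has to be set before bind() to affect the bind.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


def reuse_addr_enabled(sock: socket.socket) -> bool:
    """Report whether SO_REUSEADDR is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0


if __name__ == "__main__":
    listener = make_listener()
    print("bound to", listener.getsockname(),
          "SO_REUSEADDR:", reuse_addr_enabled(listener))
    listener.close()
```

Note that on Linux SO_REUSEADDR mainly helps when the old connections are lingering in TIME_WAIT. It does not let a second socket bind while another socket is still actively listening on the same address and port.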

Actions #5

Updated by Laura Flores 12 months ago

/a/yuriw-2025-03-27_15:03:25-rados-wip-yuri7-testing-2025-03-26-1605-distro-default-smithi/8213285

Actions #6

Updated by Shraddha Agrawal 12 months ago

/a/skanta-2025-03-23_11:08:44-rados-wip-bharath15-testing-2025-03-22-1514-distro-default-smithi/8204183

Actions #7

Updated by Jaya Prakash 12 months ago

/a/yuriw-2025-03-22_14:06:08-rados-wip-yuri2-testing-2025-03-21-0820-distro-default-smithi
2 jobs: ['8202560', '8202575']

Actions #8

Updated by Jaya Prakash 12 months ago

/a/yuriw-2025-03-21_20:27:12-rados-wip-yuri2-testing-2025-03-21-0820-distro-default-smithi
2 jobs: ['8201834', '8202213']

Actions #9

Updated by Laura Flores 12 months ago

/a/skanta-2025-04-04_06:10:17-rados-wip-bharath10-testing-2025-04-03-2112-distro-default-smithi/8223708

Actions #10

Updated by Laura Flores 12 months ago

/a/teuthology-2025-04-06_20:00:14-rados-main-distro-default-smithi/8226610

Actions #11

Updated by Laura Flores 12 months ago

Hey @Afreen Misbah, any updates on this?

Actions #12

Updated by Afreen Misbah 12 months ago

Hi Laura,

I will check this. I was OOO for the past month and came back this week.
Please give me this week, thanks.

Actions #13

Updated by Afreen Misbah 12 months ago

  • Status changed from New to In Progress
  • Priority changed from Normal to High
Actions #14

Updated by Laura Flores 11 months ago

/a/skanta-2025-04-05_15:49:33-rados-wip-bharath8-testing-2025-04-05-1439-distro-default-smithi/8225451

Actions #15

Updated by Kamoltat (Junior) Sirivadhna 11 months ago

/a/skanta-2025-04-09_05:31:19-rados-wip-bharath17-testing-2025-04-08-0602-distro-default-smithi/

[8233190, 8233200]

Actions #16

Updated by Laura Flores 11 months ago

/a/lflores-2025-04-11_15:58:32-rados-wip-lflores-testing-2025-04-10-2245-distro-default-smithi/8235742

Actions #17

Updated by Afreen Misbah 11 months ago

  • Source set to Q/A
Actions #18

Updated by Laura Flores 11 months ago

/a/lflores-2025-04-11_19:10:45-rados-wip-lflores-testing-3-2025-04-11-1140-distro-default-smithi/8236162

Actions #19

Updated by Laura Flores 11 months ago

/a/lflores-2025-04-15_00:57:49-rados-wip-lflores-testing-5-2025-04-14-1635-distro-default-gibba/8241446

Actions #20

Updated by Shraddha Agrawal 11 months ago

/a/yuriw-2025-04-22_22:07:13-rados-wip-yuri5-testing-2025-04-22-1252-distro-default-smithi/8254088

Actions #22

Updated by Jaya Prakash 10 months ago

/a/akupczyk-2025-05-22_09:41:31-rados-aclamk-testing-phoebe-2025-05-21-1802-distro-default-smithi/8294019

Actions #23

Updated by Aishwarya Mathuria 10 months ago

/a/teuthology-2025-06-01_20:00:15-rados-main-distro-default-smithi/8304175
/a/teuthology-2025-06-01_20:00:15-rados-main-distro-default-smithi/8304026

Actions #24

Updated by Laura Flores 10 months ago

/a/skanta-2025-06-03_11:03:28-rados-wip-bharath1-testing-2025-06-02-2052-distro-default-smithi/8307608

Actions #25

Updated by Ronen Friedman 9 months ago

Hi, @Afreen Misbah
Any update regarding this issue? Also, did you notice Bill Scales's note suggesting checking that reuse-address is set?
Thanks.

Actions #26

Updated by Laura Flores 9 months ago

/a/teuthology-2025-05-18_22:00:03-rados-tentacle-distro-default-smithi/8288410

Actions #27

Updated by Jaya Prakash 9 months ago

akupczyk-2025-06-09_15:27:19-rados-aclamk-testing-ganymede-2025-06-09-0715-tentacle-distro-default-smithi/8317327

Actions #28

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-06-18_11:22:46-rados-wip-yuri-testing-2025-06-12-0731-distro-default-smithi/8334512

Actions #29

Updated by Sridhar Seshasayee 9 months ago

skanta-2025-06-17_15:42:09-rados-wip-bharath13-testing-2025-06-17-1621-distro-default-smithi
2 Jobs: [8332772, 8332911]

Actions #30

Updated by Kamoltat (Junior) Sirivadhna 9 months ago

/a/skanta-2025-06-07_04:15:58-rados-wip-bharath8-testing-2025-06-02-1508-distro-default-smithi/8312618
/a/skanta-2025-06-07_04:15:58-rados-wip-bharath8-testing-2025-06-02-1508-distro-default-smithi/8312625

Actions #31

Updated by Sridhar Seshasayee 9 months ago

skanta-2025-06-16_03:59:33-rados-wip-bharath10-testing-2025-06-15-0841-distro-default-smithi
2 jobs: [8330485, 8330497]

Actions #32

Updated by Afreen Misbah 9 months ago

Apologies, I did not check back on this. Let me get back to you on it by Monday.

Actions #33

Updated by Shraddha Agrawal 9 months ago

/a/yuriw-2025-06-19_19:29:56-rados-wip-yuri11-testing-2025-06-19-0935-distro-default-smithi/8337397

Actions #34

Updated by Laura Flores 9 months ago

/a/skanta-2025-06-03_11:03:28-rados-wip-bharath1-testing-2025-06-02-2052-distro-default-smithi/8307608

Actions #35

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-06-25_05:53:02-rados-wip-bharath4-testing-2025-06-24-0841-distro-default-smithi/8347692

Actions #36

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-06-26_23:33:10-rados-wip-bharath6-testing-2025-06-26-1840-tentacle-distro-default-smithi/8352367

Actions #37

Updated by Sridhar Seshasayee 9 months ago

/a/yuriw-2025-07-01_21:01:48-rados-wip-yuri11-testing-2025-07-01-1146-tentacle-distro-default-smithi/8365747

Actions #38

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-07-04_23:32:34-rados-wip-bharath13-testing-2025-07-04-0559-distro-default-smithi/8370623

Actions #39

Updated by Shraddha Agrawal 9 months ago

/a/skanta-2025-07-03_10:29:59-rados-wip-bharath5-testing-2025-06-30-2106-distro-default-smithi/8368531

Actions #40

Updated by Laura Flores 8 months ago

/a/yuriw-2025-07-07_19:32:23-rados-wip-yuri4-testing-2025-07-07-0811-tentacle-distro-default-smithi/8374225

Actions #41

Updated by Shraddha Agrawal 8 months ago

/a/yuriw-2025-07-10_01:00:46-rados-wip-yuri-testing-2025-07-09-1458-tentacle-distro-default-smithi/8379455

Actions #42

Updated by Jaya Prakash 8 months ago

Hi @Afreen Misbah, any updates on this?

Actions #43

Updated by Naveen Naidu 8 months ago

/a/skanta-2025-07-11_12:12:50-rados-wip-bharath2-testing-2025-07-11-0437-distro-default-smithi/

2 jobs: ['8381362', '8381575']

Actions #44

Updated by Laura Flores 8 months ago

/a/yuriw-2025-07-10_23:00:33-rados-wip-yuri5-testing-2025-07-10-0913-distro-default-smithi/8380911

Actions #45

Updated by Afreen Misbah 8 months ago

Hi,

So, this issue happens when the iostat module is enabled -> the mgr respawns -> the dashboard restarts (and attempts to bind to 8443) -> port 8443 is unavailable -> the API can't be reached.
Now the question:

Is the dashboard setting SO_REUSEADDR / SO_REUSEPORT (https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ) on its sockets?

The dashboard uses cherrypy, which internally uses the cheroot server. There, SO_REUSEADDR is on by default and there is no option to toggle it: https://github.com/cherrypy/cheroot/blob/587d1fa5e19eb1337d2b9c22995783761628c4b1/cheroot/server.py#L2158-L2168

For SO_REUSEPORT there is an option (off by default), but it is present only in the latest cherrypy versions. So it is currently off in the dashboard.


This is a race condition between the process restart and the OS releasing the port. It is worsened by rapid restarts and slow cleanup on shared/flaky test hosts like those used by Teuthology.
It's not an inherent server bug or misbehavior in the dashboard service. This failure is a test/lifecycle/environment flake.
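
To make the race concrete, here is a rough sketch of a "is the port free yet?" probe using only the standard library. It is an illustration of the idea, not the check cherrypy actually performs; the function name port_is_free and the 1 second connect timeout are assumptions.

```python
import socket


def port_is_free(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if nothing accepts TCP connections on (host, port).

    Hypothetical helper: a successful connect means some process is
    still listening, so a restarting server should not expect to bind.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # something answered, so the port is occupied
    except OSError:
        # Connection refused (or a connect timeout, which is ambiguous):
        # this sketch treats both as "free".
        return True


if __name__ == "__main__":
    # Demonstrate the window in which a restart would fail: the port is
    # busy while the old listener exists and free once it is closed.
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))
    old.listen(1)
    port = old.getsockname()[1]
    print("while old listener is up:", port_is_free("127.0.0.1", port))
    old.close()
    print("after old listener closed:", port_is_free("127.0.0.1", port))
```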

We need to find a suitable time window here, probably by increasing the time the test waits for the mgr to re-initialize and for the port to be assigned to the dashboard.
Currently the test sleeps for 3 sec and then waits with a 30 sec timeout. Perhaps we can increase the sleep time to 15 sec and, if required, increase the timeout as well.
We can do this in parts: test whether the former works, and if not, try the latter.
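
The proposal above amounts to "sleep first, then poll until a deadline". A minimal sketch of that pattern follows. It is not the teuthology helper itself; wait_for, WaitTimeoutError and the 1 sec polling period are hypothetical names and values, and only the 3/15/30 sec figures come from this comment.

```python
import time
from typing import Callable


class WaitTimeoutError(Exception):
    """Raised when the condition never became true before the deadline."""


def wait_for(condition: Callable[[], bool], timeout: float,
             initial_sleep: float = 0.0, period: float = 1.0) -> int:
    """Sleep once, then poll condition() until it is true or timeout expires.

    Returns the number of retries that were needed. Hypothetical helper
    for illustration only.
    """
    time.sleep(initial_sleep)  # give the mgr time to respawn first
    deadline = time.monotonic() + timeout
    retries = 0
    while not condition():
        if time.monotonic() >= deadline:
            raise WaitTimeoutError(
                "Timed out after {0}s and {1} retries".format(timeout, retries))
        time.sleep(period)
        retries += 1
    return retries


if __name__ == "__main__":
    started = time.monotonic()
    # Stand-in condition: "the API is reachable" after about 2 seconds.
    api_up = lambda: time.monotonic() - started > 2.0
    # Current behaviour per the comment: initial_sleep=3, timeout=30.
    # Proposed first step: initial_sleep=15, same timeout.
    print("retries needed:", wait_for(api_up, timeout=30, initial_sleep=0.5))
```

Raising initial_sleep only delays the first probe. If the port takes longer than initial_sleep + timeout to be released, only a larger timeout helps, which is why testing the two changes separately makes sense.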

Actions #46

Updated by Afreen Misbah 8 months ago

  • Priority changed from High to Low
Actions #47

Updated by Shraddha Agrawal 8 months ago

/a/skanta-2025-07-13_23:08:24-rados-wip-bharath4-testing-2025-07-13-0539-distro-default-smithi/8384536

Actions #48

Updated by Connor Fawcett 8 months ago

  • Related to Bug #72272: test_list_enabled_module: Socket error - Address already in use added
Actions #49

Updated by Connor Fawcett 8 months ago

@Afreen Misbah I've raised a new tracker for a similar failure with a slightly different exception ("Address already in use" vs. "Port X not free..."), would you be able to have a look and see if it should be duped to this please? https://tracker.ceph.com/issues/72272

Actions #50

Updated by Connor Fawcett 8 months ago

/a/skanta-2025-07-19_23:59:58-rados-wip-bharath5-testing-2025-07-18-0518-distro-default-smithi/8397502

Actions #52

Updated by Laura Flores 8 months ago · Edited

Afreen Misbah wrote in #note-51:

yes @Connor Fawcett , its dup https://tracker.ceph.com/issues/72272#note-3

@Afreen Misbah see my comment on the above tracker. I wanted to clarify something before we mark it as a dupe.

https://tracker.ceph.com/issues/72272#note-4

Actions #53

Updated by Laura Flores 8 months ago

/a/skanta-2025-07-26_06:22:18-rados-wip-bharath9-testing-2025-07-26-0628-distro-default-smithi/8407583

Actions #54

Updated by Shraddha Agrawal 8 months ago

/a/skanta-2025-07-26_22:27:26-rados-wip-bharath7-testing-2025-07-26-0611-tentacle-distro-default-smithi/8409768

Actions #55

Updated by Laura Flores 8 months ago

/a/yuriw-2025-07-28_23:36:09-rados-tentacle-release-distro-default-smithi/8413697

Actions #56

Updated by Kamoltat (Junior) Sirivadhna 7 months ago

/a/yuriw-2025-07-28_18:11:33-rados-wip-yuri2-testing-2025-07-24-0816-tentacle-distro-default-smithi/

['8412132', '8412123']

Actions #57

Updated by Laura Flores 7 months ago

/a/skanta-2025-08-05_03:51:26-rados-wip-bharath9-testing-2025-08-05-0506-distro-default-smithi/8424337

Actions #58

Updated by Kamoltat (Junior) Sirivadhna 7 months ago

/a/teuthology-2025-08-10_20:00:42-rados-main-distro-default-smithi/
['8434986', '8434839']

Actions #59

Updated by Connor Fawcett 7 months ago

/a/skanta-2025-08-14_03:18:47-rados-wip-bharath4-testing-2025-08-13-0949-tentacle-distro-default-smithi/8442207
/a/skanta-2025-08-14_03:18:47-rados-wip-bharath4-testing-2025-08-13-0949-tentacle-distro-default-smithi/8442199

Actions #60

Updated by Sridhar Seshasayee 7 months ago

/a/skanta-2025-08-24_23:24:05-rados-wip-bharath9-testing-2025-08-24-1258-tentacle-distro-default-smithi/
[8461789, 8461802]

Actions #61

Updated by Aishwarya Mathuria 7 months ago

/a/skanta-2025-08-21_23:24:45-rados-wip-bharath7-testing-2025-08-19-0959-distro-default-smithi/8457151
/a/skanta-2025-08-21_23:24:45-rados-wip-bharath7-testing-2025-08-19-0959-distro-default-smithi/8457140

Actions #62

Updated by Connor Fawcett 7 months ago

/a/skanta-2025-08-31_23:44:30-rados-wip-bharath4-testing-2025-08-31-1138-distro-default-smithi/8474718
/a/skanta-2025-08-31_23:44:30-rados-wip-bharath4-testing-2025-08-31-1138-distro-default-smithi/8474708

Actions #63

Updated by Connor Fawcett 6 months ago

/a/yuriw-2025-09-06_15:55:33-rados-wip-yuri3-testing-2025-09-04-1437-tentacle-distro-default-smithi/8484446
/a/yuriw-2025-09-06_15:55:33-rados-wip-yuri3-testing-2025-09-04-1437-tentacle-distro-default-smithi/8484387

Actions #64

Updated by Laura Flores 6 months ago

/a/yuriw-2025-09-12_19:42:42-rados-wip-yuri3-testing-2025-09-12-0906-distro-default-smithi/8496668

Actions #65

Updated by Laura Flores 6 months ago

/a/yuriw-2025-09-15_20:16:05-rados-wip-yuri-testing-2025-09-15-1029-tentacle-distro-default-smithi/8501710

Actions #66

Updated by Laura Flores 6 months ago

/a/yuriw-2025-09-18_00:46:21-rados-wip-yuri3-testing-2025-09-17-1535-tentacle-distro-default-smithi/8507629

Actions #67

Updated by Laura Flores 6 months ago

/a/skanta-2025-09-18_23:59:11-rados-wip-bharath4-testing-2025-09-18-1250-distro-default-smithi/8510979

Actions #68

Updated by Laura Flores 6 months ago

/a/yuriw-2025-09-18_21:29:32-rados-tentacle-release-distro-default-smithi/8510384

Actions #69

Updated by Casey Bodley 6 months ago

  • Priority changed from Low to High

@Afreen Misbah @Ernesto Puerta api tests have been broken in teuthology for 6 months, why is this low priority? please maintain your tests. this has been blocking https://github.com/ceph/ceph/pull/63306 for rgw, but surely the rados/dashboard suite is important too

Actions #70

Updated by Jaya Prakash 6 months ago

/a/akupczyk-2025-09-25_13:54:10-rados-aclamk-wip-ifed-fix-70390-distro-default-smithi/8520359

Actions #71

Updated by Afreen Misbah 5 months ago · Edited

It was low priority because we cannot reproduce it elsewhere and it is not always visible.
More here https://tracker.ceph.com/issues/70669?#note-45

I did not know that rgw PR was waiting on that.

Actions #72

Updated by Afreen Misbah 5 months ago

  • Pull request ID set to 65868

Hey @Casey Bodley, I raised a PR. I will try to get it done by Friday, but it might be delayed until Monday.
Hope that works for you, or is it needed earlier?

Thanks

Actions #73

Updated by Nitzan Mordechai 5 months ago

/a/skanta-2025-10-09_23:11:22-rados-wip-bharath3-testing-2025-10-09-0519-distro-default-smithi ['8543837', '8543850']

Actions #74

Updated by Kamoltat (Junior) Sirivadhna 5 months ago

/a/skanta-2025-09-08_23:33:07-rados-wip-bharath2-testing-2025-09-07-1916-distro-default-smithi

[8488709, 8488724]

Actions #75

Updated by Afreen Misbah 5 months ago

Update:

- The 15 sec timeout did not work.
- Now testing with the timeout increased to 30 sec.

The PR is updated and will be picked up in the next test run. Thanks

Actions #76

Updated by Kamoltat (Junior) Sirivadhna 5 months ago

suite watcher: tracker is in progress, PR is currently being tested.

Actions #77

Updated by Laura Flores 5 months ago

/a/skanta-2025-10-09_23:38:36-rados-wip-bharath7-testing-2025-10-09-2128-distro-default-smithi/8544046

Actions #78

Updated by Sridhar Seshasayee 5 months ago · Edited

/a/skanta-2025-10-24_12:45:03-rados-wip-bharath9-testing-2025-10-14-1426-tentacle-distro-default-smithi/
[8567560, 8567567]

Actions #79

Updated by Neha Ojha 4 months ago

  • Status changed from In Progress to Fix Under Review
Actions #80

Updated by Laura Flores 4 months ago

/a/lflores-2025-11-20_15:55:17-rados-wip-lflores-testing-3-2025-11-19-1856-tentacle-distro-default-smithi/8616004

Actions #81

Updated by Laura Flores 4 months ago

/a/lflores-2025-12-02_17:29:40-rados-wip-lflores-testing-4-2025-12-01-1527-distro-default-smithi/8636039

Actions #82

Updated by Laura Flores 3 months ago

/a/skanta-2025-12-09_12:02:32-rados-wip-bharath4-testing-2025-12-09-1249-tentacle-distro-default-smithi/8647045

Actions #83

Updated by Sridhar Seshasayee 3 months ago

skanta-2025-12-03_02:50:04-rados-wip-bharath5-testing-2025-12-02-1511-distro-default-smithi/
[8638382,8638399]

Actions #84

Updated by Laura Flores about 2 months ago

/a/lflores-2026-01-21_20:56:39-rados-main-distro-default-trial/11970

Actions #85

Updated by Laura Flores about 2 months ago

/a/lflores-2026-01-23_19:07:45-rados-wip-rocky10-branch-of-the-day-2026-01-23-1769128778-distro-default-trial/15295

Actions #86

Updated by Aishwarya Mathuria about 2 months ago

/a/yuriw-2026-01-27_16:21:32-rados-wip-yuri10-testing-2026-01-22-2036-tentacle-distro-default-trial/21938/

Actions #87

Updated by Laura Flores about 2 months ago

  • Related to deleted (Bug #72272: test_list_enabled_module: Socket error - Address already in use )
Actions #88

Updated by Laura Flores about 2 months ago

  • Has duplicate Bug #72272: test_list_enabled_module: Socket error - Address already in use added
Actions #89

Updated by Laura Flores about 2 months ago

/a/lflores-2026-01-26_23:21:06-rados-wip-yuri12-testing-2026-01-22-2045-distro-default-trial/19102

Actions #90

Updated by Sridhar Seshasayee about 2 months ago

/a/skanta-2026-01-27_05:35:03-rados-wip-bharath1-testing-2026-01-26-1242-distro-default-trial
2 jobs: ['19771', '19743']

Actions #91

Updated by Connor Fawcett about 1 month ago · Edited

/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19846
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19845
/a/skanta-2026-01-27_07:02:07-rados-wip-bharath3-testing-2026-01-26-1323-distro-default-trial/19886

Actions #92

Updated by Laura Flores about 1 month ago

/a/yuriw-2026-02-03_16:00:06-rados-wip-yuri4-testing-2026-02-02-2122-distro-default-trial/31794

Actions #93

Updated by Jaya Prakash about 1 month ago

/a/jayaprakash-2026-02-06_12:54:34-rados-jaya-bs-testing-05-02-2026-distro-default-trial/38186

Actions #94

Updated by Sridhar Seshasayee 26 days ago

/a/skanta-2026-02-22_05:18:48-rados-wip-bharath21-testing-2026-02-20-1039-distro-default-trial/
2 jobs: ['63341', '63352']

Actions #95

Updated by Lee Sanders 26 days ago

/a/skanta-2026-02-07_14:54:11-rados-wip-bharath5-testing-2026-02-06-2052-distro-default-trial/39485

Actions #96

Updated by Jaya Prakash 25 days ago

2 jobs: ['62161', '62022']
/a/yuriw-2026-02-21_00:21:18-rados-wip-yuri3-testing-2026-02-19-1610-tentacle-distro-default-trial

Actions #97

Updated by Laura Flores 15 days ago

  • Related to Bug #75358: Health check failed: Module 'dashboard' has failed: Port 8443 not free added
Actions #98

Updated by Naveen Naidu 15 days ago

/a/yuriw-2026-03-02_18:34:01-rados-wip-yuri3-testing-2026-03-02-1622-distro-default-trial
2 jobs: ['76702', '76564']

Actions #99

Updated by Afreen Misbah 12 days ago

  • Pull request ID changed from 65868 to 67728
Actions #100

Updated by Afreen Misbah 12 days ago

Another attempt to fix this https://github.com/ceph/ceph/pull/67728

Actions #101

Updated by Nitzan Mordechai 11 days ago

/a/yuriw-2026-03-09_20:52:18-rados-wip-rocky10-branch-of-the-day-2026-03-09-1773078259-distro-default-trial/
3 jobs: ['95681', '95352', '95545']

Actions #102

Updated by Afreen Misbah 11 days ago

Here is the PR: https://github.com/ceph/ceph/pull/67728. If someone can schedule a QA run, that would be helpful.

Thanks

Actions #103

Updated by Aishwarya Mathuria 10 days ago

/a/skanta-2026-03-07_15:39:05-rados-wip-bharath4-testing-2026-03-05-1456-tentacle-distro-default-trial/93053
/a/skanta-2026-03-07_15:39:05-rados-wip-bharath4-testing-2026-03-05-1456-tentacle-distro-default-trial/93026

Actions #104

Updated by Sridhar Seshasayee 8 days ago

The associated PR is reviewed and marked with 'needs-qa'.

Noting failures here for tracking:
/a/skanta-2026-03-04_23:53:38-rados-wip-bharath1-testing-2026-03-04-1011-distro-default-trial
2 jobs: ['85616', '85637']

Actions #105

Updated by Kamoltat (Junior) Sirivadhna 8 days ago

/a/skanta-2026-02-18_08:32:17-rados-wip-bharath18-testing-2026-02-13-0856-distro-default-trial/
[55827, 55846]

Actions #106

Updated by Jaya Prakash 5 days ago

/a/jayaprakash-2026-03-06_10:20:34-rados-wip-jaya-bs-testing-06-03-2025-distro-default-trial/90378
