Bug #62972
ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
Status: open
% Done: 0
Description
'ceph API tests' failing on pull requests, ex. https://jenkins.ceph.com/job/ceph-api/62085/
2023-09-24 20:14:07,886.886 INFO:__main__:
2023-09-24 20:14:07,886.886 INFO:__main__:----------------------------------------------------------------------
2023-09-24 20:14:07,886.886 INFO:__main__:Traceback (most recent call last):
2023-09-24 20:14:07,886.886 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
2023-09-24 20:14:07,886.886 INFO:__main__: conn = connection.create_connection(
2023-09-24 20:14:07,886.886 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
2023-09-24 20:14:07,887.887 INFO:__main__: raise err
2023-09-24 20:14:07,887.887 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
2023-09-24 20:14:07,887.887 INFO:__main__: sock.connect(sa)
2023-09-24 20:14:07,887.887 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 590, in connect
2023-09-24 20:14:07,887.887 INFO:__main__: self._internal_connect(address)
2023-09-24 20:14:07,887.887 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 634, in _internal_connect
2023-09-24 20:14:07,887.887 INFO:__main__: raise _SocketError(err, strerror(err))
2023-09-24 20:14:07,887.887 INFO:__main__:ConnectionRefusedError: [Errno 111] Connection refused
2023-09-24 20:14:07,887.887 INFO:__main__:
2023-09-24 20:14:07,887.887 INFO:__main__:During handling of the above exception, another exception occurred:
2023-09-24 20:14:07,887.887 INFO:__main__:
2023-09-24 20:14:07,887.887 INFO:__main__:Traceback (most recent call last):
2023-09-24 20:14:07,887.887 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
2023-09-24 20:14:07,887.887 INFO:__main__: httplib_response = self._make_request(
2023-09-24 20:14:07,888.888 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 403, in _make_request
2023-09-24 20:14:07,888.888 INFO:__main__: self._validate_conn(conn)
2023-09-24 20:14:07,888.888 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1053, in _validate_conn
2023-09-24 20:14:07,888.888 INFO:__main__: conn.connect()
2023-09-24 20:14:07,888.888 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
2023-09-24 20:14:07,888.888 INFO:__main__: self.sock = conn = self._new_conn()
2023-09-24 20:14:07,888.888 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
2023-09-24 20:14:07,888.888 INFO:__main__: raise NewConnectionError(
2023-09-24 20:14:07,888.888 INFO:__main__:urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 111] Connection refused
2023-09-24 20:14:07,888.888 INFO:__main__:
2023-09-24 20:14:07,888.888 INFO:__main__:During handling of the above exception, another exception occurred:
2023-09-24 20:14:07,888.888 INFO:__main__:
2023-09-24 20:14:07,888.888 INFO:__main__:Traceback (most recent call last):
2023-09-24 20:14:07,888.888 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2023-09-24 20:14:07,889.889 INFO:__main__: resp = conn.urlopen(
2023-09-24 20:14:07,889.889 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
2023-09-24 20:14:07,889.889 INFO:__main__: retries = retries.increment(
2023-09-24 20:14:07,889.889 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
2023-09-24 20:14:07,889.889 INFO:__main__: raise MaxRetryError(_pool, url, error or ResponseError(cause))
2023-09-24 20:14:07,889.889 INFO:__main__:urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.21.5.33', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2023-09-24 20:14:07,889.889 INFO:__main__:
2023-09-24 20:14:07,889.889 INFO:__main__:During handling of the above exception, another exception occurred:
2023-09-24 20:14:07,889.889 INFO:__main__:
2023-09-24 20:14:07,889.889 INFO:__main__:Traceback (most recent call last):
2023-09-24 20:14:07,889.889 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/test_mgr_module.py", line 58, in test_list_enabled_module
2023-09-24 20:14:07,889.889 INFO:__main__: data = self._get('/api/mgr/module')
2023-09-24 20:14:07,889.889 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 341, in _get
2023-09-24 20:14:07,889.889 INFO:__main__: return cls._request(url, 'GET', params=params, version=version,
2023-09-24 20:14:07,890.890 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 313, in _request
2023-09-24 20:14:07,890.890 INFO:__main__: cls._resp = cls._session.get(url, params=params, verify=False,
2023-09-24 20:14:07,890.890 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
2023-09-24 20:14:07,890.890 INFO:__main__: return self.request("GET", url, **kwargs)
2023-09-24 20:14:07,890.890 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2023-09-24 20:14:07,890.890 INFO:__main__: resp = self.send(prep, **send_kwargs)
2023-09-24 20:14:07,890.890 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2023-09-24 20:14:07,890.890 INFO:__main__: r = adapter.send(request, **kwargs)
2023-09-24 20:14:07,890.890 INFO:__main__: File "/tmp/tmp.W1S8prJF6l/venv/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
2023-09-24 20:14:07,890.890 INFO:__main__: raise ConnectionError(e, request=request)
2023-09-24 20:14:07,890.890 INFO:__main__:requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.21.5.33', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2023-09-24 20:14:07,890.890 INFO:__main__:
Cannot find device "ceph-brx"
2023-09-24 20:14:07,906.906 INFO:__main__:
2023-09-24 20:14:07,906.906 INFO:__main__:----------------------------------------------------------------------
2023-09-24 20:14:07,906.906 INFO:__main__:Ran 92 tests in 1113.274s
2023-09-24 20:14:07,907.907 INFO:__main__:
2023-09-24 20:14:07,907.907 INFO:__main__:
Updated by Casey Bodley over 2 years ago
still failing, ex. https://jenkins.ceph.com/job/ceph-api/64336/
Updated by Casey Bodley over 2 years ago
still failing in https://jenkins.ceph.com/job/ceph-api/65558/
Updated by Casey Bodley over 2 years ago
still failing, ex. https://jenkins.ceph.com/job/ceph-api/67358/
Updated by Pedro González Gómez over 2 years ago
- Assignee changed from Laura Paduano to Pedro González Gómez
Updated by Venky Shankar about 2 years ago
Updated by Prashant D about 2 years ago
Updated by Casey Bodley about 2 years ago
Updated by Casey Bodley about 2 years ago
Updated by Casey Bodley about 2 years ago
Updated by Casey Bodley almost 2 years ago
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago · Edited
@Pedro González Gómez
Hi Pedro,
The core team has recently introduced a rotating role called "Watchers."
Each week, a different team member takes on the responsibility of overseeing the RADOS main suite.
Our primary task is to monitor the main suite, track issues,
and assign them to the appropriate individuals to ensure they are resolved promptly.
The ultimate goal is to maintain the suite in optimal condition, keeping it as green as possible.
I've been assigned as the Watcher for this week.
Can you please give us an update on the status of the fix for this tracker, as it has been making our RADOS suite red? Thank you!
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
- Tags set to main-failures
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
- Status changed from New to In Progress
Updated by Nizamudeen A almost 2 years ago
The last time I looked into the logs of this particular failure (and many others), I couldn't find an error traceback explaining why the IPs suddenly became unavailable. The active mgr's logs stopped abruptly, as if it were dead, and the same with the standby, so there was not a lot to investigate. I'm not sure how the test suite works, but is there a possibility of another test run taking over and killing the current one? If not, we'll need to check this some other way...
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-07-05_14:04:08-rados-wip-yuri3-testing-2024-07-01-1610-distro-default-smithi/7788580
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-07-05_14:06:45-rados-wip-yuri12-testing-2024-06-26-0904-distro-default-smithi/7788693
Updated by Casey Bodley almost 2 years ago
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-07-17_13:32:02-rados-wip-yuri12-testing-2024-07-16-1122-distro-default-smithi/7805889
Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2024-07-16_01:05:51-rados-wip-yuri6-testing-2024-07-15-1335-distro-default-smithi/7803122
Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2024-07-17_13:35:08-rados-wip-yuri10-testing-2024-07-15-1330-distro-default-smithi/7805752
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-07-23_19:38:12-rados-wip-yuri5-testing-2024-07-23-0804-distro-default-smithi/7814447
Updated by Nitzan Mordechai almost 2 years ago
/a/yuriw-2024-07-31_14:27:44-rados-wip-yuri7-testing-2024-07-30-0859-distro-default-smithi/7828638
/a/yuriw-2024-07-31_14:27:44-rados-wip-yuri7-testing-2024-07-30-0859-distro-default-smithi/7828626
Updated by Nitzan Mordechai almost 2 years ago
/a/yuriw-2024-08-01_19:44:25-rados-wip-yuri6-testing-2024-07-30-0851-distro-default-smithi/7830360
/a/yuriw-2024-08-01_19:44:25-rados-wip-yuri6-testing-2024-07-30-0851-distro-default-smithi/7830370
Updated by Nitzan Mordechai over 1 year ago
@Pedro González Gómez that issue still occurs on every run: https://pulpito.ceph.com/nmordech-2024-08-04_12:25:22-rados:dashboard-main-distro-default-smithi/
It looks like the issue is with the dashboard module:
2024-08-04T13:34:28.185 INFO:tasks.ceph.mgr.y.smithi167.stderr:2024-08-04T13:34:28.185+0000 7f2f19bda640 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard' while running on mgr.y: Timeout('Port 8443 not free on ::.')
2024-08-04T13:34:28.185 INFO:tasks.ceph.mgr.y.smithi167.stderr:2024-08-04T13:34:28.185+0000 7f2f19bda640 -1 dashboard.serve:
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr:2024-08-04T13:34:28.185+0000 7f2f19bda640 -1 Traceback (most recent call last):
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: File "/usr/share/ceph/mgr/dashboard/module.py", line 602, in serve
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: cherrypy.engine.start()
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 283, in start
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: raise e_info
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 268, in start
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: self.publish('start')
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: File "/usr/lib/python3.9/site-packages/cherrypy/process/wspbus.py", line 248, in publish
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr: raise exc
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr:cherrypy.process.wspbus.ChannelFailures: Timeout('Port 8443 not free on ::.')
2024-08-04T13:34:28.186 INFO:tasks.ceph.mgr.y.smithi167.stderr:
2024-08-04T13:34:28.399 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
2024-08-04T13:34:28.401 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2024-08-04T13:34:33.402 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
2024-08-04T13:34:33.404 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2024-08-04T13:34:38.404 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
2024-08-04T13:34:38.409 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2024-08-04T13:34:43.410 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
2024-08-04T13:34:43.412 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2024-08-04T13:34:48.413 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
2024-08-04T13:34:48.415 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2024-08-04T13:34:53.416 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.15.106:7789/api/mgr/module
It looks like there are 2 mgrs trying to get that port. In that case, for example: /a/nmordech-2024-08-04_12:25:22-rados:dashboard-main-distro-default-smithi/7835336
the test starts 2 mgrs on smithi168; mgr.z gets port 8443, and mgr.y shows the error.
But the failure that caused the test to fail is related to the restful API (I don't know if they are related):
2024-08-04T13:34:55.030 INFO:tasks.cephfs_test_runner:test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) ... ERROR
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:======================================================================
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_NitzanMordhai_ceph_35ddc73a82e6f555dbcbe7f17bf1cf9698efe0b2/qa/tasks/mgr/dashboard/test_mgr_module.py", line 57, in test_list_enabled_module
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: self.wait_until_rest_api_accessible()
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_NitzanMordhai_ceph_35ddc73a82e6f555dbcbe7f17bf1cf9698efe0b2/qa/tasks/mgr/dashboard/test_mgr_module.py", line 32, in wait_until_rest_api_accessible
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: self.wait_until_true(_check_connection, timeout=30)
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_NitzanMordhai_ceph_35ddc73a82e6f555dbcbe7f17bf1cf9698efe0b2/qa/tasks/ceph_test_case.py", line 339, in wait_until_true
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
2024-08-04T13:34:55.031 INFO:tasks.cephfs_test_runner:
2024-08-04T13:34:55.032 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-08-04T13:34:55.032 INFO:tasks.cephfs_test_runner:Ran 90 tests in 1385.919s
2024-08-04T13:34:55.032 INFO:tasks.cephfs_test_runner:
2024-08-04T13:34:55.032 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
From mgr.x the restful module didn't even start: [restful WARNING root] server not running: no certificate configured
Updated by Casey Bodley over 1 year ago
Updated by Nitzan Mordechai over 1 year ago
/a/yuriw-2024-08-14_14:17:10-rados-wip-yuri-testing-2024-08-13-0839-distro-default-smithi/7855302/
/a/yuriw-2024-08-14_14:17:10-rados-wip-yuri-testing-2024-08-13-0839-distro-default-smithi/7855070/
Updated by Casey Bodley over 1 year ago
Updated by Aishwarya Mathuria over 1 year ago
/a/teuthology-2024-08-25_20:00:17-rados-main-distro-default-smithi/7871753
/a/teuthology-2024-08-25_20:00:17-rados-main-distro-default-smithi/7871587
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-28_23:20:36-rados-wip-yuri4-testing-2024-08-28-1359-distro-default-smithi/7879407
Updated by Nizamudeen A over 1 year ago · Edited
Opened a PR to debug and fix it: https://github.com/ceph/ceph/pull/59530
Updated by Nitzan Mordechai over 1 year ago
Nizamudeen A wrote in #note-31:
Opened a PR to debug and fix it: https://github.com/ceph/ceph/pull/59530
@Nizamudeen A I also checked those logs a few days ago. It looks like the exception is actually caused by several managers using the same port: the MgrModuleTest class doesn't have a setUp and doesn't call _assign_port, which is supposed to assign a unique port to each manager.
With that in place, the test completes OK, but then there are a few more failures.
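To illustrate what a per-manager port assignment buys here (a hypothetical sketch; assign_unique_ports is an invented name, not the real _assign_ports helper), each mgr gets its own ssl_server_port so two daemons on the same host never race for the same default:

```python
# Hypothetical sketch of unique per-mgr port assignment; the real logic
# lives in the _assign_ports classmethod mentioned above. This only
# shows the idea: one distinct port per manager id.
def assign_unique_ports(mgr_ids, base_port=8443):
    """Map each mgr id to its own port (base_port, base_port+1, ...),
    so that no two managers on the same host bind the same port."""
    return {
        mgr_id: base_port + offset
        for offset, mgr_id in enumerate(sorted(mgr_ids))
    }
```

Without something like this, every mgr falls back to the module default and all but the first fail with Timeout('Port 8443 not free'), as seen in the logs above.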
Updated by Nizamudeen A over 1 year ago
@Nitzan Mordechai with the fix that we provided, the issue was not reproducible at all in any of the runs that we did. As for your question, I think @Ernesto Puerta would be the best person to answer it correctly, but to me the issue mostly looked like the tests getting stopped before the mgr module restart finishes. BTW, I merged my PR, so I'll mark this for backport. If the error is still reproducible, we can look further.
Updated by Nizamudeen A over 1 year ago
- Status changed from In Progress to Pending Backport
- Pull request ID set to 59530
Updated by Nizamudeen A over 1 year ago
- Backport changed from squid to squid, reef, quincy
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67900: reef: ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67901: squid: ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67902: quincy: ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Nitzan Mordechai over 1 year ago
Nizamudeen A wrote in #note-33:
@Nitzan Mordechai with the fix that we provided, the issue was not reproducible at all in any of the runs that we did. And to your question I think @Ernesto Puerta should be the best person to answer it correctly than me but for me the issue mostly seemed like the tests getting stopped before it finishes the mgr module restart. btw I merged my PR so I'll mark this for backport. If the error is still reproducible then we can look more.
@Nizamudeen A thank you for the quick fix! Let me run some teuthology tests and I'll let you know! @Laura Flores FYI
Updated by Aishwarya Mathuria over 1 year ago
/a/skanta-2024-08-31_23:59:00-rados-wip-bharath2-testing-2024-08-31-2129-distro-default-smithi/7884011/
/a/skanta-2024-08-31_23:59:00-rados-wip-bharath2-testing-2024-08-31-2129-distro-default-smithi/7884184
Updated by Ernesto Puerta over 1 year ago
Nitzan Mordechai wrote in #note-32:
@Nizamudeen A i also checked those logs few days ago, it looks like the exception is actually because of few managers that using the same port, the MgrModuleTest class doesn't have SetUp and doesn't call the _assign_port that supposed to assign unique port for each manager.
when you have that in place, that test complete ok, but then there are few more failures
Nitzan, the MgrModuleTestCase inherits from DashboardTestCase which in its setUpClass includes:
@classmethod
def setUpClass(cls):
    super(DashboardTestCase, cls).setUpClass()
    cls._assign_ports("dashboard", "ssl_server_port")
    cls._load_module("dashboard")
I've seen that in other test classes (mgr/test_prometheus, mgr/test_selftest) they put that inside a setUp method instead. Another difference is that MgrModuleTestCase defines MGRS_REQUIRED = 1, while DashboardTestCase sets that to 2.
That said, this is not constantly failing, so could it be that some other vstart/dashboard instance was running at the same time as this test?
We could try to make _assign_ports smarter and check in advance if a port is available or not, instead of waiting for the module start/reload, and even dynamically reallocate ports if they're not free.
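A minimal sketch of that "check in advance" idea, assuming a plain socket-bind probe (find_free_port is a hypothetical helper, not existing teuthology code):

```python
import socket

def find_free_port(start: int, host: str = "", max_tries: int = 100) -> int:
    """Return the first port >= start that can be bound on `host`.

    Binding (rather than connect-probing) is the reliable availability
    test; SO_REUSEADDR avoids false negatives from sockets lingering
    in TIME_WAIT after a previous test run.
    """
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind((host, port))
                return port  # bind succeeded, port is free
            except OSError:
                continue  # port in use, try the next one
    raise RuntimeError(f"no free port found in [{start}, {start + max_tries})")
```

The test could then hand the returned port to the dashboard module instead of waiting for a fixed default to time out.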
Updated by Aishwarya Mathuria over 1 year ago
/a/skanta-2024-09-10_15:30:27-rados-wip-bharath7-testing-2024-09-10-1409-distro-default-smithi/7899425
/a/skanta-2024-09-10_15:30:27-rados-wip-bharath7-testing-2024-09-10-1409-distro-default-smithi/7899256
Updated by Nitzan Mordechai over 1 year ago
Ernesto Puerta wrote in #note-43:
Nitzan Mordechai wrote in #note-32:
@Nizamudeen A i also checked those logs few days ago, it looks like the exception is actually because of few managers that using the same port, the MgrModuleTest class doesn't have SetUp and doesn't call the _assign_port that supposed to assign unique port for each manager.
when you have that in place, that test complete ok, but then there are few more failures
Nitzan, the MgrModuleTestCase inherits from DashboardTestCase, which in its setUpClass includes:
[...] I've seen that in other test classes (mgr/test_prometheus, mgr/test_selftest) they put that inside a setUp method instead. Another difference is that MgrModuleTestCase defines MGRS_REQUIRED = 1, while DashboardTestCase sets that to 2. That said, this is not constantly failing, so could it be that some other vstart/dashboard instance was running at the same time as this test?
We could try to make _assign_ports smarter and check in advance if a port is available or not, instead of waiting for the module start/reload, and even dynamically reallocate ports if they're not free.
Actually, it does constantly fail when you have more than 1 mgr running: it will always assign the same port (the default one) for all the mgrs (ssl_server_port). If _assign_ports had been called, it would have assigned the correct port to each mgr.
I agree that _assign_ports should be smarter and check in advance! If you need any more collaboration in that area, I would be glad to help!
Thanks for checking this!
Updated by Casey Bodley over 1 year ago
still failing on main https://jenkins.ceph.com/job/ceph-api/82767/
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-15_14:06:51-rados-wip-yuri8-testing-2024-10-14-1103-distro-default-smithi/7948101
Updated by Laura Flores over 1 year ago
/a/hyelloji-2024-09-10_17:16:14-rados-wip-hemanth-testing-2024-09-10-0723-distro-default-smithi/7900050
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-10-13_19:06:13-rados-wip-yuri4-testing-2024-10-13-0836-distro-default-smithi/7945031
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-23_23:17:32-rados-wip-yuri13-testing-2024-10-23-0743-distro-default-smithi/7963674
Updated by Nizamudeen A over 1 year ago · Edited
Merged one more PR per https://tracker.ceph.com/issues/62972#note-32.
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-16_14:57:58-rados-wip-yuri2-testing-2024-10-15-0703-distro-default-smithi/7950878
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-11-13_00:17:56-rados-wip-yuri6-testing-2024-11-12-1317-distro-default-smithi/7992352
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-11-20_16:10:40-rados-wip-yuri2-testing-2024-11-15-0902-distro-default-smithi/8001614
Updated by Casey Bodley over 1 year ago
- Status changed from Pending Backport to New
@Pedro González Gómez since this is still failing regularly, I've moved the status back to New.
https://jenkins.ceph.com/job/ceph-api/86139/
Traceback (most recent call last):
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/usr/lib/python3/dist-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
ret = func(*args, **kwargs)
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
return func(*vpath, **params)
File "/home/jenkins-build/build/workspace/ceph-api/src/pybind/mgr/dashboard/controllers/orchestrator.py", line 22, in _inner
raise DashboardException(code='orchestrator_status_unavailable', # pragma: no cover
dashboard.exceptions.DashboardException: Orchestrator is unavailable
2024-12-09 13:59:09,353.353 INFO:__main__:----------------------------------------------------------------------
2024-12-09 13:59:09,353.353 INFO:__main__:Traceback (most recent call last):
2024-12-09 13:59:09,353.353 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connection.py", line 199, in _new_conn
2024-12-09 13:59:09,353.353 INFO:__main__: sock = connection.create_connection(
2024-12-09 13:59:09,353.353 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
2024-12-09 13:59:09,353.353 INFO:__main__: raise err
2024-12-09 13:59:09,353.353 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
2024-12-09 13:59:09,353.353 INFO:__main__: sock.connect(sa)
2024-12-09 13:59:09,353.353 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 586, in connect
2024-12-09 13:59:09,353.353 INFO:__main__: self._internal_connect(address)
2024-12-09 13:59:09,354.354 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 630, in _internal_connect
2024-12-09 13:59:09,354.354 INFO:__main__: raise _SocketError(err, strerror(err))
2024-12-09 13:59:09,354.354 INFO:__main__:ConnectionRefusedError: [Errno 111] Connection refused
2024-12-09 13:59:09,354.354 INFO:__main__:
2024-12-09 13:59:09,354.354 INFO:__main__:The above exception was the direct cause of the following exception:
2024-12-09 13:59:09,354.354 INFO:__main__:
2024-12-09 13:59:09,354.354 INFO:__main__:Traceback (most recent call last):
2024-12-09 13:59:09,354.354 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
2024-12-09 13:59:09,354.354 INFO:__main__: response = self._make_request(
2024-12-09 13:59:09,354.354 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
2024-12-09 13:59:09,354.354 INFO:__main__: raise new_e
2024-12-09 13:59:09,355.355 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
2024-12-09 13:59:09,355.355 INFO:__main__: self._validate_conn(conn)
2024-12-09 13:59:09,355.355 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
2024-12-09 13:59:09,355.355 INFO:__main__: conn.connect()
2024-12-09 13:59:09,355.355 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connection.py", line 693, in connect
2024-12-09 13:59:09,355.355 INFO:__main__: self.sock = sock = self._new_conn()
2024-12-09 13:59:09,355.355 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connection.py", line 214, in _new_conn
2024-12-09 13:59:09,355.355 INFO:__main__: raise NewConnectionError(
2024-12-09 13:59:09,355.355 INFO:__main__:urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 111] Connection refused
2024-12-09 13:59:09,355.355 INFO:__main__:
2024-12-09 13:59:09,355.355 INFO:__main__:The above exception was the direct cause of the following exception:
2024-12-09 13:59:09,355.355 INFO:__main__:
2024-12-09 13:59:09,356.356 INFO:__main__:Traceback (most recent call last):
2024-12-09 13:59:09,356.356 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
2024-12-09 13:59:09,356.356 INFO:__main__: resp = conn.urlopen(
2024-12-09 13:59:09,356.356 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
2024-12-09 13:59:09,356.356 INFO:__main__: retries = retries.increment(
2024-12-09 13:59:09,356.356 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
2024-12-09 13:59:09,356.356 INFO:__main__: raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
2024-12-09 13:59:09,356.356 INFO:__main__:urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.21.5.38', port=7790): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2024-12-09 13:59:09,356.356 INFO:__main__:
2024-12-09 13:59:09,356.356 INFO:__main__:During handling of the above exception, another exception occurred:
2024-12-09 13:59:09,356.356 INFO:__main__:
2024-12-09 13:59:09,357.357 INFO:__main__:Traceback (most recent call last):
2024-12-09 13:59:09,357.357 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/test_mgr_module.py", line 62, in test_list_enabled_module
2024-12-09 13:59:09,357.357 INFO:__main__: data = self._get('/api/mgr/module')
2024-12-09 13:59:09,357.357 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 340, in _get
2024-12-09 13:59:09,357.357 INFO:__main__: return cls._request(url, 'GET', params=params, version=version,
2024-12-09 13:59:09,357.357 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 312, in _request
2024-12-09 13:59:09,357.357 INFO:__main__: cls._resp = cls._session.get(url, params=params, verify=False,
2024-12-09 13:59:09,357.357 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
2024-12-09 13:59:09,357.357 INFO:__main__: return self.request("GET", url, **kwargs)
2024-12-09 13:59:09,357.357 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2024-12-09 13:59:09,357.357 INFO:__main__: resp = self.send(prep, **send_kwargs)
2024-12-09 13:59:09,358.358 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2024-12-09 13:59:09,358.358 INFO:__main__: r = adapter.send(request, **kwargs)
2024-12-09 13:59:09,358.358 INFO:__main__: File "/tmp/tmp.28xa596zTM/venv/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
2024-12-09 13:59:09,358.358 INFO:__main__: raise ConnectionError(e, request=request)
2024-12-09 13:59:09,358.358 INFO:__main__:requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.21.5.38', port=7790): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2024-12-09 13:59:09,358.358 INFO:__main__:
2024-12-09 13:59:09,358.358 INFO:__main__:> ip netns list
2024-12-09 13:59:09,363.363 INFO:__main__:> sudo ip link delete ceph-brx
Cannot find device "ceph-brx"
2024-12-09 13:59:09,380.380 INFO:__main__:
2024-12-09 13:59:09,380.380 INFO:__main__:----------------------------------------------------------------------
2024-12-09 13:59:09,381.381 INFO:__main__:Ran 95 tests in 1462.079s
2024-12-09 13:59:09,381.381 INFO:__main__:
2024-12-09 13:59:09,381.381 INFO:__main__:
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-12-03_16:16:51-rados-wip-yuri6-testing-2024-12-02-1528-distro-default-smithi/8018963
Updated by Casey Bodley over 1 year ago
Updated by Casey Bodley over 1 year ago
Updated by Casey Bodley over 1 year ago
Updated by Shraddha Agrawal over 1 year ago
/a/skanta-2024-10-24_23:59:35-rados-wip-bharath3-testing-2024-10-23-1509-distro-default-smithi/7965779
Updated by Aishwarya Mathuria over 1 year ago
/a/skanta-2024-12-05_07:34:09-rados-wip-bharath2-testing-2024-12-04-1214-distro-default-smithi/8021705
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-12-18_15:56:21-rados-wip-yuri6-testing-2024-12-17-1653-distro-default-smithi/8043316
Updated by Naveen Naidu over 1 year ago
/a/skanta-2024-12-11_23:59:30-rados-wip-bharath9-testing-2024-12-10-1652-distro-default-smithi/
2 jobs: [8031148, 8031124]
Updated by Laura Flores over 1 year ago
- Has duplicate Bug #69569: tasks.cephfs_test_runner:ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) added
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-11-18_15:14:17-rados-wip-yuri3-testing-2024-11-14-0857-distro-default-smithi/7997970
Updated by Shraddha Agrawal over 1 year ago
/a/skanta-2024-12-26_01:49:37-rados-wip-bharath12-testing-2024-12-24-0842-distro-default-smithi/8053514
Updated by Naveen Naidu over 1 year ago
/a/yuriw-2025-01-18_14:21:47-rados-wip-yuri3-testing-2025-01-16-1509-distro-default-smithi/
2 jobs: ['8082252', '8082268']
Updated by Aishwarya Mathuria over 1 year ago
/a/skanta-2025-01-26_07:44:24-rados-wip-bharath9-testing-2025-01-25-0527-distro-default-smithi
'8094209', '8094192'
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-01-31_15:46:33-rados-wip-yuri5-testing-2025-01-30-1311-distro-default-smithi/8107084
Updated by Adam Kupczyk about 1 year ago
/a/akupczyk-2025-02-03_16:48:13-rados-aclamk-testing-nauvoo-2025-01-29-1806-b-distro-default-smithi/8112082
Updated by Shraddha Agrawal about 1 year ago
/a/skanta-2025-01-28_15:23:42-rados-wip-bharath11-testing-2025-01-27-1602-distro-default-smithi/8098982
Updated by Shraddha Agrawal about 1 year ago
/a/skanta-2025-02-05_10:08:28-rados-wip-bharath3-testing-2025-02-03-2127-distro-default-smithi/8115715
Updated by Bill Scales about 1 year ago
https://jenkins.ceph.com/job/ceph-api/89095/ on main
I think I know what is causing the test to fail. The symptom is that test_list_enabled_module in tasks.mgr.dashboard.test_mgr_module.MgrModuleTest fails with "connection refused" when trying to use the dashboard API.
2025-02-09 16:44:41,466.466 INFO:__main__:ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
At exactly the same time as this test ran and the connections were refused, I saw this log in all three mgrs (x, y, and z):
2025-02-09T16:44:00.314+0000 7fc37441d640 0 [dashboard INFO root] Engine started...
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr handle_mgr_map respawning because set of enabled modules changed!
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn e: './bin/ceph-mgr'
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn 0: './bin/ceph-mgr'
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn 1: '-i'
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn 2: 'x'
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn 3: '-f'
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn respawning with exe /home/jenkins-build/build/workspace/ceph-api/build/bin/ceph-mgr
2025-02-09T16:44:39.386+0000 7fc454adb640 1 mgr respawn exe_path /proc/self/exe
2025-02-09T16:44:39.606+0000 7f4c9213f240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-09T16:44:39.626+0000 7f4c9213f240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-09T16:44:39.626+0000 7f4c9213f240 0 ceph version Development (no_version) squid (dev), process ceph-mgr, pid 4028432
2025-02-09T16:44:39.626+0000 7f4c9213f240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-09T16:44:39.678+0000 7f4c9213f240 1 mgr[py] Loading python module 'alerts'
2025-02-09T16:44:39.790+0000 7f4c9213f240 1 mgr[py] Loading python module 'balancer'
2025-02-09T16:44:39.866+0000 7f4c9213f240 1 mgr[py] Loading python module 'cephadm'
2025-02-09T16:44:40.386+0000 7f4c9213f240 1 mgr[py] Loading python module 'cli_api'
2025-02-09T16:44:40.466+0000 7f4c9213f240 1 mgr[py] Loading python module 'crash'
2025-02-09T16:44:40.578+0000 7f4c9213f240 1 mgr[py] Loading python module 'dashboard'
2025-02-09T16:44:41.306+0000 7f4c9213f240 1 mgr[py] Loading python module 'devicehealth'
2025-02-09T16:44:41.382+0000 7f4c9213f240 1 mgr[py] Loading python module 'diskprediction_local'
2025-02-09T16:44:41.682+0000 7f4c9213f240 1 mgr[py] Loading python module 'feedback'
So the connection is being refused because the mgr is restarting. Looking slightly earlier in the test log, there is a test that disables a module:
2025-02-09 16:44:13,450.450 INFO:__main__:> ./bin/ceph log 'Starting test tasks.mgr.dashboard.test_mgr_module.MgrModuleTest.test_disable'
2025-02-09 16:44:15,020.020 INFO:__main__:> ./bin/ceph config dump
2025-02-09 16:44:15,708.708 INFO:__main__:> ./bin/ceph health --format=json
2025-02-09 16:44:16,563.563 DEBUG:tasks.ceph_test_case:wait_until_true: success in 0s and 0 retries
2025-02-09 16:44:16,563.563 DEBUG:tasks.mgr.dashboard.helper:Request POST to https://172.21.3.227:7790/api/mgr/module/iostat/disable
2025-02-09 16:44:17,220.220 INFO:__main__:> ./bin/ceph config reset 132
2025-02-09 16:44:17,953.953 INFO:__main__:> ./bin/ceph log 'Ended test tasks.mgr.dashboard.test_mgr_module.MgrModuleTest.test_disable'
So I think the mgr is restarting itself asynchronously as a result of this prior test, and if we are unlucky enough to issue a later test while the restart is in progress, we get this type of failure. The tests test_list_disabled_module and test_list_enabled_module call wait_until_rest_api_available, which tries to make sure the REST interface is available before running the test, but this doesn't guard against the timing window where that call passes and the mgr then asynchronously restarts just as the main part of the test case runs.
I had a quick look at the teuthology failure in the comment above and saw the same issue: the mgr respawned just as the API command was attempted. Maybe we need to make test_disable and test_module_enable wait for the mgr restart to complete?
Updated by Nizamudeen A about 1 year ago
@Bill Scales We did something for this initially, but that was just adding more retries. Maybe the module is taking longer to reload and the retries we have are not enough. Either more retries or, as you said, we have to explicitly check somehow that the mgr modules have restarted. Thanks for looking at it; I will raise a PR. Is there anything I can rely on to know that the mgr is restarting or has restarted?
Updated by Bill Scales about 1 year ago
I don't think there is an easy way to detect that the mgr has restarted. Something like a sleep 15 after disabling/enabling a module might make the test more reliable, but it won't completely eliminate the timing window.
However I think you could change the test to be more tolerant of the mgr restarting. Currently the test case does:
1. wait_until_rest_api_accessible(), which loops making a REST API call and catching connection-failure exceptions until the connection succeeds or a 30-second timeout is hit
2. make the REST API call for the test
This fails when the mgr inconveniently restarts at step 2.
Perhaps you should restructure it as:
1. make the REST API call for the test, catching connection-failure exceptions
2. on failure, call wait_until_rest_api_accessible() and then make a second attempt at the REST API call
That should make the test survive one restart of the mgr, as long as the mgr restarts within 30 seconds. If the mgr takes longer than that to restart, or restarts more than once, the test will fail, which is fine because those are genuine failures.
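A minimal sketch of that retry-once structure. This is illustrative only, not the actual patch: the helper name `wait_until_rest_api_accessible` comes from the discussion above, while `ConnectionRefused` and `call_with_one_retry` are hypothetical stand-ins for `requests.exceptions.ConnectionError` and the real code in qa/tasks/mgr/dashboard/helper.py.

```python
class ConnectionRefused(Exception):
    """Stand-in for requests.exceptions.ConnectionError."""


def call_with_one_retry(api_call, wait_until_rest_api_accessible):
    """Attempt the REST call; on a connection failure, wait for the API
    to come back (e.g. after an asynchronous mgr respawn triggered by a
    prior enable/disable test) and retry exactly once.

    A second connection failure propagates to the caller, so a mgr that
    takes too long to restart, or restarts twice, still fails the test.
    """
    try:
        return api_call()
    except ConnectionRefused:
        # The mgr may be respawning because the set of enabled modules
        # changed; give it the usual (~30 s) window to become reachable.
        wait_until_rest_api_accessible()
        return api_call()
```

The point of retrying only once is that a single respawn is the expected benign race, while repeated failures remain visible as genuine test failures.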
Updated by Nizamudeen A about 1 year ago
Thank you Bill, I opened a PR based on your comment: https://github.com/ceph/ceph/pull/61744. Can you take a look? Thanks!
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-02-05_21:36:43-rados-wip-yuri8-testing-2025-02-04-1046-distro-default-smithi/8117377
Updated by Aishwarya Mathuria about 1 year ago
/a/lflores-2025-02-07_20:42:43-rados-wip-yuri2-testing-2025-01-31-2325-distro-default-smithi/8120806
Updated by Konstantin Shalygin about 1 year ago
- Backport changed from squid, reef, quincy to squid, reef
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
/a/yuriw-2025-02-07_15:56:23-rados-wip-yuri6-testing-2025-02-04-1046-distro-default-smithi/
8120709
8120725
Updated by Nitzan Mordechai about 1 year ago
/a/yuriw-2025-02-19_00:50:37-rados-wip-yuri5-testing-2025-02-18-1219-distro-default-smithi/8138826
/a/yuriw-2025-02-19_00:50:37-rados-wip-yuri5-testing-2025-02-18-1219-distro-default-smithi/8138983
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-02-21_18:09:35-rados-wip-pdonnell-testing-20250218.200348-debug-distro-default-smithi/8147402
Updated by Jaya Prakash about 1 year ago
/a/akupczyk-2025-02-20_15:14:45-rados-aclamk-testing-ganymede-2025-02-20-0826-distro-default-smithi/8143768
Updated by Laura Flores about 1 year ago
/a/skanta-2025-03-01_01:42:21-rados-wip-bharath3-testing-2025-03-01-0356-distro-default-smithi/8162180
Updated by Nizamudeen A about 1 year ago
- Category changed from Cluster to Build, CI, Dependencies & Tools
- Status changed from New to Pending Backport
One more fix was merged for this one: https://github.com/ceph/ceph/pull/61744. Newer runs that include this fix will hopefully not encounter the issue. Let's see.
Updated by Laura Flores about 1 year ago
/a/skanta-2025-03-02_12:28:27-rados-wip-bharath8-testing-2025-03-02-0552-distro-default-smithi/8164495
Updated by Nizamudeen A about 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Nitzan Mordechai about 1 year ago
/a/skanta-2025-03-13_23:19:54-rados-wip-bharath8-testing-2025-03-13-0557-distro-default-smithi/8187959
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
/a/skanta-2025-03-12_23:26:26-rados-wip-bharath4-testing-2025-03-12-0528-distro-default-smithi/8185350/
Updated by Naveen Naidu about 1 year ago
/a/skanta-2025-03-12_11:15:12-rados-wip-bharath3-testing-2025-03-12-0501-distro-default-smithi/8183819
/a/skanta-2025-03-12_11:15:12-rados-wip-bharath3-testing-2025-03-12-0501-distro-default-smithi/8183969
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-03-14_20:32:49-rados-wip-yuri7-testing-2025-03-11-0847-distro-default-smithi/8190537
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-03-14_20:21:57-rados-wip-yuri13-testing-2025-03-14-0922-distro-default-smithi/8189940
If there is one more occurrence, let's raise a new bug.
Updated by Laura Flores about 1 year ago
- Related to Bug #70669: ERROR: test_list_enabled_module: cephfs resource temporarily unavailable added
Updated by Laura Flores about 1 year ago
Hey @Pedro González Gómez I added a new tracker here since the issue is still occurring: https://tracker.ceph.com/issues/70669
Updated by Jaya Prakash about 1 year ago
/a/yuriw-2025-03-22_14:06:08-rados-wip-yuri2-testing-2025-03-21-0820-distro-default-smithi
2 jobs: ['8202560', '8202575']
Updated by Sridhar Seshasayee about 1 year ago
/a/skanta-2025-03-27_08:02:07-rados-wip-bharath10-testing-2025-03-27-0430-distro-default-smithi/8212809
/a/skanta-2025-03-27_08:02:07-rados-wip-bharath10-testing-2025-03-27-0430-distro-default-smithi/8212946
Updated by Sridhar Seshasayee about 1 year ago
/a/yuriw-2025-04-07_23:32:09-rados-wip-yuri13-testing-2025-04-07-1144-distro-default-smithi/8229267
/a/yuriw-2025-04-07_23:32:09-rados-wip-yuri13-testing-2025-04-07-1144-distro-default-smithi/8229504
Updated by Kamoltat (Junior) Sirivadhna about 1 year ago
/a/lflores-2025-04-14_22:30:34-rados-wip-lflores-testing-2025-04-10-2245-distro-default-gibba/
8241123
8241114
Updated by Sridhar Seshasayee about 1 year ago
/a/skanta-2025-04-22_23:21:15-rados-wip-bharath1-testing-2025-04-21-0529-distro-default-smithi/
[8254486, 8254504]
Updated by Aishwarya Mathuria about 1 year ago
/a/yuriw-2025-04-14_18:07:07-rados-wip-yuri10-testing-2025-04-08-0710-distro-default-smithi/
['8239953', '8239967']
Updated by Aishwarya Mathuria 12 months ago
/a/yuriw-2025-05-12_19:14:14-rados-wip-yuri10-testing-2025-05-12-0753-distro-default-smithi/8281641
Updated by Naveen Naidu 11 months ago
/a/yuriw-2025-05-06_17:52:35-rados-wip-yuri2-testing-2025-05-06-0729-tentacle-distro-default-smithi/8273412
/a/yuriw-2025-05-06_17:52:35-rados-wip-yuri2-testing-2025-05-06-0729-tentacle-distro-default-smithi/8273658
Updated by Aishwarya Mathuria 11 months ago
/a/skanta-2025-06-07_23:26:52-rados-wip-bharath5-testing-2025-06-02-2047-distro-default-smithi/8313579
Updated by Shraddha Agrawal 11 months ago
- Backport changed from squid, reef to squid, reef, tentacle
/a/yuriw-2025-05-20_14:56:40-rados-wip-yuri3-testing-2025-05-12-0801-tentacle-distro-default-smithi/8290621
Updated by Laura Flores 11 months ago
/a/skanta-2025-06-12_09:21:16-rados-wip-bharath2-testing-2025-06-10-0545-distro-default-smithi/8324761
Updated by Lee Sanders 10 months ago
/a/ljsanders-2025-07-01_17:21:46-rados-main-distro-default-smithi/8364093
Updated by Lee Sanders 10 months ago
/a/skanta-2025-07-05_06:21:05-rados-wip-bharath15-testing-2025-07-04-1752-distro-default-smithi/8370790
Updated by Upkeep Bot 10 months ago
- Merge Commit set to 93ba7b05d0000a0b41f91c6b2df07f167b144a6d
- Fixed In set to v19.3.0-4700-g93ba7b05d00
- Upkeep Timestamp set to 2025-07-11T21:38:29+00:00
Updated by Upkeep Bot 10 months ago
- Fixed In changed from v19.3.0-4700-g93ba7b05d00 to v19.3.0-4700-g93ba7b05d0
- Upkeep Timestamp changed from 2025-07-11T21:38:29+00:00 to 2025-07-14T23:39:58+00:00
Updated by Aishwarya Mathuria 10 months ago
Old run from before the fix went in -
/a/skanta-2025-06-29_15:00:39-rados-wip-bharath1-testing-2025-06-28-2149-distro-default-smithi
['8356808', '8356818']
Updated by Naveen Naidu 9 months ago
/a/skanta-2025-07-16_22:52:51-rados-wip-bharath8-testing-2025-07-16-1943-distro-default-smithi
2 jobs: ['8392193', '8392054']
Updated by Laura Flores 9 months ago
/a/yuriw-2025-08-14_23:11:43-rados-wip-yuri3-testing-2025-08-14-0737-tentacle-distro-default-smithi/8443837
Updated by Aishwarya Mathuria 9 months ago
/a/skanta-2025-08-14_20:27:05-rados-wip-bharath5-testing-2025-08-13-0959-distro-default-smithi/8443381
/a/skanta-2025-08-14_20:27:05-rados-wip-bharath5-testing-2025-08-13-0959-distro-default-smithi/8443396
Updated by Naveen Naidu 8 months ago
(Ping from watcher)
@Pedro González Gómez Can you please take a look at this? The issue seems to be coming up again in the runs.
Updated by Jonathan Bailey 8 months ago
/a/skanta-2025-08-05_10:12:24-rados-wip-bharath1-testing-2025-08-05-0512-distro-default-smithi/8424881
Updated by Nitzan Mordechai 8 months ago
@Nizamudeen A Can you please take a look? It seems the issue has occurred again after the fix was merged; maybe a new issue is needed?
Updated by Upkeep Bot 6 months ago
- Released In set to v20.2.0~2093
- Upkeep Timestamp changed from 2025-07-14T23:39:58+00:00 to 2025-11-01T01:34:22+00:00
Updated by Aishwarya Mathuria 6 months ago · Edited
@Pedro González Gómez We are still seeing this in the main run: /a/teuthology-2025-11-16_20:00:21-rados-main-distro-default-smithi/8605459
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner: resp = conn.urlopen(
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner: retries = retries.increment(
2025-11-16T22:06:06.651 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
2025-11-16T22:06:06.652 INFO:tasks.cephfs_test_runner: raise MaxRetryError(_pool, url, error or ResponseError(cause))
2025-11-16T22:06:06.652 INFO:tasks.cephfs_test_runner:urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.21.15.103', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa701212da0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2025-11-16T22:06:06.652 INFO:tasks.cephfs_test_runner:
2025-11-16T22:06:06.652 INFO:tasks.cephfs_test_runner:During handling of the above exception, another exception occurred:
2025-11-16T22:06:06.652 INFO:tasks.cephfs_test_runner:
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph_ef82492fb70b7cfb6dcb295c0e9ee1ecd6827e26/qa/tasks/mgr/dashboard/helper.py", line 345, in _get
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner: return cls._request(url, 'GET', params=params, version=version,
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph_ef82492fb70b7cfb6dcb295c0e9ee1ecd6827e26/qa/tasks/mgr/dashboard/helper.py", line 314, in _request
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner: cls._resp = cls._session.get(url, params=params, verify=False,
2025-11-16T22:06:06.653 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: return self.request("GET", url, **kwargs)
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: resp = self.send(prep, **send_kwargs)
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: r = adapter.send(request, **kwargs)
2025-11-16T22:06:06.654 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_50678303b340fce2ba64594e9c10a898221fe65f/virtualenv/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
2025-11-16T22:06:06.655 INFO:tasks.cephfs_test_runner: raise ConnectionError(e, request=request)
2025-11-16T22:06:06.655 INFO:tasks.cephfs_test_runner:requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.21.15.103', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa701212da0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Updated by Sridhar Seshasayee 6 months ago
/a/skanta-2025-11-13_10:26:04-rados-wip-bharath3-testing-2025-11-12-2038-distro-default-smithi/8601377
Updated by Kamoltat (Junior) Sirivadhna 6 months ago
/a/skanta-2025-11-01_02:37:10-rados-wip-bharath4-testing-2025-10-31-1459-distro-default-smithi/8578623
Updated by Naveen Naidu 5 months ago
/a/skanta-2025-11-21_10:17:34-rados-wip-bharath11-testing-2025-11-21-0531-distro-default-smithi
2 jobs: ['8617725', '8617863']
Updated by Kamoltat (Junior) Sirivadhna 5 months ago · Edited
- Status changed from Resolved to Need More Info
Rados Suite Watcher: moving to need more info, pinging the assignee on slack
Updated by Upkeep Bot 5 months ago
- Status changed from Need More Info to Pending Backport
- Upkeep Timestamp changed from 2025-11-01T01:34:22+00:00 to 2025-12-10T00:58:26+00:00
Updated by Sridhar Seshasayee 2 months ago
/a/sseshasa-2026-02-26_14:56:45-rados-wip-sseshasa-testing-2026-02-26-1772100687-distro-default-trial/
['72284', '72423']
Updated by Naveen Naidu about 1 month ago
/a/skanta-2026-01-30_23:12:23-rados-wip-bharath10-testing-2026-01-30-1531-distro-default-trial
2 jobs: ['28449', '28311']
Updated by Naveen Naidu about 1 month ago
/a/skanta-2026-03-03_23:53:54-rados-wip-bharath6-testing-2026-03-03-1755-distro-default-trial
2 jobs: ['79576', '79714']
Updated by Naveen Naidu 29 days ago
/a/skanta-2026-04-01_10:28:51-rados-wip-bharath10-testing-2026-04-01-1346-distro-default-trial
2 jobs: ['129048', '128896']
Updated by Connor Fawcett 4 days ago
/a/skanta-2026-03-31_12:00:11-rados-wip-bharath12-testing-2026-03-30-1431-distro-default-trial/127724