Bug #69899
Status: open
Test failure: test_exports_on_mgr_restart (tasks.cephfs.test_nfs.TestNFS)
Description
2025-01-16T17:59:38.202 INFO:tasks.cephfs_test_runner:Test export availability on restarting mgr. ... ERROR
2025-01-16T17:59:38.203 INFO:tasks.cephfs_test_runner:
2025-01-16T17:59:38.203 INFO:tasks.cephfs_test_runner:======================================================================
2025-01-16T17:59:38.203 INFO:tasks.cephfs_test_runner:ERROR: test_exports_on_mgr_restart (tasks.cephfs.test_nfs.TestNFS)
2025-01-16T17:59:38.203 INFO:tasks.cephfs_test_runner:Test export availability on restarting mgr.
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/cephfs/test_nfs.py", line 503, in test_exports_on_mgr_restart
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: self._load_module("cephadm")
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/mgr/mgr_test_case.py", line 176, in _load_module
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: cls.wait_until_true(has_restarted, timeout=30)
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/ceph_test_case.py", line 347, in wait_until_true
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:Ran 16 tests in 588.660s
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:======================================================================
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:ERROR: test_exports_on_mgr_restart (tasks.cephfs.test_nfs.TestNFS)
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:Test export availability on restarting mgr.
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/cephfs/test_nfs.py", line 503, in test_exports_on_mgr_restart
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner: self._load_module("cephadm")
2025-01-16T17:59:38.205 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/mgr/mgr_test_case.py", line 176, in _load_module
2025-01-16T17:59:38.206 INFO:tasks.cephfs_test_runner: cls.wait_until_true(has_restarted, timeout=30)
2025-01-16T17:59:38.206 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/ceph_test_case.py", line 347, in wait_until_true
2025-01-16T17:59:38.206 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2025-01-16T17:59:38.206 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
2025-01-16T17:59:38.206 INFO:tasks.cephfs_test_runner:
2025-01-16T17:59:38.206 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_42294faf23294f557d349df1bd69657772519729/teuthology/run_tasks.py", line 112, in run_tasks
manager.__enter__()
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/cephfs_test_runner.py", line 211, in task
raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_exports_on_mgr_restart (tasks.cephfs.test_nfs.TestNFS)
2025-01-16T17:59:38.381 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=8b59cad8622645adb9c9e500629b397e
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_42294faf23294f557d349df1bd69657772519729/teuthology/run_tasks.py", line 112, in run_tasks
manager.__enter__()
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_35aa12e8113161a135a96fd9b5a767dd61a532c4/qa/tasks/cephfs_test_runner.py", line 211, in task
raise RuntimeError("Test failure: {0}".format(", ".join(bad_tests)))
RuntimeError: Test failure: test_exports_on_mgr_restart (tasks.cephfs.test_nfs.TestNFS)
Updated by Dhairya Parmar about 1 year ago · Edited
The cephadm module failed to load in time; I haven't seen this before. Which PRs were part of wip-hemanth1-testing-2025-01-13-1558-reef? Looks like an intermittent failure.
Updated by Nitzan Mordechai about 1 year ago
/a/yuriw-2025-02-12_15:34:28-orch-wip-yuri4-testing-2025-02-08-0943-reef-distro-default-smithi/8126631
Updated by Venky Shankar about 1 year ago
- Description updated (diff)
@Dhairya Parmar - Could you please update the actual cephadm load error?
Updated by Dhairya Parmar about 1 year ago · Edited
Venky Shankar wrote in #note-4:
@Dhairya Parmar - Could you please update the actual cephadm load error?
The stack trace points to cls.wait_until_true(has_restarted, timeout=30) in _load_module() (part of qa/tasks/mgr/mgr_test_case.py). Looking at the condition that is failing:
def has_restarted():
    mgr_map = cls.mgr_cluster.get_mgr_map()
    done = mgr_map['active_gid'] != initial_gid and mgr_map['available']
    if done:
        log.debug("Restarted after module load (new active {0}/{1})".format(
            mgr_map['active_name'], mgr_map['active_gid']))
    return done
So it is basically waiting for the mgr active_gid to change and for the mgr (mgr.x in this case) to become available. The active_gid and mgr availability before ceph mgr module enable cephadm --force are:
2025-01-16T17:58:59.941 INFO:teuthology.orchestra.run.smithi107.stdout: "active_gid": 14318
2025-01-16T17:58:59.942 INFO:teuthology.orchestra.run.smithi107.stdout: "available": true
After enabling cephadm, the active_gid remains the same:
2025-01-16T17:59:01.564 INFO:teuthology.orchestra.run.smithi107.stdout: "active_gid": 14318
2025-01-16T17:59:01.565 INFO:teuthology.orchestra.run.smithi107.stdout: "available": true
We have a 30s timeout with a 5-second sleep per iteration. In the second and third iterations the active_gid didn't change (and availability stayed true), but in later iterations the active_gid did change while the availability flipped to false:
2025-01-16T17:59:19.021 INFO:teuthology.orchestra.run.smithi107.stdout: "active_gid": 15112
2025-01-16T17:59:19.021 INFO:teuthology.orchestra.run.smithi107.stdout: "available": false
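To make the failure mode concrete, here is a small sketch evaluating the test's condition against each mgr map snapshot observed in the log excerpts above (the snapshot values are copied from the log; the loop itself is illustrative, not part of the qa code):

```python
# mgr map snapshots as reported in the teuthology log above
initial_gid = 14318
snapshots = [
    {"active_gid": 14318, "available": True},   # 17:58:59, before module enable
    {"active_gid": 14318, "available": True},   # 17:59:01, gid unchanged
    {"active_gid": 15112, "available": False},  # 17:59:19, gid changed, mgr unavailable
]

# has_restarted's success condition: a NEW active gid AND an available mgr
for snap in snapshots:
    done = snap["active_gid"] != initial_gid and snap["available"]
    print(snap, "->", done)
```

Every snapshot fails one half of the conjunction: first the gid hasn't changed, and once it does change the mgr is no longer available.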
Therefore done = mgr_map['active_gid'] != initial_gid and mgr_map['available'] never becomes true, and hence we see:
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner: raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2025-01-16T17:59:38.204 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 30s and 0 retries
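For reference, the retry loop that raises this error can be sketched as follows. This is a simplified, hypothetical rendering of wait_until_true from qa/tasks/ceph_test_case.py, not the actual implementation:

```python
import time

class TestTimeoutError(RuntimeError):
    pass

def wait_until_true(condition, timeout, period=5, retry_count=0):
    """Poll `condition` every `period` seconds; raise TestTimeoutError once
    `timeout` seconds elapse without it returning True, optionally restarting
    the clock up to `retry_count` times."""
    elapsed = 0
    retries = retry_count
    while True:
        if condition():
            return
        if elapsed >= timeout:
            if retries > 0:
                retries -= 1
                elapsed = 0  # restart the timeout window
            else:
                raise TestTimeoutError(
                    "Timed out after {0}s and {1} retries".format(
                        elapsed, retry_count))
        time.sleep(period)
        elapsed += period
```

With timeout=30, a 5-second period, and no retries, a condition that never becomes true would produce exactly the "Timed out after 30s and 0 retries" message seen in the log.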
The question is why the mgr daemon is not available; I'm not sure why that is. @Venky Shankar, do you need me to dig deeper?
Updated by Venky Shankar about 1 year ago
It's possible that this is related to https://tracker.ceph.com/issues/68747
Updated by Venky Shankar about 1 year ago
- Related to Bug #68747: fs:upgrade failure due to ceph-mgr not being ready added
Updated by Venky Shankar about 1 year ago
Dhairya, suggest checking if this is an instance of the related tracker I added.
Updated by Dhairya Parmar about 1 year ago
Venky Shankar wrote in #note-8:
Dhairya, suggest checking if this is an instance of the related tracker I added.
I'm not seeing any similarity as far as the teuthology logs are concerned; checking the mgr logs for better insights.
Updated by Venky Shankar about 1 year ago
- Project changed from CephFS to Orchestrator
Dhairya Parmar wrote in #note-9:
Venky Shankar wrote in #note-8:
Dhairya, suggest checking if this is an instance of the related tracker I added.
I'm not seeing any similarity as far as the teuthology logs are concerned; checking the mgr logs for better insights.
Yeah - this is unrelated to cephfs. Moving this under orch project for @Adam King to have a look.
Updated by Venky Shankar about 1 year ago
- Assignee changed from Dhairya Parmar to Adam King
Updated by Venky Shankar about 1 year ago
- Related to deleted (Bug #68747: fs:upgrade failure due to ceph-mgr not being ready)