squid: qa: Fix test_with_health_warn_with_2_active_MDSs#65798

Closed
joscollin wants to merge 3 commits intoceph:squidfrom
joscollin:wip-72280-squid

Conversation


@joscollin joscollin commented Oct 6, 2025

backport tracker: https://tracker.ceph.com/issues/72280


backport of #64297
parent tracker: https://tracker.ceph.com/issues/71915

NOTE:
The PR pulls the following dependent qa commits:

5a7834b
d53be13

This backport was staged using ceph-backport.sh version 16.0.0.6848; find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

rishabh-d-dave and others added 3 commits October 7, 2025 04:00
MDS_CACHE_OVERSIZE warning.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 5a7834b)
that generates MDS_CACHE_OVERSIZE warning.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d53be13)
The test intended to validate the failure of the 'mds fail'
command on any active mds when one of them has a health warning.

Commit 2217002
(PR 61554) changed this behavior and allows 'mds fail'
on an mds without the warning. The test should have failed
consistently after that commit, but it rarely did until
tested extensively, because the test usually generated
warnings on both active mdses. Occasionally the test
generated a warning on a single mds and failed, so it is
a race. This patch fixes it with the following changes:

 a. Changed mds_cache_memory_limit from '1K' to '50K',
    as '1K' was too low and generated the warning on both mdses.
 b. Created a directory, pinned it to a single mds, and opened
    400 files in the backend to create cache pressure on that mds only.

Also, there are two tests named
'test_with_health_warn_with_2_active_MDSs', albeit in different
classes. Renamed the test to
'test_with_health_warn_on_1_mds_with_2_active_MDSs' to avoid
confusion and indicate what the test actually does.

Fixes: https://tracker.ceph.com/issues/71915
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit f990e7d)
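As a rough illustration of fixes (a) and (b) above, the sketch below builds the equivalent shell commands in Python. The helper name, the mount point /mnt/cephfs, and the directory name are assumptions for illustration, not the qa suite's actual API; the real test uses the teuthology helpers rather than generating commands this way.

```python
# Hypothetical sketch of the cache-pressure setup described in the
# commit message: raise the MDS cache limit to 50K, pin one directory
# to a single rank, and create many files under it so only that MDS
# accumulates cache pressure. Paths and helper name are assumptions.

def build_cache_pressure_cmds(limit="50K", pin_rank=0, nfiles=400,
                              dirname="pinned_dir"):
    """Return shell commands reproducing the setup from the fix."""
    cmds = [
        # (a) cache limit low enough to trigger MDS_CACHE_OVERSIZE,
        # but not so low that both active mdses warn
        f"ceph config set mds mds_cache_memory_limit {limit}",
        # (b) pin one directory to a single rank via ceph.dir.pin
        f"mkdir -p /mnt/cephfs/{dirname}",
        f"setfattr -n ceph.dir.pin -v {pin_rank} /mnt/cephfs/{dirname}",
    ]
    # open/create 400 files under the pinned directory
    cmds += [f"touch /mnt/cephfs/{dirname}/file_{i}" for i in range(nfiles)]
    return cmds

cmds = build_cache_pressure_cmds()
print(len(cmds))  # 3 setup commands plus 400 file creations
```

Pinning (ceph.dir.pin) is what makes the warning deterministic on one rank; without it, either mds may take the load and the test races.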
@joscollin
Member Author

This PR is under test in https://tracker.ceph.com/issues/73450.

     return count, mds_id

-    def test_with_health_warn_with_2_active_MDSs(self):
+    def test_with_health_warn_on_1_mds_with_2_active_MDSs(self):
Contributor

This test failed in QA:

2025-10-10T22:12:59.591 INFO:tasks.cephfs_test_runner:======================================================================
2025-10-10T22:12:59.591 INFO:tasks.cephfs_test_runner:ERROR: test_with_health_warn_on_1_mds_with_2_active_MDSs (tasks.cephfs.test_admin.TestMDSFail)
2025-10-10T22:12:59.591 INFO:tasks.cephfs_test_runner:Test when a CephFS has 2 active MDSs and one of them have either
2025-10-10T22:12:59.591 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_104ea276496eeef39137386037044178d6f67f79/qa/tasks/cephfs/test_admin.py", line 2645, in test_with_health_warn_on_1_mds_with_2_active_MDSs
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    self.run_ceph_cmd(f'mds fail {mds1_id}')
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_104ea276496eeef39137386037044178d6f67f79/qa/tasks/ceph_test_case.py", line 30, in run_ceph_cmd
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    return self.mon_manager.run_cluster_cmd(**kwargs)
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_104ea276496eeef39137386037044178d6f67f79/qa/tasks/ceph_manager.py", line 1635, in run_cluster_cmd
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    return self.controller.run(**kwargs)
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_78c036dc9ad59cb33807dc7f21fda50de2f348d2/teuthology/orchestra/remote.py", line 575, in run
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_78c036dc9ad59cb33807dc7f21fda50de2f348d2/teuthology/orchestra/run.py", line 461, in run
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    r.wait()
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_78c036dc9ad59cb33807dc7f21fda50de2f348d2/teuthology/orchestra/run.py", line 161, in wait
2025-10-10T22:12:59.592 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2025-10-10T22:12:59.593 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_78c036dc9ad59cb33807dc7f21fda50de2f348d2/teuthology/orchestra/run.py", line 181, in _raise_for_status
2025-10-10T22:12:59.593 INFO:tasks.cephfs_test_runner:    raise CommandFailedError(
2025-10-10T22:12:59.593 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on smithi119 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mds fail c'

See: http://qa-proxy.ceph.com/teuthology/jcollin-2025-10-10_05:57:09-fs:functional-wip-jcollin-testing-20251010.002614-squid-distro-default-smithi/8545211/teuthology.log

Member Author

@joscollin joscollin left a comment



Labels

cephfs Ceph File System tests


3 participants