squid: qa: failfast mount for better performance and unblock fs volume ls#59919

Merged
mchangir merged 1 commit into ceph:squid from mchangir:wip-67826-squid on Jan 3, 2025

@mchangir
Contributor

backport tracker: https://tracker.ceph.com/issues/67826


backport of #58547
parent tracker: https://tracker.ceph.com/issues/66009

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

During teuthology tests, tearing down the cluster between two tests
resets the config and generates a config_notify. This leads to a race to
create a new mount using the old fscid. But by the time the mount is
attempted, the new fs has already been created with a new fscid.
As a result, the client mount waits for a connection completion
notification from the mds for 5 minutes (the default timeout) and
eventually gives up.
However, the default teuthology command timeout is 2 minutes. So
teuthology fails the command and declares the job failed well before
the mount can time out.

The resolution is to lower the client mount timeout to 30 seconds so
that the config_notify fails fast, paving the way for subsequent
commands to be executed against the new fs.
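
The timing argument above can be checked with a few lines of arithmetic.
This is only an illustrative sketch: the three durations come from the
description, while the constant and function names are made up for this
example and are not Ceph or teuthology identifiers.

```python
# Durations taken from the description above, in seconds.
# The names are hypothetical; they are not Ceph or teuthology settings.
DEFAULT_CLIENT_MOUNT_TIMEOUT = 5 * 60  # client waits this long for the mds
TEUTHOLOGY_COMMAND_TIMEOUT = 2 * 60    # teuthology fails the command after this
LOWERED_CLIENT_MOUNT_TIMEOUT = 30      # value chosen by this change


def mount_gives_up_first(mount_timeout: int, command_timeout: int) -> bool:
    """Return True if a stuck mount times out before the command is failed."""
    return mount_timeout < command_timeout


def headroom(mount_timeout: int, command_timeout: int) -> int:
    """Seconds left for follow-up work after the mount gives up (may be negative)."""
    return command_timeout - mount_timeout


if __name__ == "__main__":
    for label, timeout in (
        ("default", DEFAULT_CLIENT_MOUNT_TIMEOUT),
        ("lowered", LOWERED_CLIENT_MOUNT_TIMEOUT),
    ):
        ok = mount_gives_up_first(timeout, TEUTHOLOGY_COMMAND_TIMEOUT)
        spare = headroom(timeout, TEUTHOLOGY_COMMAND_TIMEOUT)
        print(f"{label}: mount gives up first={ok}, headroom={spare}s")
```

With the default, the command is failed 180 seconds before the mount
would give up; with the lowered value, the stale mount attempt fails
with 90 seconds to spare inside the command timeout.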

An unhandled cluster warning about an unresponsive client is also
emitted later, during qa job termination, which leads teuthology to
declare the job failed. For now this warning seems harmless, since
it is emitted during the cluster cleanup phase.
So this warning is added to the log-ignorelist section in the
snap-schedule YAML.

Fixes: https://tracker.ceph.com/issues/66009
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit daf4798)
mchangir added this to the squid milestone on Sep 23, 2024
mchangir added the tests label on Sep 23, 2024
github-actions bot added the cephfs (Ceph File System) label on Sep 23, 2024
@vshankar
Contributor

This PR is under test in https://tracker.ceph.com/issues/69030.

@mchangir
Contributor Author

mchangir commented Jan 2, 2025

jenkins test windows

@mchangir
Contributor Author

mchangir commented Jan 3, 2025

mchangir merged commit 36038d7 into ceph:squid on Jan 3, 2025