Skip to content

qa: test fio with multiple versions of NFS#62031

Merged
batrick merged 1 commit intoceph:mainfrom
dparmar18:ib70203
Mar 18, 2025
Merged

qa: test fio with multiple versions of NFS#62031
batrick merged 1 commit intoceph:mainfrom
dparmar18:ib70203

Conversation

@dparmar18
Copy link
Contributor

@dparmar18 dparmar18 commented Feb 27, 2025

This is coming from https://tracker.ceph.com/issues/69347, basically fio fails in
teuthology with v4.2 but passes with older versions (like v4.1). v4.2 on local/QE
machines has worked fine. This needs RCA, looking at libcephfs logs, it doesn't
look like it's CephFS. A thorough probe is needed, hence adding YAMLs to test
fio with different versions which also helps testing ceph client with different
version of NFS to help trace any regression.

unset.yaml is basically 4.2 for now but That may change someday or the nfs linux
driver may soon "add" things beyond 4.2 without adding an explicit version bump.

Fixes: https://tracker.ceph.com/issues/70203

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@dparmar18
Copy link
Contributor Author

@kotreshhr @vshankar do we agree on this?

@vshankar
Copy link
Contributor

vshankar commented Mar 3, 2025

@kotreshhr @vshankar do we agree on this?

I will (obviously) be biased since I proposed this (temporary) workaround so that we get nfs-gaesha coverage :)

What do others think?

@kotreshhr
Copy link
Contributor

@kotreshhr @vshankar do we agree on this?

I will (obviously) be biased since I proposed this (temporary) workaround so that we get nfs-gaesha coverage :)

What do others think?

I think it's a good idea for now to get the coverage.

# TODO: Reference: https://tracker.ceph.com/issues/70203
self.ctx.cluster.run(
args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port}',
args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port},vers=4.1',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that once we set this to 4.1, we will never restore 4.2. Can you please reference the actual discussion in https://tracker.ceph.com/issues/69347

A better way to structure this would be to have a set of yamls like:

diff --git a/qa/suites/fs/nfs/nfs-version/v4.1.yaml b/qa/suites/fs/nfs/nfs-version/v4.1.yaml
new file mode 100644
index 00000000000..729a8533d6d
--- /dev/null
+++ b/qa/suites/fs/nfs/nfs-version/v4.1.yaml
@@ -0,0 +1,2 @@
+nfs:
+  vers: 4.1
diff --git a/qa/suites/fs/nfs/nfs-version/v4.2.yaml b/qa/suites/fs/nfs/nfs-version/v4.2.yaml
new file mode 100644
index 00000000000..850afbd6187
--- /dev/null
+++ b/qa/suites/fs/nfs/nfs-version/v4.2.yaml
@@ -0,0 +1,2 @@
+nfs:
+  vers: 4.2
diff --git a/qa/tasks/cephfs/test_nfs.py b/qa/tasks/cephfs/test_nfs.py
index 4f0e258683e..26ec4f7b1af 100644
--- a/qa/tasks/cephfs/test_nfs.py
+++ b/qa/tasks/cephfs/test_nfs.py
@@ -328,10 +328,12 @@ class TestNFS(MgrTestCase):
         :param check: It denotes if i/o testing needs to be done
         '''
         tries = 3
+        config_nfs = self.ctx['config'].get('nfs', {})
+        config_nfs_vers = config_nfs.get('vers', '4.2')
         while True:
             try:
                 self.ctx.cluster.run(
-                    args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port}',
+                    args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port},vers={config_nfs_vers}',
                           f'{ip}:{pseudo_path}', '/mnt'])
                 break
             except CommandFailedError as e:

^ untested

Then you can filter out the 4.2 dimension conditionally using a postmerge script.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that once we set this to 4.1, we will never restore 4.2. Can you please reference the actual discussion in https://tracker.ceph.com/issues/69347

A better way to structure this would be to have a set of yamls like:

diff --git a/qa/suites/fs/nfs/nfs-version/v4.1.yaml b/qa/suites/fs/nfs/nfs-version/v4.1.yaml
new file mode 100644
index 00000000000..729a8533d6d
--- /dev/null
+++ b/qa/suites/fs/nfs/nfs-version/v4.1.yaml
@@ -0,0 +1,2 @@
+nfs:
+  vers: 4.1
diff --git a/qa/suites/fs/nfs/nfs-version/v4.2.yaml b/qa/suites/fs/nfs/nfs-version/v4.2.yaml
new file mode 100644
index 00000000000..850afbd6187
--- /dev/null
+++ b/qa/suites/fs/nfs/nfs-version/v4.2.yaml
@@ -0,0 +1,2 @@
+nfs:
+  vers: 4.2
diff --git a/qa/tasks/cephfs/test_nfs.py b/qa/tasks/cephfs/test_nfs.py
index 4f0e258683e..26ec4f7b1af 100644
--- a/qa/tasks/cephfs/test_nfs.py
+++ b/qa/tasks/cephfs/test_nfs.py
@@ -328,10 +328,12 @@ class TestNFS(MgrTestCase):
         :param check: It denotes if i/o testing needs to be done
         '''
         tries = 3
+        config_nfs = self.ctx['config'].get('nfs', {})
+        config_nfs_vers = config_nfs.get('vers', '4.2')
         while True:
             try:
                 self.ctx.cluster.run(
-                    args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port}',
+                    args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port},vers={config_nfs_vers}',
                           f'{ip}:{pseudo_path}', '/mnt'])
                 break
             except CommandFailedError as e:

^ untested

Then you can filter out the 4.2 dimension conditionally using a postmerge script.

If we can run tests with both v4.1 and v4.2, then why do we need filter out the 4.2 dimension? I'd let that job fail, so that we never loose track of it.

Copy link
Contributor Author

@dparmar18 dparmar18 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mount option had no version mentioned before, 4.2 is latest with the distro, when in case 4.3 comes we'd another manual job to take care of i.e. to change this to 4.2 and 4.3, i think we can have yamls named old-nfs.yaml and latest-nfs.yaml and have something like:

config_nfs = self.ctx['config'].get('nfs', {})
config_nfs_vers = config_nfs.get('vers')
if config_nfs_vers <= 4.2:  # 4.2 being the latest at the time of writing this code
	args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port},vers={config_nfs_vers}',
                           f'{ip}:{pseudo_path}', '/mnt'])
else:
	args=['sudo', 'mount', '-t', 'nfs', '-o', f'port={port}, f'{ip}:{pseudo_path}', '/mnt'])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually a good thing to have since we already have similar case with ceph client where we test with old client and stuff. a nice code coverage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So, we can do this via yaml fragments for 4.1/4.2, so we know early on when things start failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can but my point is to also have a yaml that doesnt mention the version. let the code make use of the latest version shipped with distro (which is what this func did before I added vers= option)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine too.

@dparmar18 dparmar18 force-pushed the ib70203 branch 2 times, most recently from b27b836 to d0f7cd3 Compare March 6, 2025 14:38
@dparmar18 dparmar18 changed the title qa: use NFS mount V4.1 instead of V4.2 qa: test fio with NFS v4.1 and latest version Mar 6, 2025
@dparmar18
Copy link
Contributor Author

@batrick @vshankar PTAL, once the teuthology stack is clear, i'll put this to test

@dparmar18 dparmar18 marked this pull request as draft March 7, 2025 06:21
@dparmar18 dparmar18 force-pushed the ib70203 branch 2 times, most recently from 85f12f5 to 2cbcf1d Compare March 7, 2025 08:02
@github-actions
Copy link

github-actions bot commented Mar 7, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18
Copy link
Contributor Author

dparmar18 commented Mar 7, 2025

@dparmar18 dparmar18 requested review from batrick and vshankar March 7, 2025 08:31
@@ -0,0 +1,2 @@
nfs:
vers: 4.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please name the yaml 4.1.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,2 @@
nfs:
vers: latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of testing with vers= option unset, even if it's the same as 4.2. That may change someday or the nfs linux driver may soon "add" things beyond 4.2 without adding an explicit version bump.

IOW, please add a 4.2.yaml and a unset.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@dparmar18 dparmar18 changed the title qa: test fio with NFS v4.1 and latest version qa: test fio with multiple versions of NFS Mar 12, 2025
This is coming from https://tracker.ceph.com/issues/69347, basically fio fails in
teuthology with v4.2 but passes with older versions (like v4.1). v4.2 on local/QE
machines has worked fine. This needs RCA, looking at libcephfs logs, it doesn't
look like it's CephFS. A thorough probe is needed, hence adding YAMLs to test
fio with different versions which also helps testing ceph client with different
version of NFS to help trace any regression.

unset.yaml is basically 4.2 for now but That may change someday or the nfs linux
driver may soon "add" things beyond 4.2 without adding an explicit version bump.

Fixes: https://tracker.ceph.com/issues/70203
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Co-authored-by: Patrick Donnelly <pdonnell@ibm.com>
@dparmar18 dparmar18 requested a review from batrick March 12, 2025 11:25
@dparmar18
Copy link
Contributor Author

@batrick changes incorporated, runs pinged above, look fine

@dparmar18
Copy link
Contributor Author

jenkins test make check arm64

@dparmar18
Copy link
Contributor Author

@dparmar18
Copy link
Contributor Author

@dparmar18
Copy link
Contributor Author

jenkins restest this please

@batrick
Copy link
Member

batrick commented Mar 17, 2025

4.1 https://pulpito.ceph.com/dparmar-2025-03-13_09:31:43-fs:nfs-main-testing-default-smithi/

ran fine @batrick

I will merge this because it's an improvement that works but you need to file a ticket for the intermittent failure because it will reoccur in upstream QA.

@dparmar18
Copy link
Contributor Author

4.1 https://pulpito.ceph.com/dparmar-2025-03-13_09:31:43-fs:nfs-main-testing-default-smithi/
ran fine @batrick

I will merge this because it's an improvement that works but you need to file a ticket for the intermittent failure because it will reoccur in upstream QA.

added https://tracker.ceph.com/issues/70522

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@batrick batrick merged commit f20f084 into ceph:main Mar 18, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants