Skip to content

qa/cephfs: upgrade xfstests_dev.py to run more tests#45960

Merged
rishabh-d-dave merged 2 commits intoceph:mainfrom
rishabh-d-dave:xfstests-dev-run-more-tests
Mar 14, 2023
Merged

qa/cephfs: upgrade xfstests_dev.py to run more tests#45960
rishabh-d-dave merged 2 commits intoceph:mainfrom
rishabh-d-dave:xfstests-dev-run-more-tests

Conversation

@rishabh-d-dave
Copy link
Contributor

@rishabh-d-dave rishabh-d-dave commented Apr 19, 2022

Fixes: https://tracker.ceph.com/issues/18475

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Apr 19, 2022

DNM because the PR needs to add tests that should be run regularly in nightlies.

EDIT -
Also DNM because it needs commits from PR #45950.

@rishabh-d-dave
Copy link
Contributor Author

@gregsfortytwo @jtlayton @vshankar Which tests from xfstests-dev do we wish to run in nightlies? I'll add one more commit to this PR accordingly.

@rishabh-d-dave rishabh-d-dave force-pushed the xfstests-dev-run-more-tests branch 3 times, most recently from 5cda4c1 to 75c730d Compare April 25, 2022 08:13
@gregsfortytwo
Copy link
Member

@gregsfortytwo @jtlayton @vshankar Which tests from xfstests-dev do we wish to run in nightlies? I'll add one more commit to this PR accordingly.

I think we want to run all of them, unless there are some which fail on FUSE filesystems? But that should all be handled within the xfstests suite.

@vshankar
Copy link
Contributor

vshankar commented Apr 25, 2022

@gregsfortytwo @jtlayton @vshankar Which tests from xfstests-dev do we wish to run in nightlies? I'll add one more commit to this PR accordingly.

generic/*

return p.returncode

def run_testfile(self, testdir, testfile):
return self.run_tests(f'{testdir}/{testfile}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fix the test_acls.py by using this new helper in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to do it but I wanted to figure out first what more tests we want to run.

@rishabh-d-dave
Copy link
Contributor Author

@gregsfortytwo @vshankar

@gregsfortytwo @jtlayton @vshankar Which tests from xfstests-dev do we wish to run in nightlies? I'll add one more commit to this PR accordingly.

generic/*

What about tests/ceph?

@lxbsz
Copy link
Member

lxbsz commented Apr 26, 2022

@gregsfortytwo @vshankar

@gregsfortytwo @jtlayton @vshankar Which tests from xfstests-dev do we wish to run in nightlies? I'll add one more commit to this PR accordingly.

generic/*

What about tests/ceph?

@jtlayton has one doc about the xfsdev test [1], this is what we usually will run.

[1] https://jtlayton.wordpress.com/2021/11/29/testing-the-linux-kernel-cephfs-client-with-xfstests/

@rishabh-d-dave
Copy link
Contributor Author

@lxbsz

@jtlayton has one doc about the xfsdev test [1], this is what we usually will run.

[1] https://jtlayton.wordpress.com/2021/11/29/testing-the-linux-kernel-cephfs-client-with-xfstests/

Oh, awesome. Thanks!

@vshankar
Copy link
Contributor

vshankar commented May 2, 2022

@rishabh-d-dave - right now, test_acls uses xfstests_dev, but we would want to run more xfstests not from test_acls by itself. Please do the changes w.r.t. that.

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check

@rishabh-d-dave
Copy link
Contributor Author

make check failure is valid, I'll fix it after these tests finish running - http://pulpito.front.sepia.ceph.com/rishabh-2022-05-06_15:47:39-fs-wip-rishabh-fs-auth-subcmd-distro-basic-smithi/

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/tests_from_xfstests-dev.py:3:1: F401 'io.BytesIO' imported but unused
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/qa/.tox/flake8

https://jenkins.ceph.com/job/ceph-pull-requests/94981/

@rishabh-d-dave rishabh-d-dave force-pushed the xfstests-dev-run-more-tests branch 2 times, most recently from 4ab8fa9 to 796ec48 Compare May 7, 2022 04:04
@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented May 7, 2022

Quoting #45960 (comment).

make check failure is valid, I'll fix it after these tests finish running - http://pulpito.front.sepia.ceph.com/rishabh-2022-05-06_15:47:39-fs-wip-rishabh-fs-auth-subcmd-distro-basic-smithi/

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/tests_from_xfstests-dev.py:3:1: F401 'io.BytesIO' imported but unused
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/qa/.tox/flake8

https://jenkins.ceph.com/job/ceph-pull-requests/94981/

Fixed now.

@rishabh-d-dave rishabh-d-dave force-pushed the xfstests-dev-run-more-tests branch 4 times, most recently from f8f61f5 to 6a083e7 Compare May 7, 2022 18:21
@rishabh-d-dave rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label May 8, 2022
@rishabh-d-dave
Copy link
Contributor Author

@gregsfortytwo @vshankar @lxbsz @jtlayton
On our teuthology test environment, 3 out of 682 tests from xfstests-dev's generic suite failed. Here's the link - http://pulpito.front.sepia.ceph.com/rishabh-2022-05-07_18:22:10-fs-wip-rishabh-fs-auth-subcmd-distro-basic-smithi/

Are any of these failures (copied below) already known? I'll start investigating into these failure and try fixing them otherwise.

@vshankar
Copy link
Contributor

@rishabh-d-dave This would need a retest once #44240 is merged.

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Mar 8, 2023

http://pulpito.front.sepia.ceph.com/rishabh-2023-03-08_14:29:19-fs:functional-wip-rishabh-fs-qa-workunit-quota.sh-4-distro-default-smithi/

EDIT -
Test ran fine.

New changes are working fine.
Log -

2023-03-08T16:08:47.655 INFO:tasks.cephfs.xfstests_dev:Command return value: 0
2023-03-08T16:08:47.655 INFO:tasks.cephfs.xfstests_dev:=========================================================Total number of tests failed = 4=========================================================
2023-03-08T16:08:47.656 DEBUG:teuthology.orchestra.run.smithi130:> sudo userdel --force --remove fsgqa
2023-03-08T16:08:48.583 DEBUG:teuthology.orchestra.run.smithi130:> sudo userdel --force --remove 123456-fsgqa
2023-03-08T16:08:49.379 DEBUG:teuthology.orchestra.run.smithi130:> sudo groupdel fsgqa

@vshankar
Copy link
Contributor

vshankar commented Mar 8, 2023

@rishabh-d-dave you mentioned about ~40 failures with ceph-fuse. Could you link me the run for that?

@rishabh-d-dave
Copy link
Contributor Author

rishabh-d-dave commented Mar 8, 2023

@rishabh-d-dave you mentioned about ~40 failures with ceph-fuse. Could you link me the run for that?

It' in the comment just above. The log had double entries. So the total number of failures is actually 20. Following are tracker tickets created for these failures -

fuse: https://tracker.ceph.com/issues/58937
fuse: https://tracker.ceph.com/issues/58945
kernel: https://tracker.ceph.com/issues/58938

@rishabh-d-dave
Copy link
Contributor Author

make check failure looks unrelated - https://jenkins.ceph.com/job/ceph-pull-requests/111838/. Running this Ceph CI job one more time.

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check

1 similar comment
@vshankar
Copy link
Contributor

vshankar commented Mar 9, 2023

@rishabh-d-dave you mentioned about ~40 failures with ceph-fuse. Could you link me the run for that?

It' in the comment just above. The log had double entries. So the total number of failures is actually 20. Following are tracker tickets created for these failures -

fuse: https://tracker.ceph.com/issues/58937 kernel: https://tracker.ceph.com/issues/58938

I left a comment in tracker 58937. I might be misinterpreting the whole thing, but I cannot find a failing teuthology job with ceph-fuse.

@lxbsz
Copy link
Member

lxbsz commented Mar 9, 2023

@rishabh-d-dave you mentioned about ~40 failures with ceph-fuse. Could you link me the run for that?

It' in the comment just above. The log had double entries. So the total number of failures is actually 20. Following are tracker tickets created for these failures -
fuse: https://tracker.ceph.com/issues/58937 kernel: https://tracker.ceph.com/issues/58938

I left a comment in tracker 58937. I might be misinterpreting the whole thing, but I cannot find a failing teuthology job with ceph-fuse.

This PR doesn't fail it when there have failure tests. So the qa teuthology jobs will always pass.

@rishabh-d-dave IMO you can just skip those cases currently, which maybe buggy in xfstests-dev. And you can add then back one by one after you fixed them in future.

else:
try:
self.assertEqual(p.returncode, 0)
line = 'Passed all 0 tests'
Copy link
Member

@lxbsz lxbsz Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be Passed all ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then just do:

self.assertIn(line, stdout)

should be enough ??

As I remembered I never seen Passed all 0 tests yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't exactly recall when I saw Passed all 0 tests but it does occur sometimes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false check anyway.

If there is one test and it fails, the output will be:

...
    ------------------------------------------------
    +---------------------------------------------dd--
     fsx.3 : -d -N numops -l filelen -S 0 -x
     -----------------------------------------------
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests

If there have more than one tests, the output will be:

...
    ------------------------------------------------
    +---------------------------------------------dd--
     fsx.3 : -d -N numops -l filelen -S 0 -x
     -----------------------------------------------
generic/123 3s ...  4s
Ran: generic/075 generic/123
Failures: generic/075
Failed 1 of 2 tests

If both fail it will be:

...
     -----------------------------------------------
     fsx.3 : -A -d -N numops -l filelen -S 0 -x
    ------------------------------------------------
    +--------------------------------------------dd---
Ran: generic/075 generic/112
Failures: generic/075 generic/112
Failed 2 of 2 tests

For group test it will be:

...
Failures: generic/112
Failed 1 of 568 tests

Please see http://qa-proxy.ceph.com/teuthology/xiubli-2023-02-23_03:24:22-fs:fscrypt-wip-fscrypt-20230215-0834-distro-default-smithi/7185216/teuthology.log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what happens normally. But in certain cases "" gets printed (I guess I saw it when ceph-fuse wasn't support and tests for acls was triggered.).

Signed-off-by: Rishabh Dave <ridave@redhat.com>
@rishabh-d-dave
Copy link
Contributor Author

Optimizations, nothing more.

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check arm64

Signed-off-by: Rishabh Dave <ridave@redhat.com>
Copy link
Member

@lxbsz lxbsz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check arm64

@vshankar
Copy link
Contributor

@rishabh-d-dave Please create trackers for the test failures. Otherwise LGTM.

@rishabh-d-dave
Copy link
Contributor Author

@vshankar

@rishabh-d-dave Please create trackers for the test failures. Otherwise LGTM.

The tracker for test failurs were created few days ago -

fuse: https://tracker.ceph.com/issues/58945
kernel: https://tracker.ceph.com/issues/58938

@vshankar
Copy link
Contributor

@vshankar

@rishabh-d-dave Please create trackers for the test failures. Otherwise LGTM.

The tracker for test failurs were created few days ago -

fuse: https://tracker.ceph.com/issues/58945 kernel: https://tracker.ceph.com/issues/58938

Ah, since its assigned, I didn't see those dueing bug scrub.

@vshankar
Copy link
Contributor

@vshankar

@rishabh-d-dave Please create trackers for the test failures. Otherwise LGTM.

The tracker for test failurs were created few days ago -
fuse: https://tracker.ceph.com/issues/58945 kernel: https://tracker.ceph.com/issues/58938

Ah, since its assigned, I didn't see those during bug scrub.

@vshankar
Copy link
Contributor

@rishabh-d-dave you can go ahead and merge.

@rishabh-d-dave
Copy link
Contributor Author

Tests ran fine.

kclient: http://pulpito.front.sepia.ceph.com/rishabh-2023-03-14_07:28:57-fs:functional-wip-rishabh-fs-auth-subcmd-printauth-distro-default-smithi

fuse: http://pulpito.front.sepia.ceph.com/rishabh-2023-03-14_07:29:12-fs:functional-wip-rishabh-fs-auth-subcmd-printauth-distro-default-smithi

The tests from xfstests-dev repo were executed partially so that code from this PR could be test in its entirety in lesser time. Here's the patch that was applied on top of this PR branch to reduce the duration -

diff --git a/qa/tasks/cephfs/xfstests_dev.py b/qa/tasks/cephfs/xfstests_dev.py
index 5de2ddb1568..541a6575842 100644
--- a/qa/tasks/cephfs/xfstests_dev.py
+++ b/qa/tasks/cephfs/xfstests_dev.py
@@ -359,8 +359,12 @@ class XFSTestsDev(CephFSTestCase):
 
         testfiles = [f for f in testfiles if f.isdigit()]
 
+        i = 0
         for testfile in testfiles:
+            if i > 130:
+                break
             self.run_testfile(testdir, testfile)
+            i += 1
 
         log.info('========================================================='
                  f'Total number of tests failed = {self.total_tests_failed}'

The failure of the tests above is not due to the patch on this PR. Tickets have been to investigate deeper into this -

fuse: https://tracker.ceph.com/issues/58945
kernel: https://tracker.ceph.com/issues/58938

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System needs-review tests wip-rishabh-testing Rishabh's testing label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants