
qa: move nfs (mgr/nfs) related tests to fs suite #52708

Merged
adk3798 merged 2 commits into ceph:main from vshankar:wip-62236
Sep 11, 2023

Conversation

@vshankar
Contributor

Fixes: https://tracker.ceph.com/issues/62236

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@vshankar vshankar added the cephfs (Ceph File System) and orchestrator labels Jul 31, 2023
@vshankar vshankar requested review from a team and adk3798 July 31, 2023 12:27
@vshankar vshankar requested a review from a team as a code owner July 31, 2023 12:27
- ceph orch apply mds a
- cephfs_test_runner:
    modules:
      - tasks.cephfs.test_nfs
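For context, the fragment above is part of a teuthology suite YAML. A fuller sketch of such a task list (the file path, host key, and surrounding tasks are assumptions for illustration, not necessarily the PR's actual file) might look like:

```yaml
# hypothetical qa/suites/fs/nfs-style fragment (illustrative only)
tasks:
- cephadm:
- cephadm.shell:
    host.a:
      - ceph orch apply mds a
- cephfs_test_runner:
    modules:
      - tasks.cephfs.test_nfs
```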
Contributor

Is it necessary to remove the test from the orch/cephadm suite as part of this? I get wanting to also have it in the fs suite, but this test has caught some failures for us in the past. We have some tests that run in both the rados and orch/cephadm suites for similar reasons.

Contributor Author

OK. That was not the intent. I basically forgot to ask you the question in this PR since I had to rush out for some work. Sorry!

How do you want to do this? Have a symlink pointer from orch:cephadm:nfs to fs:nfs?

Contributor

I think a symlink like that works. As long as both suites are running this test_nfs task.
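The symlink approach discussed above can be sketched as follows. The exact suite paths are assumptions for illustration; the real qa tree layout in ceph.git may differ:

```shell
# From the repo root: point an orch/cephadm NFS suite entry at the fs suite
# copy, so both suites schedule the same test_nfs task. Paths are illustrative.
mkdir -p qa/suites/fs/nfs qa/suites/orch/cephadm
ln -sfn ../../fs/nfs qa/suites/orch/cephadm/nfs
readlink qa/suites/orch/cephadm/nfs
```

A relative link target keeps the pointer valid wherever the repo is checked out.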

@vshankar
Contributor Author

vshankar commented Aug 2, 2023

Fixes: https://tracker.ceph.com/issues/62236

Signed-off-by: Venky Shankar <vshankar@redhat.com>

orch:cephadm would like to keep running NFS tests.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
@adk3798
Contributor

adk3798 commented Aug 2, 2023

https://pulpito.ceph.com/vshankar-2023-08-02_02:56:22-orch:cephadm-main-testing-default-smithi/

@adk3798

okay, that's a pretty rough run. I haven't done a run in a little bit due to the ganesha build issue, so unsure if there's now something wrong with the suite or just the run with that build in particular. I don't think any main builds have been successful the last few days.

@vshankar
Contributor Author

vshankar commented Aug 2, 2023

https://pulpito.ceph.com/vshankar-2023-08-02_02:56:22-orch:cephadm-main-testing-default-smithi/
@adk3798

okay, that's a pretty rough run. I haven't done a run in a little bit due to the ganesha build issue, so unsure if there's now something wrong with the suite or just the run with that build in particular. I don't think any main builds have been successful the last few days.

Command failed on smithi044 with status 1: "grep '^nvme_loop' /proc/modules || sudo modprobe nvme_loop && sudo mkdir -p /sys/kernel/config/nvmet/hosts/hostnqn && sudo mkdir -p /sys/kernel/config/nvmet/ports/1 && echo loop | sudo tee /sys/kernel/config/nvmet/ports/1/addr_trtype"

Something is really messed up with sepia IMO.

@vshankar
Contributor Author

vshankar commented Aug 3, 2023

OK, so today's run failed with the same mess as mentioned before - I'm going to check if this is due to this change.

@adk3798
Contributor

adk3798 commented Aug 3, 2023

OK, so today's run failed with the same mess as mentioned before - I'm going to check if this is due to this change.

worth a check. I had a test run yesterday that went much better https://pulpito.ceph.com/adking-2023-08-03_00:50:31-orch:cephadm-wip-adk4-testing-2023-08-02-1559-distro-default-smithi/

@vshankar
Contributor Author

vshankar commented Aug 9, 2023

OK, so today's run failed with the same mess as mentioned before - I'm going to check if this is due to this change.

worth a check. I had a test run yesterday that went much better https://pulpito.ceph.com/adking-2023-08-03_00:50:31-orch:cephadm-wip-adk4-testing-2023-08-02-1559-distro-default-smithi/

The failure is

2023-08-08T10:09:11.123 DEBUG:teuthology.orchestra.run.smithi086:> ! mount | grep -v devtmpfs | grep -q /dev/vg_nvme/lv_4
2023-08-08T10:09:11.170 DEBUG:teuthology.orchestra.run.smithi086:> grep '^nvme_loop' /proc/modules || sudo modprobe nvme_loop && sudo mkdir -p /sys/kernel/config/nvmet/hosts/hostnqn && sudo mkdir -p /sys/kernel/config/nvmet/ports/1 && echo loop | sudo tee /sys/kernel/config/nvmet/ports/1/addr_trtype
2023-08-08T10:09:11.239 INFO:teuthology.orchestra.run.smithi086.stderr:modprobe: FATAL: Module nvme_loop not found in directory /lib/modules/6.5.0-rc4-ga7fb1265323d
2023-08-08T10:09:11.241 DEBUG:teuthology.orchestra.run:got remote process result: 1

How is the above failure related to this change :/
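For diagnosing failures like the modprobe error above, a small helper can distinguish a module that is loaded from one that is merely present on disk or entirely absent from the running kernel's module tree. This is a sketch, not part of teuthology; `module_status` is a hypothetical name:

```python
import glob
import os

def module_status(name: str) -> str:
    """Return 'loaded', 'available' (on disk), or 'missing' for a kernel module."""
    try:
        # /proc/modules lists currently loaded modules, one per line
        with open("/proc/modules") as f:
            if any(line.split()[0] == name for line in f):
                return "loaded"
    except FileNotFoundError:
        pass  # not a Linux host; fall through to the on-disk check
    release = os.uname().release
    # installed modules live under /lib/modules/<release>; the .ko file
    # may be compressed (.ko.xz, .ko.zst), hence the trailing wildcard
    if glob.glob(f"/lib/modules/{release}/**/{name}.ko*", recursive=True):
        return "available"
    return "missing"

print(module_status("nvme_loop"))
```

In the smithi log above, the 6.5.0-rc4 testing kernel simply did not ship `nvme_loop`, so such a check would report "missing" before modprobe ever ran.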

@adk3798
Contributor

adk3798 commented Sep 11, 2023

https://pulpito.ceph.com/adking-2023-09-07_12:40:41-orch:cephadm-wip-adk-testing-2023-09-06-1611-distro-default-smithi/

3 failures

  • 1 failure in test_nfs task. This test had been blocked from running properly for a while due to https://tracker.ceph.com/issues/55986 which was recently resolved. It seems that it's just generally a bit broken currently and will need some more work. But shouldn't block merging the set of PRs in the run.
  • 1 failure deploying jaeger-tracing. Known issue https://tracker.ceph.com/issues/59704
  • 1 strange failure in the mgr-nfs-upgrade sequence. It was failing redeploying the first mgr as part of the upgrade. Interactive reruns allowed me to find the issue was
2023-09-08 19:03:19,673 7f017b1f1b80 DEBUG Determined image: 'quay.ceph.io/ceph-ci/ceph@sha256:29eb1b22bdc86e11facd8e3b821e546994d614ae2a0aec9d47234c7aede558d5'
2023-09-08 19:03:19,693 7f017b1f1b80 INFO Redeploy daemon mgr.smithi012.wqsagl ...
2023-09-08 19:06:22,875 7f017b1f1b80 INFO Non-zero exit code 1 from systemctl daemon-reload
2023-09-08 19:06:22,875 7f017b1f1b80 INFO systemctl: stderr Failed to reload daemon: Connection timed out

which is particularly odd because systemctl daemon-reload isn't even a command specific to the mgr's systemd unit. If it had been starting the systemd unit for the mgr, it could maybe be traced back to something with the mgr in the current build, but for whatever reason it was timing out during the daemon-reload. I would have considered it a weird one-off if it wasn't for the fact that it reproduced 3 times in a row. Not really sure what to make of it. But either way I don't think we should hold up other PRs merging for it. Will just need some more investigation in the future.

Overall, I think we can merge the PRs from the run.

@adk3798
Contributor

adk3798 commented Sep 11, 2023


@vshankar do you need to test anything with the fs suite on this PR? The orch run I did with it looked fine to me.

@vshankar
Contributor Author

@vshankar do you need to test anything with the fs suite on this PR? The orch run I did with it looked fine to me.

Nothing to be tested in fs suite - good to merge then. Thx, @adk3798
