
qa: fix "no orch backend set" in nfs suite#53594

Merged
vshankar merged 1 commit intoceph:mainfrom
dparmar18:wip-62870
Sep 29, 2023

Conversation

@dparmar18
Contributor

@dparmar18 dparmar18 commented Sep 22, 2023

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@vshankar
Contributor

@dparmar18 For commit 0576645, could you explain the issue and how the change fixes it?

@dparmar18
Contributor Author

Okay, so here's the story:

Post-#52708 merge, the NFS tests started failing with:

2023-09-07T13:06:57.204 INFO:teuthology.orchestra.run.smithi143.stderr:Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

This happened because the cluster was instantiated the same way the other sub-suites in the fs suite do it. The problem with that is that NFS commands then fail: the cluster complains that it has no backend set. And setting the backend is not straightforward, because if we run `ceph orch set backend cephadm` in the test class itself (i.e. in setUp()), it complains:

2023-09-21T12:33:52.923 INFO:tasks.ceph.mgr.x.smithi002.stderr:  File "/usr/share/ceph/mgr/cephadm/module.py", line 3062, in _apply_service_spec
2023-09-21T12:33:52.923 INFO:tasks.ceph.mgr.x.smithi002.stderr:    raise OrchestratorError((f'The maximum number of {spec.service_type} daemons allowed with {host_count} hosts is {host_count*max_count} ({host_count}x{max_count}).'
2023-09-21T12:33:52.923 INFO:tasks.ceph.mgr.x.smithi002.stderr:orchestrator._interface.OrchestratorError: The maximum number of nfs daemons allowed with 0 hosts is 0 (0x10). This limit can be adjusted by changing the mgr/cephadm/max_count_per_host config option
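The failure mode in that log is easy to model in miniature: cephadm caps the daemons of a service type at hosts × mgr/cephadm/max_count_per_host (default 10), so with zero hosts added the cap is zero and any nfs apply is rejected. A simplified sketch of that check (function names here are illustrative, not cephadm's actual internals):

```python
# Simplified model of cephadm's per-host daemon cap check
# (see _apply_service_spec in mgr/cephadm/module.py for the real thing).

def max_daemons_allowed(host_count: int, max_count_per_host: int = 10) -> int:
    """Upper bound on daemons of one service type for the whole cluster."""
    return host_count * max_count_per_host

def check_apply(service_type: str, requested: int, host_count: int) -> None:
    """Raise if the requested daemon count exceeds the cluster-wide cap."""
    limit = max_daemons_allowed(host_count)
    if requested > limit:
        raise RuntimeError(
            f"The maximum number of {service_type} daemons allowed "
            f"with {host_count} hosts is {limit}."
        )

# With the backend set but no hosts added (cluster not bootstrapped via
# cephadm), even a single nfs daemon exceeds the limit of 0:
try:
    check_apply("nfs", requested=1, host_count=0)
except RuntimeError as e:
    print(e)
```

This is why merely setting the backend in setUp() is not enough: the orchestrator also needs hosts, which only a cephadm bootstrap provides.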

So to overcome this we need cephadm to not only set the backend but also bootstrap the cluster and add the hosts. That's why the NFS tests had been using qa/tasks/cephadm.py, i.e. the YAML below, back when they lived in the orch/cephadm suite:

tasks:
- install:
- cephadm:

Even if we add the above YAML tasks to the current setup, it fails: the cluster gets instantiated by qa/tasks/ceph.py, and then qa/tasks/cephadm.py tries to orchestrate on top of it, which fails with [0]:

'Namespace' object has no attribute 'bootstrapped'

qa/tasks/ceph.py is used by the begin dir's 0-install.yaml in the NFS sub-suite, so combining begin's steps with qa/tasks/cephadm.py is a disaster, and using begin alone is even worse.

So it is mandatory to orchestrate the cluster using the cephadm tasks, and we have already been doing that in the fs/cephadm sub-suite, whose setup mirrors what orch/cephadm used to run the NFS tests. Long story short: we can't run the NFS tests with the fs suite's usual way of constructing the cluster, and need to follow the setup in fs/cephadm and/or what orch/cephadm does.
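For illustration, a minimal sketch of a cephadm-style fragment in that spirit (the task names `install`, `cephadm`, `cephadm.shell`, and `cephfs_test_runner` are real teuthology tasks, but this exact combination and the module path are illustrative assumptions — fs/cephadm and the old orch/cephadm YAML are the authoritative references, not this sketch):

```yaml
tasks:
- install:
- cephadm:           # bootstraps the cluster, adds hosts, sets the orch backend
- cephadm.shell:
    host.a:
      - ceph orch apply mds a   # MDSs via the orchestrator, not the roles list
- cephfs_test_runner:
    modules:
      - tasks.cephfs.test_nfs   # illustrative: the NFS test module
```

The key point is that `cephadm:` replaces the `ceph:` task from begin's 0-install.yaml, so the orchestrator backend and hosts exist before any NFS command runs.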

One more question one may ask: having the cephadm tasks is fine, but why bootstrap the MDSs using:

- cephadm.shell:
    host.a:
      - ceph orch apply mds a

and not just declare two MDSs directly in the YAML file?

Well, I did try that, and as expected the job failed [1]. Declaring the MDSs in the YAML changes how they are bootstrapped, so we end up with a default CephFS named cephfs. Some test cases in test_nfs.py create their own CephFS (so there are now two filesystems), mount with ceph-fuse (without --client_fs), and then check for a particular path when creating CephFS exports. The mount picks the default fs, i.e. cephfs, and the check fails because the path doesn't exist there. One may say we could pass --client_fs in the test cases, and we could, but that isn't worth the effort when we can just keep bootstrapping the MDSs with ceph orch apply mds a instead of declaring them in the YAML.
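To make the mount pitfall concrete, a hedged CLI sketch (the filesystem name and mountpoint are hypothetical; `ceph fs volume create` and ceph-fuse's `--client_fs` option are real, but this transcript is illustrative, not from the failed job):

```shell
# Two filesystems exist: the default "cephfs" from the YAML-declared MDSs,
# plus the one a test case creates itself:
ceph fs volume create user_test_fs

# Without --client_fs, ceph-fuse mounts the default fs ("cephfs"), so a path
# that only exists in user_test_fs appears to be missing:
ceph-fuse /mnt/test

# Pinning the mount would avoid that, at the cost of touching every such test:
ceph-fuse --client_fs user_test_fs /mnt/test
```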

To support all of this, here are two successful jobs:
http://pulpito.front.sepia.ceph.com/dparmar-2023-09-22_14:16:34-fs:nfs-wip-62870-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/dparmar-2023-09-22_15:26:51-fs:nfs-wip-62870-distro-default-smithi/


[0] http://pulpito.front.sepia.ceph.com/dparmar-2023-09-21_14:36:37-fs:nfs-fix-nfs-apply-err-reporting-distro-default-smithi/
[1] http://pulpito.front.sepia.ceph.com/dparmar-2023-09-22_12:42:29-fs:nfs-wip-62870-distro-default-smithi/

@dparmar18
Contributor Author

@vshankar @adk3798 ^^

@dparmar18 dparmar18 marked this pull request as ready for review September 22, 2023 21:50
@dparmar18 dparmar18 requested review from a team and adk3798 September 22, 2023 21:50
Fixes: https://tracker.ceph.com/issues/62870
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18
Contributor Author

The last push made no code changes; it just added the Fixes line to the commit message.

@dparmar18
Contributor Author

Oh, I forgot to mention why I removed objectstore: a) it is not needed since we're not dealing with anything complex, and b) I'm following the blueprint of the fs/cephadm sub-suite; orch/cephadm never had it either and its tests always ran fine, which is further supporting proof. All in all, I think this is good to merge ASAP since NFS testing is currently blocked.

@vshankar
Contributor

> Okay so here's the story:
>
> basically, post #52708 merge; NFS tests started failing due to:
>
> 2023-09-07T13:06:57.204 INFO:teuthology.orchestra.run.smithi143.stderr:Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
>
> […]
>
> qa/tasks/ceph.py is used in begin dir's 0-install.yaml in the NFS sub-suite therefore using begin's steps and qa/tasks/cephadm.py is a disaster while just using begin is even worse.

Looks fine till here.

> One more question one may ask: Having the cephadm tasks is fine but why bootstrap MDS clusters using `ceph orch apply mds a` and not just mention two MDSs in the yaml file itself directly?
>
> Well I did try it out and as expected the job failed [1] […]

Fair enough.

> To support all of these, here are two successful jobs:
> http://pulpito.front.sepia.ceph.com/dparmar-2023-09-22_14:16:34-fs:nfs-wip-62870-distro-default-smithi/
> http://pulpito.front.sepia.ceph.com/dparmar-2023-09-22_15:26:51-fs:nfs-wip-62870-distro-default-smithi/

👍 Nice work @dparmar18

@dparmar18
Contributor Author

Can this be merged? This is blocking some PRs from being tested.

@vshankar
Contributor

> can this be merged? this is blocking some PRs to be tested

Running this through QA - will be merged soon (subset test).

@dparmar18
Contributor Author

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20230926.081818

2023-09-26T16:43:04.340 DEBUG:teuthology.orchestra.run.smithi186:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c pull
2023-09-26T16:43:04.513 INFO:teuthology.orchestra.run.smithi186.stderr:Pulling container image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c...
2023-09-26T16:43:45.465 INFO:teuthology.orchestra.run.smithi186.stderr:Non-zero exit code 125 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c -e NODE_NAME=smithi186 quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c --version
2023-09-26T16:43:45.466 INFO:teuthology.orchestra.run.smithi186.stderr:ceph: stderr docker: Error response from daemon: failed to create task for container: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown.
2023-09-26T16:43:45.466 INFO:teuthology.orchestra.run.smithi186.stderr:Traceback (most recent call last):
2023-09-26T16:43:45.466 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-09-26T16:43:45.469 INFO:teuthology.orchestra.run.smithi186.stderr:    return _run_code(code, main_globals, None,
2023-09-26T16:43:45.469 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2023-09-26T16:43:45.469 INFO:teuthology.orchestra.run.smithi186.stderr:    exec(code, run_globals)
2023-09-26T16:43:45.469 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 8193, in <module>
2023-09-26T16:43:45.469 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 8181, in main
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 1644, in _default_image
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 4083, in command_pull
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/cephadmlib/decorators.py", line 27, in _require_image
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 1635, in _infer_image
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/__main__.py", line 4136, in command_inspect_image
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/cephadmlib/container_types.py", line 400, in run
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:  File "/tmp/tmpp8p6emph.cephadm.build/cephadmlib/call_wrappers.py", line 307, in call_throws
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint ceph --init -e CONTAINER_IMAGE=quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c -e NODE_NAME=smithi186 quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c --version: docker: Error response from daemon: failed to create task for container: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown.
2023-09-26T16:43:45.470 INFO:teuthology.orchestra.run.smithi186.stderr:
2023-09-26T16:43:45.489 DEBUG:teuthology.orchestra.run:got remote process result: 1
2023-09-26T16:43:45.490 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_a2e911cf76140ce8227d2acb6dc462b727acb78c/qa/tasks/cephadm.py", line 433, in pull_image
    run.wait(
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_54e62bcbac4e53d9685e08328b790d3b20d71cae/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi186 with status 1: 'sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:a2e911cf76140ce8227d2acb6dc462b727acb78c pull'

This shouldn't have occurred.

@vshankar
Contributor

> https://pulpito.ceph.com/dparmar-2023-09-26_19:06:28-fs:nfs-wip-62870-distro-default-smithi/

I see this passes, but both of my fs:nfs runs fail which likely means we are doing something different :)

@dparmar18
Contributor Author

> https://pulpito.ceph.com/dparmar-2023-09-26_19:06:28-fs:nfs-wip-62870-distro-default-smithi/
>
> I see this passes, but both of my fs:nfs runs fail which likely means we are doing something different :)

The error doesn't relate to the code. Some issue with the branch, I guess? Rebuilding the branch might help?

@vshankar
Contributor

> > https://pulpito.ceph.com/dparmar-2023-09-26_19:06:28-fs:nfs-wip-62870-distro-default-smithi/
> >
> > I see this passes, but both of my fs:nfs runs fail which likely means we are doing something different :)
>
> The error doesn't relate to the code. Some issue with the branch i guess? Rebuilding the branch might help?

The branch is just a bunch of PRs built in Shaman; not sure what can go wrong with that (only one nfs-related change).

@vshankar
Contributor

heh - https://pulpito.ceph.com/vshankar-2023-09-26_14:09:41-fs-wip-vshankar-testing-20230926.081818-testing-default-smithi/7402468/

The run from the full fs suite passed.

Not sure if it's related to the distro in use. rhel_8 passes but not the centos or ubuntu in my run. Could you please check @dparmar18?

@dparmar18
Contributor Author

> heh - https://pulpito.ceph.com/vshankar-2023-09-26_14:09:41-fs-wip-vshankar-testing-20230926.081818-testing-default-smithi/7402468/
> the from the full fs suite passed.
>
> Not sure if its related to the distro in use. rhel_8 passes but no the centos or ubuntu in my run. Could you please check @dparmar18?

I'm currently away so I won't be able to check atm, but the three runs I had were on Ubuntu 20.04, CentOS 9 and RHEL 8, and all of them passed:

Ubuntu: https://pulpito.ceph.com/dparmar-2023-09-26_19:06:28-fs:nfs-wip-62870-distro-default-smithi/

Centos: https://pulpito.ceph.com/dparmar-2023-09-22_15:26:51-fs:nfs-wip-62870-distro-default-smithi/

Rhel: https://pulpito.ceph.com/dparmar-2023-09-22_14:16:34-fs:nfs-wip-62870-distro-default-smithi/

Seems like something went wrong with the builds

@vshankar
Contributor

> Seems like something went wrong with the builds

Looks more like an infra-related issue to me.

@vshankar vshankar merged commit 6d8679e into ceph:main Sep 29, 2023
vshankar added a commit to vshankar/ceph that referenced this pull request Oct 7, 2023
* refs/pull/53594/head:
	qa: fix "no orch backend set" in nfs suite

Reviewed-by: Adam King <adking@redhat.com>