qa/tasks: disable agent at end of iscsi test #66613

Open
ljflores wants to merge 1 commit into ceph:main from ljflores:wip-fix-iscsi-test

ljflores (Member) commented Dec 11, 2025

The cephadm agent is sometimes deployed during orch tests (determined by these yaml files: https://github.com/ceph/ceph/tree/main/qa/suites/orch/cephadm/smoke/agent).

During the test_iscsi_container task, we execute several workunits to test iSCSI. If the agent was deployed, it sends metadata to a cephadm endpoint throughout the test. At the end of the test, we disable cephadm and shut down the cluster. However, the agent is still deployed and keeps trying to connect to cephadm, which fails because cephadm has been disabled.

The solution is to disable the agent during teardown via:

ceph config set mgr mgr/cephadm/use_agent false

Once set to false, the agent service will be stopped and the daemons will be removed automatically by cephadm.
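
As a rough illustration only (this is not the code in the patch), the command above could be assembled as an argument list before being handed to whatever runs it on the remote host. The helper name `use_agent_command` is hypothetical; only the option name and value come from the description above.

```python
import shlex
from typing import List

# Option name taken from the PR description.
AGENT_OPTION = "mgr/cephadm/use_agent"


def use_agent_command(enabled: bool) -> List[str]:
    """Build the argv for `ceph config set mgr mgr/cephadm/use_agent <bool>`.

    Hypothetical helper for illustration; it only builds the argument
    list and does not run anything.
    """
    value = "true" if enabled else "false"
    return ["ceph", "config", "set", "mgr", AGENT_OPTION, value]


# Render the teardown command as a single shell-safe string.
print(shlex.join(use_agent_command(False)))
```

Keeping the command as a list avoids quoting problems if it is later wrapped in another command, such as a `cephadm shell` invocation.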

Fixes: https://tracker.ceph.com/issues/68586

Signed-off-by: Laura Flores <lflores@ibm.com>
@ljflores ljflores requested a review from a team as a code owner December 11, 2025 21:56
@ljflores ljflores requested review from adk3798 and rkachach December 11, 2025 21:57
ljflores (Member, Author)

jenkins test make check

ljflores (Member, Author)

Thanks for the feedback. I'll run some tests on this once the lab migration is complete. I would rather do this than put it into one of Yuri's batches, since I'm not 100% confident that this will function as I expect.

ljflores (Member, Author)

jenkins test make check

rkachach (Contributor) left a comment

LGTM

rkachach (Contributor) commented Feb 6, 2026

@ljflores did you manage to have a green/good run with this change? If that's the case, we can go ahead and merge the PR.

ljflores (Member, Author) commented Feb 6, 2026

> @ljflores did you manage to have a green/good run with this change? If that's the case, we can go ahead and merge the PR.

Oops, I need to return to this. I had scheduled some runs that failed from lab issues. Trying again.

ljflores (Member, Author) commented Feb 6, 2026

The test failed because teuthology deletes the client.admin.keyring before the config change command executes in my patch. I will need to rework this. I can see now that the config change needs to occur right at the end of the last task, before anything in the cluster is deleted.

https://pulpito.ceph.com/lflores-2026-02-06_22:21:29-rados:cephadm-main-distro-default-trial/38542

2026-02-06T22:34:50.516 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
2026-02-06T22:34:50.532 INFO:tasks.cephadm:Teardown begin
2026-02-06T22:34:50.533 DEBUG:teuthology.orchestra.run.trial128:> sudo rm -f /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
2026-02-06T22:34:50.581 INFO:tasks.cephadm:Tearing down any agents...
2026-02-06T22:34:50.581 DEBUG:teuthology.orchestra.run.trial128:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:0b9fcd80bbdbcc604152c9a96bae005a1e16537a shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid b7a248f3-03ab-11f1-bd0f-d404e6e7d460 -- ceph config set mgr mgr/cephadm/use_agent false
2026-02-06T22:34:50.893 INFO:teuthology.orchestra.run.trial128.stderr:Inferring config /var/lib/ceph/b7a248f3-03ab-11f1-bd0f-d404e6e7d460/mon.a/config
2026-02-06T22:34:50.910 INFO:teuthology.orchestra.run.trial128.stderr:Error: statfs /etc/ceph/ceph.client.admin.keyring: no such file or directory
2026-02-06T22:34:50.936 DEBUG:teuthology.orchestra.run:got remote process result: 125
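
To make the ordering problem concrete, here is a toy model of the unwind order, assuming tasks behave like nested context managers that are torn down in reverse order (consistent with the "Unwinding manager cephadm" line in the log). The task names and event strings are hypothetical and this is not teuthology's actual code; it only shows why the config change has to live in the teardown of the last (innermost) task rather than in the cluster teardown.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def cluster_task(events: List[str]) -> Iterator[None]:
    """Hypothetical stand-in for the outer task that owns the cluster."""
    events.append("cluster: bootstrap")
    try:
        yield
    finally:
        # Mirrors the log above: teardown removes ceph.conf and the keyring.
        events.append("cluster: remove ceph.conf and client.admin.keyring")


@contextmanager
def iscsi_task(events: List[str]) -> Iterator[None]:
    """Hypothetical stand-in for the last task in the job."""
    events.append("iscsi: run workunits")
    try:
        yield
    finally:
        # Runs while the keyring still exists, because inner contexts
        # unwind before outer ones.
        events.append("iscsi: ceph config set mgr mgr/cephadm/use_agent false")


def run_job() -> List[str]:
    """Run the nested tasks and return the recorded event order."""
    events: List[str] = []
    with cluster_task(events):
        with iscsi_task(events):
            pass
    return events


for line in run_job():
    print(line)
```

In this model the agent is disabled before the keyring is removed, which is the ordering the failed run lacked.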

rkachach (Contributor) commented Feb 9, 2026

upssss I see, thanks @ljflores for re-testing it.
