Bug #69827
Admin socket command not delivered to mon OR thrash_store does not work
Status: open
Description
Observed during:
https://pulpito.ceph.com/akupczyk-2025-02-03_16:48:13-rados-aclamk-testing-nauvoo-2025-01-29-1806-b-distro-default-smithi/8112090/
It seems like the admin socket command was delivered to mon.i.
Logs indicate that mon.i is up and running.
I am unable to tell from the log whether the command was received and processed.
Nevertheless, the thrash_store function fails.
2025-02-04T01:33:57.551 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.i
2025-02-04T01:33:57.551 INFO:tasks.mon_thrash.mon_thrasher:thrashing mon.i store
2025-02-04T01:33:57.551 DEBUG:teuthology.orchestra.run.smithi078:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.i sync_force --yes-i-really-mean-it
.....
2025-02-04T01:34:13.731 INFO:teuthology.orchestra.run.smithi078.stderr:Error ENXIO: problem getting command descriptions from mon.i
2025-02-04T01:34:13.735 DEBUG:teuthology.orchestra.run:got remote process result: 6
2025-02-04T01:34:13.735 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_d863f9e916bc083d1553fc081d7170efbf9c8a47/qa/tasks/mon_thrash.py", line 272, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/git.ceph.com_ceph-c_d863f9e916bc083d1553fc081d7170efbf9c8a47/qa/tasks/mon_thrash.py", line 351, in _do_thrash
self.thrash_store(mon)
File "/home/teuthworker/src/git.ceph.com_ceph-c_d863f9e916bc083d1553fc081d7170efbf9c8a47/qa/tasks/mon_thrash.py", line 195, in thrash_store
out = self.manager.raw_cluster_cmd(
File "/home/teuthworker/src/git.ceph.com_ceph-c_d863f9e916bc083d1553fc081d7170efbf9c8a47/qa/tasks/ceph_manager.py", line 1696, in raw_cluster_cmd
return self.run_cluster_cmd(**kwargs).stdout.getvalue()
......
teuthology.exceptions.CommandFailedError: Command failed on smithi078 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.i sync_force --yes-i-really-mean-it'
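The `status 6` reported by teuthology lines up with the `Error ENXIO` printed on stderr: on Linux, errno 6 is `ENXIO`. A small check, as a sketch only (the helper name `describe_exit_status` is illustrative and not part of teuthology or Ceph):

```python
import errno
import os


def describe_exit_status(status: int) -> str:
    """Map a small positive exit status to its errno name, if one exists."""
    name = errno.errorcode.get(status, "unknown")
    return f"{status} -> {name}: {os.strerror(status)}"


# The failing `ceph tell mon.i sync_force` run exited with status 6.
print(describe_exit_status(6))
```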
Updated by Radoslaw Zarzynski about 1 year ago · Edited
- Assignee set to chunmei liu
Hi Chunmei! Would you mind taking a look?
Very low priority.
Updated by Laura Flores about 1 year ago
Hey @chunmei liu have you had a chance to look at this one yet? Writing from bug scrub.
Updated by Radoslaw Zarzynski about 1 year ago
There are not many places where `ENXIO` can be generated. The only one that is interesting from this case's perspective is in `MonClient`:
int monc_error_category::from_code(int ev) const noexcept {
  // ...
  case monc_errc::mon_unavailable:
    return -ENXIO;
  }
  return -EDOM;
}
The error is triggered locally, on the client, which explains why the monitor is in such good shape.
void MonClient::_send_command(MonCommand *r)
{
  if (r->is_tell()) {
    ++r->send_attempts;
    if (r->send_attempts > cct->_conf->mon_client_directed_command_retry) {
      _finish_command(r, monc_errc::mon_unavailable, "mon unavailable", {});
      return;
    }
Just a network issue?
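To make the client-side path concrete, here is a minimal Python model of the check quoted above. It is not Ceph code: `TellCommand`, `send_command`, `run_until_done`, and the `deliver` callback are hypothetical stand-ins, and the retry limit is passed in as a plain argument rather than read from `mon_client_directed_command_retry`. It only illustrates that once the attempt counter exceeds the limit, the command is finished with `ENXIO` on the client, without the monitor being involved.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TellCommand:
    target: str
    send_attempts: int = 0
    result: Optional[int] = None  # 0 on success, negative errno on failure


def send_command(cmd: TellCommand, retry_limit: int,
                 deliver: Callable[[TellCommand], bool]) -> None:
    """Count the attempt first, then give up once the counter exceeds the limit."""
    cmd.send_attempts += 1
    if cmd.send_attempts > retry_limit:
        # "mon unavailable": decided locally, the monitor never sees this.
        cmd.result = -errno.ENXIO
        return
    if deliver(cmd):
        cmd.result = 0


def run_until_done(cmd: TellCommand, retry_limit: int,
                   deliver: Callable[[TellCommand], bool]) -> int:
    """Keep resending (as a reconnect would) until the command has a result."""
    while cmd.result is None:
        send_command(cmd, retry_limit, deliver)
    return cmd.result


# Hypothetical transport that never gets through, e.g. a flaky connection.
cmd = TellCommand(target="mon.i")
rc = run_until_done(cmd, retry_limit=2, deliver=lambda c: False)
print(rc == -errno.ENXIO, cmd.send_attempts)
```

With a limit of 2 in this model, two sends are attempted and the third call is the one that fails the command, which matches the pre-increment followed by the `>` comparison in the quoted snippet.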
Updated by Laura Flores 11 months ago
/a/lflores-2025-04-11_19:10:45-rados-wip-lflores-testing-3-2025-04-11-1140-distro-default-smithi/8236071
2025-04-11T22:03:32.727 INFO:teuthology.orchestra.run.smithi033.stderr:2025-04-11T22:03:32.725+0000 7f790affd640 1 -- 172.21.15.33:0/2670119760 >> v1:172.21.15.33:6789/0 conn(0x7f78fc025e50 legacy=0x7f78fc023b90 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2025-04-11T22:03:32.727 INFO:teuthology.orchestra.run.smithi033.stderr:2025-04-11T22:03:32.725+0000 7f790affd640 1 -- 172.21.15.33:0/2670119760 >> v1:172.21.15.33:6789/0 conn(0x7f78fc025e50 legacy=0x7f78fc023b90 unknown :-1 s=STATE_CLOSED l=1).mark_down
2025-04-11T22:03:32.727 INFO:teuthology.orchestra.run.smithi033.stderr:Error ENXIO: problem getting command descriptions from mon.a
2025-04-11T22:03:32.730 DEBUG:teuthology.orchestra.run:got remote process result: 6
2025-04-11T22:03:32.731 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_53db63c25c110e0783b285a87917757e2216986c/qa/tasks/mon_thrash.py", line 272, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph-c_53db63c25c110e0783b285a87917757e2216986c/qa/tasks/mon_thrash.py", line 351, in _do_thrash
self.thrash_store(mon)
File "/home/teuthworker/src/github.com_ceph_ceph-c_53db63c25c110e0783b285a87917757e2216986c/qa/tasks/mon_thrash.py", line 195, in thrash_store
out = self.manager.raw_cluster_cmd(
File "/home/teuthworker/src/github.com_ceph_ceph-c_53db63c25c110e0783b285a87917757e2216986c/qa/tasks/ceph_manager.py", line 1696, in raw_cluster_cmd
return self.run_cluster_cmd(**kwargs).stdout.getvalue()
File "/home/teuthworker/src/github.com_ceph_ceph-c_53db63c25c110e0783b285a87917757e2216986c/qa/tasks/ceph_manager.py", line 1687, in run_cluster_cmd
return self.controller.run(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_dc15ac4a651ab5968cb4ffaa9ef0ff1a02484ea2/teuthology/orchestra/remote.py", line 535, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_dc15ac4a651ab5968cb4ffaa9ef0ff1a02484ea2/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_dc15ac4a651ab5968cb4ffaa9ef0ff1a02484ea2/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_dc15ac4a651ab5968cb4ffaa9ef0ff1a02484ea2/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi033 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.a sync_force --yes-i-really-mean-it'
Updated by Radoslaw Zarzynski 11 months ago
scrub note: observing for reoccurrences.
Updated by Laura Flores 11 months ago
/a/lflores-2025-04-11_19:10:45-rados-wip-lflores-testing-3-2025-04-11-1140-distro-default-smithi/8236071
Updated by Laura Flores 9 months ago
/a/skanta-2025-06-12_09:21:16-rados-wip-bharath2-testing-2025-06-10-0545-distro-default-smithi/8324808
2025-06-14T18:41:22.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:22.347+0000 7f3c7e7fc640 20 monclient: _un_backoff reopen_interval_multipler now 1
2025-06-14T18:41:22.351 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:22.351+0000 7f3c9ed76640 1 -- 172.21.15.16:0/3998766958 >> v1:172.21.15.16:6790/0 conn(0x7f3c6408ce90 legacy=0x7f3c6408d290 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).operator() setting up a delay queue
2025-06-14T18:41:23.354 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:23.347+0000 7f3c7e7fc640 10 monclient: tick
2025-06-14T18:41:23.354 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:23.347+0000 7f3c7e7fc640 10 monclient: _check_auth_tickets
2025-06-14T18:41:23.354 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:23.347+0000 7f3c7e7fc640 20 monclient: _check_auth_rotating not needed by client.admin
2025-06-14T18:41:24.349 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:24.347+0000 7f3c7e7fc640 10 monclient: tick
2025-06-14T18:41:24.349 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:24.347+0000 7f3c7e7fc640 10 monclient: _check_auth_tickets
2025-06-14T18:41:24.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:24.347+0000 7f3c7e7fc640 20 monclient: _check_auth_rotating not needed by client.admin
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 10 monclient: tick
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 10 monclient: _check_auth_tickets
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 20 monclient: _check_auth_rotating not needed by client.admin
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 5 monclient: _check_tell_commands timeout tell command 1
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 10 monclient: _finish_command 1 = monc:6 mon unavailable
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 1 -- 172.21.15.16:0/3998766958 >> v1:172.21.15.16:6790/0 conn(0x7f3c6408ce90 legacy=0x7f3c6408d290 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c7e7fc640 1 -- 172.21.15.16:0/3998766958 >> v1:172.21.15.16:6790/0 conn(0x7f3c6408ce90 legacy=0x7f3c6408d290 unknown :-1 s=STATE_CLOSED l=1).mark_down
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:2025-06-14T18:41:25.347+0000 7f3c9f577640 1 -- 172.21.15.16:0/3998766958 reap_dead start
2025-06-14T18:41:25.350 INFO:teuthology.orchestra.run.smithi016.stderr:Error ENXIO: problem getting command descriptions from mon.b
2025-06-14T18:41:25.355 DEBUG:teuthology.orchestra.run:got remote process result: 6
2025-06-14T18:41:25.356 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_faff8cee24632b6600c7f987139f29c792e0096c/qa/tasks/mon_thrash.py", line 272, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph-c_faff8cee24632b6600c7f987139f29c792e0096c/qa/tasks/mon_thrash.py", line 351, in _do_thrash
self.thrash_store(mon)
File "/home/teuthworker/src/github.com_ceph_ceph-c_faff8cee24632b6600c7f987139f29c792e0096c/qa/tasks/mon_thrash.py", line 195, in thrash_store
out = self.manager.raw_cluster_cmd(
File "/home/teuthworker/src/github.com_ceph_ceph-c_faff8cee24632b6600c7f987139f29c792e0096c/qa/tasks/ceph_manager.py", line 1696, in raw_cluster_cmd
return self.run_cluster_cmd(**kwargs).stdout.getvalue()
File "/home/teuthworker/src/github.com_ceph_ceph-c_faff8cee24632b6600c7f987139f29c792e0096c/qa/tasks/ceph_manager.py", line 1687, in run_cluster_cmd
return self.controller.run(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/remote.py", line 535, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 461, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi016 with status 6: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mon.b sync_force --yes-i-really-mean-it'
Updated by Radoslaw Zarzynski 9 months ago
If we can rule out mon.b being dead, it might be just a networking issue.
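One way to narrow this down from the client side is to pull out the monclient lines that show how the tell command ended. In the 8324808 excerpt above, `_check_tell_commands timeout tell command 1` appears immediately before `_finish_command 1 = monc:6 mon unavailable`. The sketch below is a hypothetical helper: `find_tell_failures` and its regular expressions are assumptions derived only from the log lines quoted in this ticket, and they may not match other log formats.

```python
import re

# Patterns based on the monclient debug lines quoted in this ticket.
TIMEOUT_RE = re.compile(r"monclient: _check_tell_commands timeout tell command (\d+)")
FINISH_RE = re.compile(r"monclient: _finish_command (\d+) = (\S+)\s*(.*)$")


def find_tell_failures(lines):
    """Return {command id: {"timed_out": bool, "result": str}} for finished commands."""
    timed_out = set()
    finished = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out.add(int(m.group(1)))
            continue
        m = FINISH_RE.search(line)
        if m:
            tid = int(m.group(1))
            finished[tid] = {
                "timed_out": tid in timed_out,
                "result": f"{m.group(2)} {m.group(3)}".strip(),
            }
    return finished


sample = [
    "2025-06-14T18:41:25.347+0000 7f3c7e7fc640  5 monclient: _check_tell_commands timeout tell command 1",
    "2025-06-14T18:41:25.347+0000 7f3c7e7fc640 10 monclient: _finish_command 1 = monc:6 mon unavailable",
]
print(find_tell_failures(sample))
```

Running something like this over the full teuthology log, next to the mon.b log for the same timestamps, could help separate a client-side timeout from a monitor that was actually down.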
Updated by Laura Flores 9 months ago
/a/yuriw-2025-06-28_18:55:21-rados-wip-yuri-testing-2025-06-28-0812-distro-default-smithi/8355184
Updated by Radoslaw Zarzynski 9 months ago
- Assignee set to Aishwarya Mathuria
Hi @Aishwarya Mathuria! Would you mind taking a look?
Updated by Laura Flores 9 months ago · Edited
So far, the only occurrences have been on main.
Updated by Radoslaw Zarzynski 8 months ago
- Assignee changed from Aishwarya Mathuria to Radoslaw Zarzynski
Updated by Laura Flores 7 months ago
/a/skanta-2025-08-05_03:51:26-rados-wip-bharath9-testing-2025-08-05-0506-distro-default-smithi/8424384
Updated by Radoslaw Zarzynski 4 months ago
- Related to Bug #45761: mon_thrasher: "Error ENXIO: mon unavailable" during sync_force command leads to "failed to reach quorum size 9 before timeout expired" added