qa/tasks/quiescer: dump ops in parallel #57302
Not yet tested.
leonid-s-usov
left a comment
Thanks for submitting this, Patrick!
leonid-s-usov
left a comment
Looks good! It should be easy to test this with any of the upcoming teuthology batches by using this branch as the suite.
jenkins test api
jenkins test make check arm64
This PR is under test in https://tracker.ceph.com/issues/65867.
jenkins test make check arm64
qa/tasks/quiescer.py
Outdated
try:
    _ = self.fs.rank_tell(['ops', '--flags=locks', f'--path={daemon_path}'], rank=rank)
    remote_dumps.append((info, remote_path))
    p = self.fs.rank_tell(['ops', '--flags=locks', f'--path={daemon_path}'], rank=rank, wait=False)
My mistake, I cannot do this:
2024-05-09T00:35:43.460 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't pull ops dump at '/var/run/ceph/b96c13bc-0d98-11ef-bc97-c7b262605968/ops-7749c26b-1-mds.i.json' on rank 2 (i), error: 'dict' object has no attribute 'wait'
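The AttributeError in that log line suggests the helper had already waited for the command and returned a parsed dict rather than a process handle. A minimal reproduction of that failure mode, where `tell_helper` is a hypothetical stand-in and not the real rank_tell:

```python
import json

def tell_helper(raw_output):
    # Hypothetical stand-in for a helper that waits for the command
    # to finish and hands back the parsed JSON reply.
    return json.loads(raw_output)

result = tell_helper('{"ops": []}')  # a plain dict, not a process handle

try:
    result.wait()  # there is nothing left to wait on
except AttributeError as exc:
    message = str(exc)
```

Here `message` carries the same text as the error reported by the task.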
:'( I was looking forward to the PR... How hard is it to add the async capability to the rank_tell?
Rather than using the rank_tell helper, it is probably simplest to submit the command manually.
Since --flags=locks takes the mds_lock and dumps thousands of ops, the dump may take a long time to complete on each individual MDS. The entire quiesce set may time out (and all q ops be killed) before we finish dumping ops.
Fixes: https://tracker.ceph.com/issues/65823
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
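For illustration, the submit-everything-then-wait pattern this change is after can be sketched as below. This is a minimal, self-contained sketch under stated assumptions, not the code from this PR: `start_dump` and `dump_all` are hypothetical names, and a short-lived Python child process stands in for the per-rank `ops` command.

```python
import subprocess
import sys

def start_dump(rank):
    """Start a stand-in for the per-rank ops dump without waiting on it.

    Hypothetical placeholder: a real task would submit the command to
    the MDS here; this sketch only spawns a short Python child process.
    """
    code = f"import time; time.sleep(0.2); print('dumped rank {rank}')"
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)

def dump_all(ranks):
    # Phase 1: submit every command first so the dumps overlap in time.
    pending = [(rank, start_dump(rank)) for rank in ranks]

    # Phase 2: only now wait, recording per-rank results and failures.
    results, errors = {}, {}
    for rank, proc in pending:
        out, _ = proc.communicate()
        if proc.returncode == 0:
            results[rank] = out.strip()
        else:
            errors[rank] = proc.returncode
    return results, errors

results, errors = dump_all([0, 1, 2])
```

Because no handle is waited on until every command has been started, the total time is bounded by the slowest dump rather than the sum of all of them, which is the point when the whole quiesce set is racing a timeout.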
jenkins test make check arm64
Seems to work as advertised now: /teuthology/pdonnell-2024-05-16_16:19:21-fs:workload-main-distro-default-smithi/7709343/teuthology.log