qa: Fix test_cephfs_mirror_restart_sync_on_blocklist#67385
qa: Fix test_cephfs_mirror_restart_sync_on_blocklist#67385
Conversation
With the new retry logic introduced in PR 7305, the sleep was removed. However, the retry mechanism currently triggers only for AssertionError, KeyError, and IndexError. The command can also fail with CommandFailedError, as observed in the blocklist test case. In this scenario, FSMirror initialization has not yet started, but the command is still issued, leading to failure. Therefore, add CommandFailedError to the list of retryable exceptions. Fixes: https://tracker.ceph.com/issues/74998 Signed-off-by: Kotresh HR <khiremat@redhat.com>
bca797d to
ef83f04
Compare
|
Fixed commit message typos |
|
|
||
|
|
||
| # Exceptions to retry in test assertions | ||
| RETRY_EXCEPTIONS = (AssertionError, KeyError, IndexError, CommandFailedError) |
There was a problem hiding this comment.
OK, so CommandFailedError due to the mirror daemon being forcefully stopped (in the test) is expected. Should we pass in a list of exceptions (from the caller) to this routine rather than doing it globally?
There was a problem hiding this comment.
I don't think that makes sense. It's time bound retry of commands, ideally we should be catching all Exception and retry until timeout as we expect within timeout, the assertions are true but I didn't want to do it and wanted to catch specific exceptions.
There was a problem hiding this comment.
I am worried about cases which we might miss. In that case, let's just catch all Exceptions so that we don't have to keep on adjusting the exception list?
There was a problem hiding this comment.
Ok. So, I rechecked the change again. It's fine to have the exception list in the retry function that got added recently.
However, I'm a bit surprised that some parts of the test does a assert check and then wraps it under try...catch. We should clean that up in a different PR.
There was a problem hiding this comment.
However, I'm a bit surprised that some parts of the test does a assert check and then wraps it under > try...catch. We should clean that up in a different PR.
I didn't get, which ones?
There was a problem hiding this comment.
There was a problem hiding this comment.
Hey no, I have added those. This is to capture the failure reason for the assert to log those in retry decorator. e.g., if a peer status command runs 14 times before achieving desired state and succeeds 15th time, the decorator logs the failure reason for those 14 times. Just a debugging helper. It's logs something like below. Not only that, it also helps logging the final assertion failure after max retries.
2026-02-18T11:10:37.168 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 1 failed (KeyError), retrying...
2026-02-18T11:10:47.445 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 2 failed (KeyError), retrying...
2026-02-18T11:10:57.735 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 3 failed (KeyError), retrying...
2026-02-18T11:11:08.017 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 4 failed (KeyError), retrying...
2026-02-18T11:11:18.444 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 5 failed (KeyError), retrying...
2026-02-18T11:11:28.860 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 6 failed (KeyError), retrying...
2026-02-18T11:11:39.894 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 7 failed (KeyError), retrying...
2026-02-18T11:11:50.567 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 8 failed (KeyError), retrying...
2026-02-18T11:12:01.103 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 9 failed (KeyError), retrying...
2026-02-18T11:12:11.476 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 10 failed (KeyError), retrying...
2026-02-18T11:12:21.963 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 11 failed (KeyError), retrying...
2026-02-18T11:12:33.205 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 12 failed (KeyError), retrying...
2026-02-18T11:12:44.720 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 13 failed (KeyError), retrying...
2026-02-18T11:12:55.852 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 14 failed (KeyError), retrying...
|
jenkins test windows |
1 similar comment
|
jenkins test windows |
|
Run using this branch along with PR #67335 |
* refs/pull/67385/head: Reviewed-by: Venky Shankar <vshankar@redhat.com>
|
This is an automated message by src/script/redmine-upkeep.py. I have resolved the following tracker ticket due to the merge of this PR: No backports are pending for the ticket. If this is incorrect, please update the tracker Update Log: https://github.com/ceph/ceph/actions/runs/22427614479 |
With new retry logic added removing the sleep with PR 7305, it retrying only when there is AssertionError, KeyError, IndexError. But it the command could also fail with CommandFailedError as in blocklist testcase, ithe FSMirror Init is not even started, but the command is issued. So add CommandFailedError exception to rety list of exceptions
Fixes: https://tracker.ceph.com/issues/74998
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.