Skip to content

qa: Fix test_cephfs_mirror_restart_sync_on_blocklist#67385

Merged
vshankar merged 1 commit intoceph:mainfrom
kotreshhr:fix-mirror-qa-blocklist
Feb 26, 2026
Merged

qa: Fix test_cephfs_mirror_restart_sync_on_blocklist#67385
vshankar merged 1 commit intoceph:mainfrom
kotreshhr:fix-mirror-qa-blocklist

Conversation

@kotreshhr
Copy link
Contributor

@kotreshhr kotreshhr commented Feb 18, 2026

With new retry logic added removing the sleep with PR 7305, it retrying only when there is AssertionError, KeyError, IndexError. But it the command could also fail with CommandFailedError as in blocklist testcase, ithe FSMirror Init is not even started, but the command is issued. So add CommandFailedError exception to rety list of exceptions

Fixes: https://tracker.ceph.com/issues/74998

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

With the new retry logic introduced in PR 7305, the sleep
was removed. However, the retry mechanism currently
triggers only for AssertionError, KeyError, and IndexError.

The command can also fail with CommandFailedError, as
observed in the blocklist test case. In this scenario,
FSMirror initialization has not yet started, but the
command is still issued, leading to failure.

Therefore, add CommandFailedError to the list of
retryable exceptions.

Fixes: https://tracker.ceph.com/issues/74998
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr
Copy link
Contributor Author

Fixed commit message typos



# Exceptions to retry in test assertions
RETRY_EXCEPTIONS = (AssertionError, KeyError, IndexError, CommandFailedError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so CommandFailedError due to the mirror daemon being forcefully stopped (in the test) is expected. Should we pass in a list of exceptions (from the caller) to this routine rather than doing it globally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that makes sense. It's time bound retry of commands, ideally we should be catching all Exception and retry until timeout as we expect within timeout, the assertions are true but I didn't want to do it and wanted to catch specific exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about cases which we might miss. In that case, let's just catch all Exceptions so that we don't have to keep on adjusting the exception list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So, I rechecked the change again. It's fine to have the exception list in the retry function that got added recently.

However, I'm a bit surprised that some parts of the test does a assert check and then wraps it under try...catch. We should clean that up in a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm a bit surprised that some parts of the test does a assert check and then wraps it under > try...catch. We should clean that up in a different PR.

I didn't get, which ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@kotreshhr kotreshhr Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey no, I have added those. This is to capture the failure reason for the assert to log those in retry decorator. e.g., if a peer status command runs 14 times before achieving desired state and succeeds 15th time, the decorator logs the failure reason for those 14 times. Just a debugging helper. It's logs something like below. Not only that, it also helps logging the final assertion failure after max retries.

2026-02-18T11:10:37.168 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 1 failed (KeyError), retrying...
2026-02-18T11:10:47.445 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 2 failed (KeyError), retrying...
2026-02-18T11:10:57.735 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 3 failed (KeyError), retrying...
2026-02-18T11:11:08.017 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 4 failed (KeyError), retrying...
2026-02-18T11:11:18.444 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 5 failed (KeyError), retrying...
2026-02-18T11:11:28.860 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 6 failed (KeyError), retrying...
2026-02-18T11:11:39.894 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 7 failed (KeyError), retrying...
2026-02-18T11:11:50.567 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 8 failed (KeyError), retrying...
2026-02-18T11:12:01.103 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 9 failed (KeyError), retrying...
2026-02-18T11:12:11.476 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 10 failed (KeyError), retrying...
2026-02-18T11:12:21.963 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 11 failed (KeyError), retrying...
2026-02-18T11:12:33.205 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 12 failed (KeyError), retrying...
2026-02-18T11:12:44.720 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 13 failed (KeyError), retrying...
2026-02-18T11:12:55.852 DEBUG:tasks.cephfs.test_mirroring:[retry_assert] check_peer_status: attempt 14 failed (KeyError), retrying...

@kotreshhr
Copy link
Contributor Author

jenkins test windows

1 similar comment
@kotreshhr
Copy link
Contributor Author

jenkins test windows

@kotreshhr
Copy link
Contributor Author

vshankar added a commit to vshankar/ceph that referenced this pull request Feb 25, 2026
* refs/pull/67385/head:

Reviewed-by: Venky Shankar <vshankar@redhat.com>
@vshankar vshankar merged commit 4854247 into ceph:main Feb 26, 2026
13 checks passed
@github-actions
Copy link

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/22427614479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants