Bug #54511
test_pool_min_size: AssertionError: not clean before minsize thrashing starts
Description
/a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6719015
2022-03-04T03:06:27.624 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 896, in test_pool_min_size
'not clean before minsize thrashing starts'
AssertionError: not clean before minsize thrashing starts
2022-03-04T03:06:27.625 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1280, in do_thrash
self._do_thrash()
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 189, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
self.choose_action()()
File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 896, in test_pool_min_size
'not clean before minsize thrashing starts'
AssertionError: not clean before minsize thrashing starts
This error occurs early in `test_pool_min_size`, where it checks whether all PGs are active+clean after waiting at most 60 seconds for them to reach that state.
Updated by Aishwarya Mathuria almost 4 years ago
/a/yuriw-2022-03-29_21:35:32-rados-wip-yuri5-testing-2022-03-29-1152-quincy-distro-default-smithi/6767633
Updated by Radoslaw Zarzynski almost 4 years ago
Need to observe more thrashers/minsize_recovery where this issue happens.
Updated by Radoslaw Zarzynski almost 4 years ago
- Related to Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after waiting for 25 seconds added
Updated by Laura Flores almost 4 years ago
- Related to Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to down PGs added
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
/a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910103/
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
- Description updated (diff)
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
- Description updated (diff)
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 47138
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931/commits/1f6bcbb3d680d8589e498b993d2cf566480e2c3e.
Runs in which I reproduced the problem with that modification:
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921351
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921372
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921374
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921382
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921383
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921385
Problem
We didn’t leave enough of a buffer between bringing an OSD back up and checking for active+clean. The PGs passed ceph_manager.wait_for_recovery and ceph_manager.wait_for_clean because recovery hadn’t started yet, and the check eventually failed at ceph_manager.is_clean(). My analysis can be found here:
https://docs.google.com/document/d/1HKQc5kO-A9c7ThYTGtUlgTliYyfy__0tFXXs2KHLsZg/edit
Solution
Wait for 60 seconds before calling ceph_manager.wait_for_recovery and ceph_manager.wait_for_clean.
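To illustrate the race described above, here is a toy simulation. It is only a sketch under stated assumptions, not the code in qa/tasks/ceph_manager.py: `FakeCluster`, `precheck`, the simulated clock, and the `recovery_start`, `recovery_len`, `settle`, and `gap` values are all hypothetical and were picked only to make the timing visible.

```python
class FakeCluster:
    """Toy model (hypothetical): after an OSD is revived, PGs still report
    clean until recovery starts `recovery_start` simulated seconds later
    and then runs for `recovery_len` seconds."""

    def __init__(self, recovery_start=5, recovery_len=30):
        self.now = 0  # simulated clock, in seconds
        self.recovery_start = recovery_start
        self.recovery_end = recovery_start + recovery_len

    def sleep(self, seconds):
        # Advance the simulated clock instead of really sleeping.
        self.now += seconds

    def is_clean(self):
        # Not clean only while recovery is in progress.
        return not (self.recovery_start <= self.now < self.recovery_end)

    def wait_for_clean(self, timeout=60, interval=1):
        # Poll until clean; returns at once if recovery has not begun yet.
        waited = 0
        while not self.is_clean():
            if waited >= timeout:
                raise AssertionError("timed out waiting for clean")
            self.sleep(interval)
            waited += interval


def precheck(cluster, settle=0, gap=5):
    """Return True if the cluster is clean when the final check runs."""
    cluster.sleep(settle)     # buffer before waiting (0 = no buffer)
    cluster.wait_for_clean()  # passes trivially if recovery has not started
    cluster.sleep(gap)        # assumed delay before the final assertion
    return cluster.is_clean()


without_buffer = precheck(FakeCluster(), settle=0)
with_buffer = precheck(FakeCluster(), settle=60)
print(without_buffer, with_buffer)
```

In this model the run without a buffer passes the wait step immediately and then finds the cluster mid-recovery at the final check, while the run with a 60-second buffer only starts waiting after recovery has already begun and finished.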
Updated by Neha Ojha over 3 years ago
- Assignee set to Kamoltat (Junior) Sirivadhna
Updated by Kamoltat (Junior) Sirivadhna over 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to quincy, pacific
Updated by Upkeep Bot over 3 years ago
- Copied to Backport #57019: quincy: test_pool_min_size: AssertionError: not clean before minsize thrashing starts added
Updated by Upkeep Bot over 3 years ago
- Copied to Backport #57020: pacific: test_pool_min_size: AssertionError: not clean before minsize thrashing starts added
Updated by Kamoltat (Junior) Sirivadhna almost 3 years ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 8 months ago
- Merge Commit set to dc218e4b6bd050d93c061ea1a280fac9e9b4b2aa
- Fixed In set to v17.0.0-14022-gdc218e4b6bd
- Released In set to v18.2.0~1657
- Upkeep Timestamp set to 2025-07-14T06:47:31+00:00