qa: Fix OSD thrasher bugs during test clean up #65063
bill-scales wants to merge 1 commit into ceph:main
|
jenkins test make check |
|
jenkins test make check arm64 |
9a3610d to 758df83
|
jenkins test make check |
The OSD thrasher can get stuck for 30 minutes searching for a pool when the pools have all been deleted. It should terminate the loop if the thrasher is told to stop.

The OSD thrasher can cause an exception if a pool is deleted between querying the list of pools and choosing a PG from the pool.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
758df83 to 1364bdc
|
Updated after many teuthology runs to test this change. The problems occur when the thrasher looks for a PG after the pool has been deleted. By this point the thrasher has already been told to stop but has not yet noticed. We therefore just need to add checks for stopping and give up on the current error inject. These runs all include this fix: |
|
jenkins test make check |
|
jenkins test api |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
@kamoltat can you review this change? It's a simple reliability improvement to the OSD thrasher: it makes the thrasher check whether it is being stopped and, if so, exit the error inject. Currently, pools are deleted at the end of tests, and this can cause the thrasher to fail the test because it can't find any pool/PG to inject an error into. This change stops those failures from happening. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
The OSD thrasher can get stuck for 30 minutes searching for a pool when the pools have all been deleted. It should terminate the loop if the thrasher is told to stop.
The OSD thrasher can cause an exception if a pool is deleted between querying the list of pools and choosing a PG from the pool. If the PG list is empty, the thrasher should look for another pool.
Fixes: https://tracker.ceph.com/issues/71917
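The two behaviours described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR or from the Ceph qa tree: the class `ThrasherSketch`, the `pgs_by_pool` dictionary, the `stopping` event, and the `choose_pg` method with its `max_attempts` and `retry_delay` parameters are all hypothetical names invented for illustration. The sketch only shows the general idea of checking a stop flag inside the search loop and treating a missing pool or an empty PG list as "try another pool" rather than an error.

```python
import random
import threading


class ThrasherSketch:
    """Hypothetical stand-in for the thrasher's PG search (not the real qa code)."""

    def __init__(self, pgs_by_pool):
        # Assumption for this sketch: pool name -> list of PG ids.
        # The real thrasher would query the cluster instead.
        self.pgs_by_pool = pgs_by_pool
        self.stopping = threading.Event()

    def choose_pg(self, max_attempts=100, retry_delay=1.0):
        """Return (pool, pg), or None if told to stop or nothing was found."""
        for _ in range(max_attempts):
            # Give up on the current error inject as soon as a stop is requested.
            if self.stopping.is_set():
                return None
            pools = list(self.pgs_by_pool)
            if pools:
                pool = random.choice(pools)
                # The pool may have been deleted since the list was taken,
                # so fall back to an empty PG list instead of raising.
                pgs = self.pgs_by_pool.get(pool, [])
                if pgs:
                    return pool, random.choice(pgs)
            # Empty pool list or empty PG list: wait, then look again.
            # Event.wait() returns True if a stop arrives during the wait.
            if self.stopping.wait(retry_delay):
                return None
        return None
```

With this shape, a test that deletes all of its pools during clean up and then signals the thrasher to stop gets `None` back promptly instead of waiting for the retry budget to run out.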