qa/clusters/crimson: increase reactors in fixed-1 cluster #66307
shraddhaag merged 1 commit into ceph:main
Matan-B
left a comment
LGTM!
The PR description is well put; adding some of that information to the commit message could be useful for future "git blame".
Nit: this could be a good opportunity to also increase the reactor count in the "fixed-2" clusters. However, since this PR is already tested, we can follow up on this later.
Issue: Various tests were failing randomly due to slow ops. There was no common ground between them; the failures occurred across different object stores (seastore and bluestore) and across different tests.
Cause: Since the failures are quite random, they are likely caused by a low reactor count.
Solution: We are opting to increase the number of reactors used for testing. I've increased them from the initial value of 2 to 3.
Fixes: https://tracker.ceph.com/issues/72778
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
f8e205e to 7c5ecc1
@Matan-B you're right. I missed updating the initial commit message. All done now! As for the reactor count in the "fixed-2" cluster, it's already set to 3, which is why I didn't increase it. Please see: https://github.com/ceph/ceph/blob/main/qa/clusters/crimson/crimson-fixed-2.yaml#L9. If you think we need to increase this further, I can open a follow-up PR for that!
This is an automated message by src/script/redmine-upkeep.py. I have resolved the following tracker ticket due to the merge of this PR: No backports are pending for the ticket. If this is incorrect, please update the tracker. Update Log: https://github.com/ceph/ceph/actions/runs/19493209802
We expect "real" clusters to be deployed with much more CPU allocated per OSD (32/64). We should try to increase this value in our testing environment as much as possible. The issue is that the machines used for testing are shared and can't allocate that many cores.
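To make the trade-off above concrete, here is a rough sketch of the core budget on a shared test node. It assumes each reactor occupies one core (Seastar's thread-per-core model); the function name and all numbers are hypothetical and are not taken from the Ceph lab or from the cluster YAML files.

```python
def max_reactors_per_osd(node_cores: int, osds_per_node: int,
                         reserved_cores: int = 0) -> int:
    """Largest per-OSD reactor count that fits without oversubscribing cores.

    Assumes one core per reactor. `reserved_cores` is a hypothetical
    allowance for other daemons and test clients sharing the node.
    """
    if osds_per_node <= 0:
        raise ValueError("osds_per_node must be positive")
    usable = node_cores - reserved_cores
    # Integer division: every OSD gets the same number of reactors.
    return max(usable // osds_per_node, 0)


# Hypothetical node: 8 cores, 2 OSDs, 2 cores kept for everything else.
print(max_reactors_per_osd(node_cores=8, osds_per_node=2, reserved_cores=2))
```

Under these made-up numbers the budget allows 3 reactors per OSD; a production-style allocation of 32 or 64 would need far more cores than such a node has.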
The investigation for the issue can be found here: https://tracker.ceph.com/issues/72778
TLDR: Various tests were failing randomly due to slow ops. There was no common ground between them; the failures occurred across different object stores (seastore and bluestore) and across different tests.
We are opting to increase the number of reactors used for testing. I've increased them from the initial value of 2 to 3.
I've tested out the fix on the test that was failing the most: https://pulpito.ceph.com/shraddhaag-2025-11-18_08:29:50-crimson-rados-main-distro-crimson-debug-smithi/8609146/
Fixes: https://tracker.ceph.com/issues/72778