qa/clusters/crimson: increase reactors in fixed-1 cluster #66307

Merged
shraddhaag merged 1 commit into ceph:main from shraddhaag:wip-shraddhaag-fix-slow-ops
Nov 19, 2025

@shraddhaag shraddhaag commented Nov 18, 2025

The investigation for the issue can be found here: https://tracker.ceph.com/issues/72778

TL;DR: Unrelated tests were failing at random because of slow ops. The failures had nothing in common: they occurred across different object stores (seastore and bluestore) and across different tests.

The chosen fix is to increase the number of reactors used for testing. I've raised the count from the initial value of 2 to 3.
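As a rough illustration of the change described above, the sketch below raises a per-OSD reactor count inside a teuthology-style override mapping. The key name `reactors_per_osd` and the nesting under `overrides/ceph/conf/osd` are placeholders chosen for this example; they are assumptions and not necessarily the option or layout used in the actual fixed-1 cluster file.

```python
import copy

# Hypothetical placeholder key; the real option name in the cluster YAML may differ.
REACTOR_KEY = "reactors_per_osd"


def with_reactor_count(cluster_conf, reactors):
    """Return a copy of cluster_conf with the per-OSD reactor count set."""
    if reactors < 1:
        raise ValueError("reactor count must be at least 1")
    updated = copy.deepcopy(cluster_conf)
    # Assumed layout: overrides -> ceph -> conf -> osd (illustrative only).
    osd_conf = (
        updated.setdefault("overrides", {})
        .setdefault("ceph", {})
        .setdefault("conf", {})
        .setdefault("osd", {})
    )
    osd_conf[REACTOR_KEY] = reactors
    return updated


# Before: 2 reactors per OSD, as stated in the description.
fixed_1 = {"overrides": {"ceph": {"conf": {"osd": {REACTOR_KEY: 2}}}}}
# After: 3 reactors per OSD.
bumped = with_reactor_count(fixed_1, 3)
```

The helper copies the mapping instead of mutating it, so the original values remain available for comparison.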

I've tested the fix on the test that was failing most often: https://pulpito.ceph.com/shraddhaag-2025-11-18_08:29:50-crimson-rados-main-distro-crimson-debug-smithi/8609146/

Fixes: https://tracker.ceph.com/issues/72778

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the tests label Nov 18, 2025
@shraddhaag shraddhaag marked this pull request as ready for review November 18, 2025 14:13
@shraddhaag shraddhaag requested a review from Matan-B November 18, 2025 14:13

@Matan-B Matan-B left a comment

LGTM!
The PR description is well put. Adding some of that information to the commit message could be useful for a future "git blame".

Nit: this could be a good opportunity to also increase the reactor count in "fixed-2" clusters. However, since this PR is already tested, we can follow up on this later.

Issue: Unrelated tests were failing at random due to slow ops.
There was no common ground between them: the failures occurred
across different object stores (seastore and bluestore) and
across different tests.

Cause: Since the failures are quite random, they are likely
caused by a low reactor count.

Solution: Increase the number of reactors used for testing,
from the initial value of 2 to 3.

Fixes: https://tracker.ceph.com/issues/72778
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>
@shraddhaag shraddhaag force-pushed the wip-shraddhaag-fix-slow-ops branch from f8e205e to 7c5ecc1 Compare November 18, 2025 15:01
@shraddhaag

@Matan-B you're right. I missed updating the initial commit message. All done now!

As for the reactor count in the "fixed-2" cluster, it's already set to 3; that's why I didn't increase it. Please see: https://github.com/ceph/ceph/blob/main/qa/clusters/crimson/crimson-fixed-2.yaml#L9. If you think we need to increase this further, I can open a follow-up PR for that!

@shraddhaag shraddhaag merged commit d1eb244 into ceph:main Nov 19, 2025
13 checks passed
@github-actions

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/19493209802

Matan-B commented Nov 19, 2025

> As for the reactor count in the "fixed-2" cluster, it's already set to 3; that's why I didn't increase it. Please see: https://github.com/ceph/ceph/blob/main/qa/clusters/crimson/crimson-fixed-2.yaml#L9. If you think we need to increase this further, I can open a follow-up PR for that!

We expect "real" clusters to be deployed with much more CPU allocated per OSD (32/64). We should try to increase this value in our testing environment as much as possible. The issue is that the machines used for testing are shared and can't allocate that many cores.
fixed-2 clusters use two separate nodes, so there are fewer OSDs on each node, which is why this value was a bit higher in those tests. I think we should be able to increase it as well. What do you think about trying to increase to 4 and 8 respectively (fixed-1, fixed-2)?
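To make the trade-off above concrete, here is a minimal sketch of the core budget arithmetic: with a fixed number of usable cores per node, fewer OSDs per node leaves room for more reactors per OSD. The function names and the example numbers (8 usable cores, 4 versus 2 OSDs per node) are illustrative assumptions, not the actual specifications of the test machines or cluster layouts.

```python
def cores_needed(osds_per_node, reactors_per_osd):
    """Cores a node must dedicate if every reactor gets its own core."""
    return osds_per_node * reactors_per_osd


def max_reactors_per_osd(usable_cores, osds_per_node):
    """Largest per-OSD reactor count that fits in the node's core budget."""
    if osds_per_node < 1:
        raise ValueError("a node must host at least one OSD")
    return usable_cores // osds_per_node


# Illustrative numbers only: the same hypothetical 8-core budget,
# shared by 4 OSDs on one node versus 2 OSDs on each of two nodes.
single_node_limit = max_reactors_per_osd(usable_cores=8, osds_per_node=4)
two_node_limit = max_reactors_per_osd(usable_cores=8, osds_per_node=2)
```

Under these assumed numbers, halving the OSDs per node doubles the reactor count each OSD can be given, which is the reason the two layouts can carry different values.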

@Matan-B Matan-B moved this from Tested to Merged (Main) in Crimson Dec 9, 2025