qa: Disable OSD benchmark from running for tests. by sseshasa · Pull Request #67058 · ceph/ceph

sseshasa · 2026-01-23T10:13:11Z

Disable OSD bench from benchmarking the OSDs for teuthology tests. This is to help prevent a cluster warning pertaining to the IOPS value not lying within a typical threshold range from being raised.

The tests can rely on the built-in static values as defined by osd_mclock_max_capacity_iops_[ssd|hdd] which should be good enough.

Fixes: https://tracker.ceph.com/issues/74501
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
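
As a rough illustration of the behaviour described above, the sketch below models how an OSD's mClock capacity could be chosen when the benchmark is skipped. It is not taken from this PR's diff: the function `pick_capacity_iops`, the `skip_benchmark` flag, and the numeric values are hypothetical. Only the `osd_mclock_max_capacity_iops_[ssd|hdd]` option names come from the description.

```python
from typing import Callable, Dict

# Illustrative stand-ins for osd_mclock_max_capacity_iops_[ssd|hdd].
# The numbers are placeholders, not values quoted from this PR.
STATIC_CAPACITY_IOPS: Dict[str, float] = {"ssd": 20000.0, "hdd": 300.0}


def pick_capacity_iops(
    device_class: str,
    skip_benchmark: bool,
    run_benchmark: Callable[[], float],
) -> float:
    """Return the IOPS capacity to hand to the scheduler (hypothetical helper)."""
    if skip_benchmark:
        # Benchmark disabled: rely on the built-in static value.
        return STATIC_CAPACITY_IOPS[device_class]
    # Benchmark enabled: use whatever the bench run measured.
    return run_benchmark()
```

With `skip_benchmark=True` the benchmark callable is never invoked, so no measurement exists that could fall outside the threshold range and raise the warning.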



sseshasa · 2026-01-23T10:33:54Z

Teuthology Runs:

Rados Suite:
Below is a re-run of failed tests from https://pulpito.ceph.com/lflores-2026-01-21_20:56:39-rados-main-distro-default-trial/
with this PR included:

https://pulpito.ceph.com/sseshasa-2026-01-23_05:28:15-rados-main-distro-default-trial/

orch/cephadm Suite:
https://pulpito.ceph.com/sseshasa-2026-01-23_08:23:46-orch:cephadm-main-distro-default-trial/

Neither run shows the OSD bench-related cluster warnings.

batrick · 2026-01-23T17:58:47Z

Is this really desirable as a global config, @rzarzynski @ljflores? Shouldn't there be test coverage for the benchmark somewhere?

djgalloway · 2026-01-26T16:08:48Z

Why are we just disabling tests instead of fixing them? What does the warning mean? Have storage devices just progressed enough that that max limit needs to be increased?

rzarzynski · 2026-01-26T17:14:13Z

@batrick: I think @sseshasa has applied a global change for a global issue. The benchmark was run on every OSD in every job to determine a fundamental constant for mClock. It's not a test per se.

@djgalloway: the reason is simple: an issue that is likely minor (tuning the expected boundaries) generates so much noise ("Jobs: see 125 failed; Logs: https://pulpito.ceph.com/yuriw-2026-01-21_19:35:54-orch-reef-release-distro-default-trial/") that it overloads the review of QA runs and potentially hides other problems.

I'm fine with the merge as a makeshift workaround to let Sridhar analyze the problem and come up with a fix. I agree we should ultimately revert this commit.

batrick · 2026-01-27T02:22:37Z

I fear this will be forgotten. I'm not sure a revert is necessary but the mclock QA tests should have this turned on explicitly. (Anywhere else too?)

sseshasa · 2026-01-27T03:42:14Z

@batrick As @rzarzynski mentioned, the bench test was triggered as part of every test whenever OSDs are brought up. It's not associated with the teuthology job. The bench test is performed to get an idea of the IOPS capacity of an OSD from the objectstore's perspective. The mClock scheduler eventually consumes this for allocating a specific quantum to different services on the OSD. The OSD bench test itself is an existing tool which has its own standalone test and is leveraged for mClock's purpose.

For teuthology tests, this is not a 'must have' as we don't use it to test the scheduling aspect of mClock. A generic static value is sufficient for teuthology tests to run. There are deterministic tests for mClock scheduling that we run outside of teuthology on machines whose environment is known and that we can control. We use CBT for this purpose.

But there are cases where the bench test throws up unrealistic IOPS measurements, and to catch these a threshold range is defined based on the underlying device type. On the trial machines, this threshold was breached (>80K IOPS), leading to the cluster warning. It's apparent that the devices on these machines are significantly faster than what was present on the smithi machines.
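
To make the threshold check concrete, here is a minimal sketch of the idea, assuming a simple per-device-type range and a fallback to a static default. The names `IopsRange`, `THRESHOLDS`, `STATIC_DEFAULTS`, and `validate_measurement` are hypothetical and this is not the OSD's actual code. Of the numbers, only the 80K upper bound for SSDs is taken from the comment above; the rest are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IopsRange:
    low: float
    high: float


# Hypothetical ranges per device type. Only the 80K SSD upper bound
# comes from the discussion; the other numbers are placeholders.
THRESHOLDS: Dict[str, IopsRange] = {
    "ssd": IopsRange(low=1000.0, high=80000.0),
    "hdd": IopsRange(low=50.0, high=500.0),
}

# Placeholder static defaults used when a measurement is rejected.
STATIC_DEFAULTS: Dict[str, float] = {"ssd": 20000.0, "hdd": 300.0}


def validate_measurement(
    device_class: str, measured_iops: float
) -> Tuple[float, Optional[str]]:
    """Return (capacity to use, warning text or None)."""
    rng = THRESHOLDS[device_class]
    if rng.low <= measured_iops <= rng.high:
        # Measurement looks realistic: accept it, no warning.
        return measured_iops, None
    warning = (
        f"measured {measured_iops:.0f} IOPS is outside "
        f"[{rng.low:.0f}, {rng.high:.0f}] for {device_class}; "
        f"using static default"
    )
    return STATIC_DEFAULTS[device_class], warning
```

Under these assumptions, a fast device measuring 95000 IOPS on an SSD-class OSD would be rejected and produce a warning, which mirrors what happened on the trial machines. Raising the upper bound or improving the benchmark's consistency are the two remedies discussed in this thread.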

In addition to bumping up the threshold values, which seems reasonable in this case, I am looking into the OSD bench tool and possibly the fio_objectstore_tool to improve the consistency and predictability of results. The important thing to note here is that we need to use a tool that closely mimics IOs with the objectstore in place in order to get a reasonably good estimate of the IOPS capacity at the OSD layer. We can use https://tracker.ceph.com/issues/74567 to track the progress.

rzarzynski · 2026-01-27T11:25:46Z

As https://tracker.ceph.com/issues/74501 is "consumed" by this PR, it has been copied into https://tracker.ceph.com/issues/74567 yesterday – we shouldn't forget.

sseshasa · 2026-01-27T14:07:55Z

> As https://tracker.ceph.com/issues/74501 is "consumed" by this PR, it has been copied into https://tracker.ceph.com/issues/74567 yesterday – we shouldn't forget.

I updated my comment above to mention the correct tracker. Thanks for pointing it out.

djgalloway · 2026-01-27T14:40:04Z

I guess my point was: even if it's not related to a test, if we're hitting this condition, users/customers could be too, and it's best not to bury our heads in the sand. I'm happy to see a separate tracker opened to investigate alternatives to ignoring the warning.

markhpc · 2026-01-29T15:59:11Z

Agreed, I just saw this and am slightly amazed this was already merged.

github-actions bot added the tests label Jan 23, 2026

sseshasa requested review from ljflores, neha-ojha and rzarzynski January 23, 2026 10:14

ljflores requested a review from aclamk January 23, 2026 16:25

ljflores approved these changes Jan 23, 2026


sseshasa merged commit 84d5b44 into ceph:main Jan 23, 2026
22 of 26 checks passed

This was referenced Jan 23, 2026

squid: qa: Disable OSD benchmark from running for tests. #67066

Merged

reef: qa: Disable OSD benchmark from running for tests. #67067

Merged

tentacle: qa: Disable OSD benchmark from running for tests. #67068

Merged

sseshasa merged 1 commit into ceph:main from sseshasa:wip-fix-iops-threshold-warning-74501
