
reef: osd: Apply randomly selected scheduler type across all OSD shards #54981

Merged
yuriw merged 4 commits into ceph:reef from sseshasa:wip-63874-reef
Mar 25, 2024

Conversation

@sseshasa
Contributor

backport tracker: https://tracker.ceph.com/issues/63874


backport of #53524
parent tracker: https://tracker.ceph.com/issues/62171

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler mClockScheduler implementation is currently in use.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes:
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
…ards

Originally, the choice of 'debug_random' for osd_op_queue resulted in the
selection of a random scheduler type for each OSD shard. A more realistic
scenario for testing would be the selection of the random scheduler type
applied globally for all shards of an OSD. In other words, all OSD shards
would employ the same scheduler type. For example, this scenario arises
during upgrades when the scheduler type has changed between releases.

The following changes are made as part of the commit:
 1. Introduce enum class op_queue_type_t within osd_types.h that holds the
    various op queue types supported. This header is included by OpQueue.h.
    Add helper functions in osd_types.cc to return the op_queue_type_t as
    an enum or a string representing the enum member.
 2. Determine the scheduler type before initializing the OSD shards in
    OSD class constructor.
 3. Pass the determined op_queue_type_t to the OSDShard's make_scheduler()
    method for each shard. This ensures all shards of the OSD are
    initialized with the same scheduler type.
 4. Rename & modify the unused OSDShard::get_scheduler_type() method to
    return op_queue_type_t set for the queue.
 5. Introduce OpScheduler::get_type() and OpQueue::get_type() pure
    virtual functions and define them within the respective queue
    implementation. This returns a value pertaining to the op queue type.
    This is called by OSDShard::get_op_queue_type().
 6. Add OSD::osd_op_queue_type() method for determining the scheduler
    type set on the OSD shards. Since all OSD shards are set to use
    the same scheduler type, the shard with the lowest id is used to
    get the scheduler type using OSDShard::get_op_queue_type().
 7. Improve comment description related to 'osd_op_queue' option in
    common/options/osd.yaml.in.
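The steps above can be illustrated with a minimal sketch (not the actual Ceph code): an op queue type enum, a single type resolved up front from 'osd_op_queue', and every shard initialized with that same value. The names follow the commit message, but the function bodies and the shard struct are simplified assumptions for illustration.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Sketch of the enum described in step 1; the real definition lives in
// src/osd/osd_types.h and holds all supported op queue types.
enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Illustrative counterpart of the string helper added to osd_types.cc.
std::string get_op_queue_type_name(op_queue_type_t t) {
  return t == op_queue_type_t::mClockScheduler ? "mclock_scheduler" : "wpq";
}

// Step 2: resolve 'osd_op_queue' exactly once, before any shard exists.
// With 'debug_random' one random choice is made for the whole OSD.
op_queue_type_t resolve_queue_type(const std::string& osd_op_queue) {
  if (osd_op_queue == "wpq")
    return op_queue_type_t::WeightedPriorityQueue;
  if (osd_op_queue == "debug_random")
    return (std::rand() % 2) ? op_queue_type_t::mClockScheduler
                             : op_queue_type_t::WeightedPriorityQueue;
  return op_queue_type_t::mClockScheduler;  // default
}

// Simplified stand-in for OSDShard.
struct OSDShardSketch {
  op_queue_type_t queue_type;
  op_queue_type_t get_op_queue_type() const { return queue_type; }
};

// Step 3: every shard receives the one value resolved above, so even with
// 'debug_random' all shards of an OSD agree on the scheduler type.
std::vector<OSDShardSketch> make_shards(op_queue_type_t t, unsigned n) {
  return std::vector<OSDShardSketch>(n, OSDShardSketch{t});
}
```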

Call Flow
--------
OSD                     OSDShard                 OpScheduler/OpQueue
---                     --------                 -------------------
osd_op_queue_type() ->
                        get_op_queue_type() ->
                                                 get_type()
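The call flow above can be sketched as a chain of delegations, assuming simplified class shapes: OSD asks its lowest-id shard, the shard asks its scheduler, and the scheduler answers through the pure virtual get_type() from step 5. The names mirror the commit message; the bodies are illustrative.

```cpp
#include <memory>

enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Step 5: a pure virtual accessor on the scheduler interface, overridden
// by each concrete queue implementation.
struct OpScheduler {
  virtual op_queue_type_t get_type() const = 0;
  virtual ~OpScheduler() = default;
};

struct mClockScheduler : OpScheduler {
  op_queue_type_t get_type() const override {
    return op_queue_type_t::mClockScheduler;
  }
};

// Simplified shard: delegates to its scheduler (OSDShard::get_op_queue_type).
struct OSDShard {
  std::unique_ptr<OpScheduler> scheduler = std::make_unique<mClockScheduler>();
  op_queue_type_t get_op_queue_type() const { return scheduler->get_type(); }
};

// Step 6: since all shards are guaranteed to use the same type, querying
// the shard with the lowest id is sufficient (OSD::osd_op_queue_type).
struct OSD {
  OSDShard shards[4];
  op_queue_type_t osd_op_queue_type() const {
    return shards[0].get_op_queue_type();
  }
};
```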

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 96df279)
…system

All OSD shards are guaranteed to use the same scheduler type. Therefore,
OSD::osd_op_queue_type() is used where applicable to determine the
scheduler type. This results in the appropriate setting of other config
options based on the randomly selected scheduler type in case the global
'osd_op_queue' config option is set to 'debug_random' (e.g., in CI tests).

Note: If 'osd_op_queue' is set to 'debug_random', the PG-specific code
(PGPeering, PrimaryLogPG) continues to query the config option key
(osd_op_queue) via get_val(), as before.
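The point can be shown with a hypothetical fragment: dependent settings are derived from the scheduler type actually applied to the shards, not by re-reading the raw 'osd_op_queue' string, which may still say "debug_random". The struct and field names here are illustrative assumptions, not Ceph code.

```cpp
enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Hypothetical bundle of settings that depend on the scheduler type.
struct EffectiveConfig {
  bool mclock_settings_active;  // illustrative dependent setting
};

// Derive from the resolved type (e.g. OSD::osd_op_queue_type()), so the
// result is consistent even when the config string is "debug_random".
EffectiveConfig derive_config(op_queue_type_t resolved) {
  return EffectiveConfig{resolved == op_queue_type_t::mClockScheduler};
}
```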

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit fadc097)
Determine the op priority cutoff for an OSD and apply it on all the OSD
shards, which is a more realistic scenario. Previously, the cutoff value
was randomized between OSD shards, leading to issues in testing. The IO
priority cutoff is now determined before initializing the OSD shards.
The cutoff value is then passed to the OpScheduler implementations, which
are modified to apply the value during initialization.
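A minimal sketch of this change, with assumed priority constants (the numeric values of CEPH_MSG_PRIO_LOW/HIGH here are assumptions for illustration): the cutoff is resolved once per OSD and handed to every scheduler at construction, instead of each shard randomizing its own.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Assumed stand-ins for Ceph's message priority constants.
constexpr int PRIO_LOW = 64;
constexpr int PRIO_HIGH = 196;

// Resolve the cutoff once, before shard initialization; with
// 'debug_random' a single random choice covers the whole OSD.
int resolve_cutoff(const std::string& osd_op_queue_cut_off) {
  if (osd_op_queue_cut_off == "high") return PRIO_HIGH;
  if (osd_op_queue_cut_off == "debug_random")
    return (std::rand() % 2) ? PRIO_HIGH : PRIO_LOW;
  return PRIO_LOW;
}

// Simplified scheduler: the cutoff is applied during initialization,
// as the commit describes for the OpScheduler implementations.
struct SchedulerSketch {
  int cutoff;
  explicit SchedulerSketch(int c) : cutoff(c) {}
};

std::vector<SchedulerSketch> init_shards(const std::string& opt, unsigned n) {
  const int cutoff = resolve_cutoff(opt);  // resolved once for the OSD
  return std::vector<SchedulerSketch>(n, SchedulerSketch(cutoff));
}
```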

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit bfbc6b6)
@sseshasa sseshasa requested a review from a team as a code owner December 21, 2023 07:38
@sseshasa sseshasa added this to the reef milestone Dec 21, 2023
@sseshasa sseshasa added the core label Dec 21, 2023
@sseshasa sseshasa requested review from ljflores and removed request for a team January 2, 2024 07:54
@sseshasa
Contributor Author

sseshasa commented Jan 2, 2024

@ljflores Can you please approve this backport PR and include it in one of the reef batches? Thanks!

@sseshasa
Contributor Author


@ljflores This too needs to be included in the next reef batch for testing. Thanks!

@sseshasa sseshasa modified the milestones: reef, v18.2.2 Jan 25, 2024
@sseshasa sseshasa removed the needs-qa label Jan 30, 2024
@sseshasa sseshasa removed this from the v18.2.2 milestone Jan 30, 2024
@yuriw
Contributor

yuriw commented Jan 30, 2024

ref: https://trello.com/c/pc17d4LG

@sseshasa
Contributor Author

sseshasa commented Jan 31, 2024


I am looking into a teuthology failure and ascertaining if it's related to this PR or an issue with the test itself.

Test Failure:
https://pulpito.ceph.com/yuriw-2024-01-26_01:08:12-rados-wip-yuri2-testing-2024-01-25-1327-reef-distro-default-smithi/7533722/

@sseshasa sseshasa added the DNM label Jan 31, 2024
@sseshasa sseshasa added needs-qa and removed DNM labels Mar 13, 2024
@sseshasa
Contributor Author

@ljflores @yuriw Can you please include this PR along with #56151 in the next reef batch? Thanks!

@sseshasa sseshasa added this to the v18.2.3 milestone Mar 20, 2024
@ljflores
Member

@yuriw yuriw merged commit 8905165 into ceph:reef Mar 25, 2024
@sseshasa sseshasa deleted the wip-63874-reef branch March 26, 2024 04:36