
reef: osd: Apply randomly selected scheduler type across all OSD shards #54981

Merged
yuriw merged 4 commits into ceph:reef from sseshasa:wip-63874-reef
Mar 25, 2024

Conversation

@sseshasa
Contributor

backport tracker: https://tracker.ceph.com/issues/63874


backport of #53524
parent tracker: https://tracker.ceph.com/issues/62171

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler mClockScheduler implementation is currently in use.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes:
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
…ards

Originally, the choice of 'debug_random' for osd_op_queue resulted in the
selection of a random scheduler type for each OSD shard. A more realistic
scenario for testing would be the selection of the random scheduler type
applied globally for all shards of an OSD. In other words, all OSD shards
would employ the same scheduler type. For example, this scenario arises
during upgrades when the scheduler type has changed between releases.

The following changes are made as part of the commit:
 1. Introduce enum class op_queue_type_t within osd_types.h that holds the
    various op queue types supported. This header is included by OpQueue.h.
    Add helper functions in osd_types.cc to return the op_queue_type_t as
    an enum or a string representing the enum member.
 2. Determine the scheduler type before initializing the OSD shards in
    OSD class constructor.
 3. Pass the determined op_queue_type_t to the OSDShard's make_scheduler()
    method for each shard. This ensures all shards of the OSD are
    initialized with the same scheduler type.
 4. Rename & modify the unused OSDShard::get_scheduler_type() method to
    return op_queue_type_t set for the queue.
 5. Introduce OpScheduler::get_type() and OpQueue::get_type() pure
    virtual functions and define them within the respective queue
    implementation. This returns a value pertaining to the op queue type.
    This is called by OSDShard::get_op_queue_type().
 6. Add OSD::osd_op_queue_type() method for determining the scheduler
    type set on the OSD shards. Since all OSD shards are set to use
    the same scheduler type, the shard with the lowest id is used to
    get the scheduler type using OSDShard::get_op_queue_type().
 7. Improve comment description related to 'osd_op_queue' option in
    common/options/osd.yaml.in.
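The steps above can be illustrated with a minimal sketch (not the actual Ceph code): an op queue type enum, a single type resolved up front from 'osd_op_queue', and every shard initialized with that same value. The names follow the commit message, but the function bodies and the shard struct are simplified assumptions for illustration.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Sketch of the enum described in step 1; the real definition lives in
// src/osd/osd_types.h and holds all supported op queue types.
enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Illustrative counterpart of the string helper added to osd_types.cc.
std::string get_op_queue_type_name(op_queue_type_t t) {
  return t == op_queue_type_t::mClockScheduler ? "mclock_scheduler" : "wpq";
}

// Step 2: resolve 'osd_op_queue' exactly once, before any shard exists.
// With 'debug_random' one random choice is made for the whole OSD.
op_queue_type_t resolve_queue_type(const std::string& osd_op_queue) {
  if (osd_op_queue == "wpq")
    return op_queue_type_t::WeightedPriorityQueue;
  if (osd_op_queue == "debug_random")
    return (std::rand() % 2) ? op_queue_type_t::mClockScheduler
                             : op_queue_type_t::WeightedPriorityQueue;
  return op_queue_type_t::mClockScheduler;  // default
}

// Simplified stand-in for OSDShard.
struct OSDShardSketch {
  op_queue_type_t queue_type;
  op_queue_type_t get_op_queue_type() const { return queue_type; }
};

// Step 3: every shard receives the one value resolved above, so even with
// 'debug_random' all shards of an OSD agree on the scheduler type.
std::vector<OSDShardSketch> make_shards(op_queue_type_t t, unsigned n) {
  return std::vector<OSDShardSketch>(n, OSDShardSketch{t});
}
```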

Call Flow
--------
OSD                     OSDShard                 OpScheduler/OpQueue
---                     --------                 -------------------
osd_op_queue_type() ->
                        get_op_queue_type() ->
                                                 get_type()
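The call flow above can be sketched as a chain of delegations, assuming simplified class shapes: OSD asks its lowest-id shard, the shard asks its scheduler, and the scheduler answers through the pure virtual get_type() from step 5. The names mirror the commit message; the bodies are illustrative.

```cpp
#include <memory>

enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Step 5: a pure virtual accessor on the scheduler interface, overridden
// by each concrete queue implementation.
struct OpScheduler {
  virtual op_queue_type_t get_type() const = 0;
  virtual ~OpScheduler() = default;
};

struct mClockScheduler : OpScheduler {
  op_queue_type_t get_type() const override {
    return op_queue_type_t::mClockScheduler;
  }
};

// Simplified shard: delegates to its scheduler (OSDShard::get_op_queue_type).
struct OSDShard {
  std::unique_ptr<OpScheduler> scheduler = std::make_unique<mClockScheduler>();
  op_queue_type_t get_op_queue_type() const { return scheduler->get_type(); }
};

// Step 6: since all shards are guaranteed to use the same type, querying
// the shard with the lowest id is sufficient (OSD::osd_op_queue_type).
struct OSD {
  OSDShard shards[4];
  op_queue_type_t osd_op_queue_type() const {
    return shards[0].get_op_queue_type();
  }
};
```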

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 96df279)
…system

All OSD shards are guaranteed to use the same scheduler type. Therefore,
OSD::osd_op_queue_type() is used where applicable to determine the
scheduler type. This results in the appropriate setting of other config
options based on the randomly selected scheduler type in case the global
'osd_op_queue' config option is set to 'debug_random' (e.g., in CI tests).

Note: If 'osd_op_queue' is set to 'debug_random', the PG-specific code
(PGPeering, PrimaryLogPG) continues to query the config option key
(osd_op_queue) via get_val(), as before.
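The point can be shown with a hypothetical fragment: dependent settings are derived from the scheduler type actually applied to the shards, not by re-reading the raw 'osd_op_queue' string, which may still say "debug_random". The struct and field names here are illustrative assumptions, not Ceph code.

```cpp
enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Hypothetical bundle of settings that depend on the scheduler type.
struct EffectiveConfig {
  bool mclock_settings_active;  // illustrative dependent setting
};

// Derive from the resolved type (e.g. OSD::osd_op_queue_type()), so the
// result is consistent even when the config string is "debug_random".
EffectiveConfig derive_config(op_queue_type_t resolved) {
  return EffectiveConfig{resolved == op_queue_type_t::mClockScheduler};
}
```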

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit fadc097)
Determine the op priority cutoff for an OSD and apply it on all the OSD
shards, which is a more realistic scenario. Previously, the cutoff value
was randomized between OSD shards, leading to issues in testing. The IO
priority cutoff is now determined before initializing the OSD shards.
The cutoff value is then passed to the OpScheduler implementations, which
are modified to apply the value during initialization.
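A minimal sketch of this change, with assumed priority constants (the numeric values of CEPH_MSG_PRIO_LOW/HIGH here are assumptions for illustration): the cutoff is resolved once per OSD and handed to every scheduler at construction, instead of each shard randomizing its own.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Assumed stand-ins for Ceph's message priority constants.
constexpr int PRIO_LOW = 64;
constexpr int PRIO_HIGH = 196;

// Resolve the cutoff once, before shard initialization; with
// 'debug_random' a single random choice covers the whole OSD.
int resolve_cutoff(const std::string& osd_op_queue_cut_off) {
  if (osd_op_queue_cut_off == "high") return PRIO_HIGH;
  if (osd_op_queue_cut_off == "debug_random")
    return (std::rand() % 2) ? PRIO_HIGH : PRIO_LOW;
  return PRIO_LOW;
}

// Simplified scheduler: the cutoff is applied during initialization,
// as the commit describes for the OpScheduler implementations.
struct SchedulerSketch {
  int cutoff;
  explicit SchedulerSketch(int c) : cutoff(c) {}
};

std::vector<SchedulerSketch> init_shards(const std::string& opt, unsigned n) {
  const int cutoff = resolve_cutoff(opt);  // resolved once for the OSD
  return std::vector<SchedulerSketch>(n, SchedulerSketch(cutoff));
}
```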

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit bfbc6b6)
@sseshasa sseshasa requested a review from a team as a code owner December 21, 2023 07:38
@sseshasa sseshasa added this to the reef milestone Dec 21, 2023
@sseshasa sseshasa added the core label Dec 21, 2023
@sseshasa sseshasa requested review from ljflores and removed request for a team January 2, 2024 07:54
@sseshasa
Contributor Author

sseshasa commented Jan 2, 2024

@ljflores Can you please approve this backport PR and include it in one of the reef batches? Thanks!

@sseshasa
Contributor Author


@ljflores This too needs to be included in the next reef batch for testing. Thanks!

@sseshasa sseshasa modified the milestones: reef, v18.2.2 Jan 25, 2024
@sseshasa sseshasa removed the needs-qa label Jan 30, 2024
@sseshasa sseshasa removed this from the v18.2.2 milestone Jan 30, 2024
@yuriw
Contributor

yuriw commented Jan 30, 2024

ref: https://trello.com/c/pc17d4LG

@sseshasa
Contributor Author

sseshasa commented Jan 31, 2024


I am looking into a teuthology failure and ascertaining if it's related to this PR or an issue with the test itself.

Test Failure:
https://pulpito.ceph.com/yuriw-2024-01-26_01:08:12-rados-wip-yuri2-testing-2024-01-25-1327-reef-distro-default-smithi/7533722/

@sseshasa sseshasa added the DNM label Jan 31, 2024
@sseshasa sseshasa added needs-qa and removed DNM labels Mar 13, 2024
@sseshasa
Contributor Author

@ljflores @yuriw Can you please include this PR along with #56151 in the next reef batch? Thanks!

@sseshasa sseshasa added this to the v18.2.3 milestone Mar 20, 2024
@ljflores
Member

@yuriw yuriw merged commit 8905165 into ceph:reef Mar 25, 2024
@sseshasa sseshasa deleted the wip-63874-reef branch March 26, 2024 04:36