reef: osd: Apply randomly selected scheduler type across all OSD shards #54981
Merged
Conversation
mClockPriorityQueue (the mClockQueue class) is an older mClock implementation of the OpQueue abstraction. It was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph#30650, and that implementation, mClockScheduler, is currently in use. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file, test_mclock_priority_queue.cc.

Other miscellaneous changes:
- Remove the cmake references to the unit test file
- Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
Originally, the choice of 'debug_random' for osd_op_queue resulted in the
selection of a random scheduler type for each OSD shard. A more realistic
scenario for testing is to apply a single randomly selected scheduler type
globally to all shards of an OSD. In other words, all OSD shards employ
the same scheduler type. For example, this scenario is possible during
upgrades when the scheduler type has changed between releases.
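The behavioral change can be illustrated with a minimal sketch (not Ceph's actual code; pick_queue_type(), Shard, and the shard-construction helpers below are hypothetical names): previously each shard effectively rolled its own random choice, whereas now the OSD rolls once and hands the same result to every shard.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Hypothetical helper: with 'debug_random', pick a scheduler type at random.
op_queue_type_t pick_queue_type() {
  return (std::rand() % 2) ? op_queue_type_t::mClockScheduler
                           : op_queue_type_t::WeightedPriorityQueue;
}

struct Shard { op_queue_type_t type; };

// Before: each shard made its own random choice, so types could differ.
std::vector<Shard> make_shards_old(int n) {
  std::vector<Shard> shards;
  for (int i = 0; i < n; ++i)
    shards.push_back({pick_queue_type()});
  return shards;
}

// After: the OSD decides once, and every shard shares that type.
std::vector<Shard> make_shards_new(int n) {
  const op_queue_type_t t = pick_queue_type();
  std::vector<Shard> shards;
  for (int i = 0; i < n; ++i)
    shards.push_back({t});
  return shards;
}
```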
The following changes are made as part of the commit:
1. Introduce enum class op_queue_type_t within osd_types.h that holds the
   various op queue types supported. This header is included by OpQueue.h.
   Add helper functions in osd_types.cc to return the op_queue_type_t as an
   enum or a string representing the enum member.
2. Determine the scheduler type before initializing the OSD shards in
OSD class constructor.
3. Pass the determined op_queue_type_t to the OSDShard's make_scheduler()
method for each shard. This ensures all shards of the OSD are
initialized with the same scheduler type.
4. Rename and modify the unused OSDShard::get_scheduler_type() method to
   return the op_queue_type_t set for the queue.
5. Introduce OpScheduler::get_type() and OpQueue::get_type() pure
virtual functions and define them within the respective queue
implementation. This returns a value pertaining to the op queue type.
This is called by OSDShard::get_op_queue_type().
6. Add OSD::osd_op_queue_type() method for determining the scheduler
type set on the OSD shards. Since all OSD shards are set to use
the same scheduler type, the shard with the lowest id is used to
get the scheduler type using OSDShard::get_op_queue_type().
7. Improve comment description related to 'osd_op_queue' option in
common/options/osd.yaml.in.
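Step 1 can be sketched as follows. The enum name op_queue_type_t comes from the commit message, but the specific enumerators and helper signatures below are assumptions for illustration, not the actual osd_types.h definitions.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of the op queue types (the actual members in osd_types.h may differ).
enum class op_queue_type_t {
  WeightedPriorityQueue,
  mClockScheduler,
  PrioritizedQueue
};

// Hypothetical helpers mirroring those added to osd_types.cc: convert the
// enum to a human-readable name, and a name back to the enum.
std::string get_op_queue_type_name(op_queue_type_t t) {
  switch (t) {
  case op_queue_type_t::WeightedPriorityQueue: return "wpq";
  case op_queue_type_t::mClockScheduler:       return "mclock_scheduler";
  case op_queue_type_t::PrioritizedQueue:      return "PrioritizedQueue";
  }
  return "unknown";
}

std::optional<op_queue_type_t> get_op_queue_type_by_name(const std::string& s) {
  if (s == "wpq")              return op_queue_type_t::WeightedPriorityQueue;
  if (s == "mclock_scheduler") return op_queue_type_t::mClockScheduler;
  if (s == "PrioritizedQueue") return op_queue_type_t::PrioritizedQueue;
  return std::nullopt;  // unrecognized name
}
```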
Call Flow
--------
OSD OSDShard OpScheduler/OpQueue
--- -------- -------------------
osd_op_queue_type() ->
get_op_queue_type() ->
get_type()
Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 96df279)
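The call flow above (steps 4-6) can be sketched as a chain of virtual calls. Only the class and method names are taken from the commit message; the class shapes are heavily simplified for illustration.

```cpp
#include <cassert>

enum class op_queue_type_t { WeightedPriorityQueue, mClockScheduler };

// Step 5: a pure virtual get_type() on the scheduler interface, defined by
// each queue implementation.
struct OpScheduler {
  virtual ~OpScheduler() = default;
  virtual op_queue_type_t get_type() const = 0;
};

struct mClockScheduler : OpScheduler {
  op_queue_type_t get_type() const override {
    return op_queue_type_t::mClockScheduler;
  }
};

// Step 4: the shard reports the type set on its queue.
struct OSDShard {
  const OpScheduler* scheduler;
  op_queue_type_t get_op_queue_type() const { return scheduler->get_type(); }
};

// Step 6: the OSD queries the lowest-id shard; since all shards were built
// with the same type, any shard would give the same answer.
struct OSD {
  const OSDShard* shards[2];
  op_queue_type_t osd_op_queue_type() const {
    return shards[0]->get_op_queue_type();
  }
};
```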
All OSD shards are guaranteed to use the same scheduler type. Therefore, OSD::osd_op_queue_type() is used where applicable to determine the scheduler type. This results in the appropriate setting of other config options based on the randomly selected scheduler type when the global 'osd_op_queue' config option is set to 'debug_random' (for example, in CI tests).

Note: If 'osd_op_queue' is set to 'debug_random', the PG-specific code (PGPeering, PrimaryLogPG) continues to use the existing mechanism of querying the config option key (osd_op_queue) via get_val().

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit fadc097)
Determine the op priority cutoff for an OSD once and apply it on all the OSD shards, which is a more realistic scenario. Previously, the cutoff value was randomized between OSD shards, leading to issues in testing. The IO priority cutoff is now determined before initializing the OSD shards, and the value is then passed to the OpScheduler implementations, which are modified to apply it during initialization.

Fixes: https://tracker.ceph.com/issues/62171
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit bfbc6b6)
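The same decide-once pattern applies to the cutoff. A minimal sketch (the constants, determine_cutoff(), and Scheduler below are hypothetical stand-ins, not Ceph's actual symbols): the OSD computes the cutoff before building the shards, then passes it into every scheduler at construction.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical cutoff values for illustration only.
constexpr unsigned CUTOFF_LOW = 64, CUTOFF_HIGH = 196;

// The cutoff is decided once, before the shards are built...
unsigned determine_cutoff(bool debug_random) {
  if (debug_random)
    return (std::rand() % 2) ? CUTOFF_HIGH : CUTOFF_LOW;
  return CUTOFF_HIGH;
}

// ...and passed into every scheduler at initialization, instead of each
// scheduler randomizing it independently.
struct Scheduler {
  unsigned cutoff;
  explicit Scheduler(unsigned c) : cutoff(c) {}
};

std::vector<Scheduler> make_shards(int n, unsigned cutoff) {
  std::vector<Scheduler> v;
  for (int i = 0; i < n; ++i)
    v.emplace_back(cutoff);
  return v;
}
```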
Contributor
Author
@ljflores Can you please approve this backport PR and include it in one of the reef batches? Thanks!
ljflores
approved these changes
Jan 4, 2024
I am looking into a teuthology failure and ascertaining whether it's related to this PR or an issue with the test itself. Test Failure:
backport tracker: https://tracker.ceph.com/issues/63874
backport of #53524
parent tracker: https://tracker.ceph.com/issues/62171
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh