Skip to content

dmclock server side refactor#30650

Merged
gregsfortytwo merged 6 commits intoceph:masterfrom
athanatos:sjust/wip-dmclock-server-only
Oct 23, 2019
Merged

dmclock server side refactor#30650
gregsfortytwo merged 6 commits intoceph:masterfrom
athanatos:sjust/wip-dmclock-server-only

Conversation

@athanatos
Copy link
Contributor

This PR replaces the OpQueue abstraction with a smaller Scheduler abstraction, creates a new, simpler mclock implementation of that abstraction, and removes the old mclock OpQueue implementations as well as the PriorityQueue implementation leaving the new mClockScheduler and wpq.

@athanatos athanatos requested a review from a team as a code owner September 30, 2019 21:12
@athanatos athanatos self-assigned this Sep 30, 2019
@athanatos
Copy link
Contributor Author

@athanatos
Copy link
Contributor Author

athanatos commented Sep 30, 2019

@ivancich @liyichao This PR has a simpler version of the server side dmclock implementation. I'd like to merge it prior to adding the client side as this bit does not require protocol changes. Note, this patch removes the two existing mclock OpQueue implementations in favor of a simpler one based on a less complicated interface. Moreover, it removes PrioritizedQueue as no one appears to use it anymore. Does this seem reasonable?

https://github.com/athanatos/ceph/tree/sjust/wip-dmclock-cleanup has the client side portions. I'm holding off on it for a bit since it changes the MOSDOp encoding significantly to enable adding the mclock params to the pre-dispatch decode area.

@xiexingguo
Copy link
Member

I like it.

CC: @Yan-waller


namespace ceph {
namespace osd {
namespace scheduler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, could put

namespace ceph::osd::scheduler {

{
auto id = get_scheduler_id(item);
// TODO: express cost, mclock params in terms of per-node capacity?
auto cost = 1; //std::max(item.get_cost(), 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I've specifically punted on cost here. I plan on expressing the client concept of the mClock params in terms of fractions of a single node of capacity. Those values will be transformed into actual params by multiplying and normalizing based on an online estimate from bluestore of the osd's capacity. Future work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing purposes, iops is as good as anything else.

@athanatos
Copy link
Contributor Author

retest this please

@athanatos
Copy link
Contributor Author

retest this please

@athanatos athanatos force-pushed the sjust/wip-dmclock-server-only branch 2 times, most recently from 115801b to 27ce332 Compare October 14, 2019 19:14
@athanatos
Copy link
Contributor Author

retest this please

1 similar comment
@athanatos
Copy link
Contributor Author

retest this please

@athanatos
Copy link
Contributor Author

@athanatos athanatos changed the title DNM: dmclock server side refactor dmclock server side refactor Oct 16, 2019
@athanatos
Copy link
Contributor Author

@tchaikov I think this is ready for a real review now.

Not all schedulers necessarily depend on a particular strict/non-strict
cutoff.  Instead, simply shedule at CEPH_MSG_PRIO_HIGH.

See also: d346fdc

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-dmclock-server-only branch from 27ce332 to caab2d5 Compare October 16, 2019 23:14
@athanatos athanatos removed the DNM label Oct 17, 2019
@athanatos athanatos removed the request for review from a team October 17, 2019 22:41
Copy link
Member

@ivancich ivancich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the simplifications. It was a bit unwieldy before, and I like the direction this is heading.

Comment on lines +18 to +20
namespace ceph {
namespace osd {
namespace scheduler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you combined the namespace declarations in other places (e.g., OpSchedulerItem.h) you could do so as well here.

@athanatos
Copy link
Contributor Author

@tchaikov Did you want to take a look at this, or is @ivancich 's review sufficient?

@tchaikov
Copy link
Contributor

@athanatos yeah, i did want to review this change. but haven't got a chance to do so. i am skimming through it now.

class OpQueueItem {
namespace ceph::osd::scheduler {

enum class op_scheduler_class : uint8_t {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove "scheduler" from the names? as we already have "scheduler" in the namespace's name. i think that also the nice thing of namespace -- it has name in it =)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost did that, but even with the scheduler namespace, I found names like op_class or OpItem to be insufficiently specific without the scheduler portion. The main problem is that op/operation is a very overloaded concept in the ceph code base. I'd rather leave it like this even at the expense of a little extra verbosity.

OpQueue is overkill for mclock based schedulers.  The interface doesn't
need to externalize the _strict modifiers, the scheduler can figure that
out from the item itself.  Introduce simpler Scheduler interface and add
an adapter for the existing OpQueue based implementations.

Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-dmclock-server-only branch from caab2d5 to 3642cb2 Compare October 22, 2019 22:12
mClockScheduler schedules items based on op_scheduler_class
with configured mclock parameters.  Items which should be
scheduled immediately (op_scheduler_class::immediate) are placed
into a single queue as it's not clear that there's a reason to
differentiate among them.

A subsequent patch will add support for client provided mclock
params and dmclock request state.

Signed-off-by: Samuel Just <sjust@redhat.com>
The consensus seems to be that PrioritizedQueue is strictly worse than
WeightedPriorityQueue.

mClockClientQueue and mClockClassQueue are superceded by
mClockScheduler.

Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
@athanatos athanatos force-pushed the sjust/wip-dmclock-server-only branch from 3642cb2 to 6eaf324 Compare October 23, 2019 20:34
@gregsfortytwo gregsfortytwo merged commit e190825 into ceph:master Oct 23, 2019
sseshasa added a commit to sseshasa/ceph that referenced this pull request Oct 10, 2023
mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler implementation of mClockScheduler is being currently used.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes,
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to ceph/ceph-ci that referenced this pull request Nov 13, 2023
mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph/ceph#30650.

The simpler implementation of mClockScheduler is being currently used.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes,
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa added a commit to sseshasa/ceph that referenced this pull request Dec 21, 2023
mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler implementation of mClockScheduler is being currently used.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes,
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
sseshasa added a commit to sseshasa/ceph that referenced this pull request Dec 21, 2023
mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph#30650.

The simpler implementation of mClockScheduler is being currently used.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes,
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
leonidc pushed a commit to ceph/ceph-ci that referenced this pull request Feb 10, 2025
mClockPriorityQueue (mClockQueue class) is an older mClock implementation
of the OpQueue abstraction. This was replaced by a simpler implementation
of the OpScheduler abstraction as part of
ceph/ceph#30650.

The simpler implementation of mClockScheduler is being currently used.
This commit removes the unused src/common/mClockPriorityQueue.h along
with the associated unit test file: test_mclock_priority_queue.cc.

Other miscellaneous changes,
 - Remove the cmake references to the unit test file
 - Remove the inclusion of the header file in mClockScheduler.h

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
(cherry picked from commit 28a26f7)
(cherry picked from commit 6dffc43)

Resolves: rhbz#2330755
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants