dmclock server side refactor#30650
Conversation
|
Test run: http://pulpito.ceph.com/sjust-2019-09-28_23:02:01-rados-wip-sjust-testing-distro-basic-smithi/ |
|
@ivancich @liyichao This PR has a simpler version of the server side dmclock implementation. I'd like to merge it prior to adding the client side as this bit does not require protocol changes. Note, this patch removes the two existing mclock OpQueue implementations in favor of a simpler one based on a less complicated interface. Moreover, it removes PrioritizedQueue as no one appears to use it anymore. Does this seem reasonable? https://github.com/athanatos/ceph/tree/sjust/wip-dmclock-cleanup has the client side portions. I'm holding off on it for a bit since it changes the MOSDOp encoding significantly to enable adding the mclock params to the pre-dispatch decode area. |
|
I like it. CC: @Yan-waller |
src/osd/scheduler/mClockScheduler.h
Outdated
|
|
||
| namespace ceph { | ||
| namespace osd { | ||
| namespace scheduler { |
There was a problem hiding this comment.
nit, could put
namespace ceph::osd::scheduler {| { | ||
| auto id = get_scheduler_id(item); | ||
| // TODO: express cost, mclock params in terms of per-node capacity? | ||
| auto cost = 1; //std::max(item.get_cost(), 1); |
There was a problem hiding this comment.
Note, I've specifically punted on cost here. I plan on expressing the client concept of the mClock params in terms of fractions of a single node of capacity. Those values will be transformed into actual params by multiplying and normalizing based on an online estimate from bluestore of the osd's capacity. Future work.
There was a problem hiding this comment.
For testing purposes, iops is as good as anything else.
|
retest this please |
|
retest this please |
115801b to
27ce332
Compare
|
retest this please |
1 similar comment
|
retest this please |
|
Second set of test runs: http://pulpito.ceph.com/sjust-2019-10-15_05:58:15-rados-wip-sjust-testing-distro-basic-smithi/ The failures seem unrelated. Looked at one dead one, seemed to be an IO error. |
|
@tchaikov I think this is ready for a real review now. |
Not all schedulers necessarily depend on a particular strict/non-strict cutoff. Instead, simply shedule at CEPH_MSG_PRIO_HIGH. See also: d346fdc Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
27ce332 to
caab2d5
Compare
src/osd/scheduler/OpSchedulerItem.cc
Outdated
| namespace ceph { | ||
| namespace osd { | ||
| namespace scheduler { |
There was a problem hiding this comment.
Since you combined the namespace declarations in other places (e.g., OpSchedulerItem.h) you could do so as well here.
|
@athanatos yeah, i did want to review this change. but haven't got a chance to do so. i am skimming through it now. |
| class OpQueueItem { | ||
| namespace ceph::osd::scheduler { | ||
|
|
||
| enum class op_scheduler_class : uint8_t { |
There was a problem hiding this comment.
can we remove "scheduler" from the names? as we already have "scheduler" in the namespace's name. i think that also the nice thing of namespace -- it has name in it =)
There was a problem hiding this comment.
I almost did that, but even with the scheduler namespace, I found names like op_class or OpItem to be insufficiently specific without the scheduler portion. The main problem is that op/operation is a very overloaded concept in the ceph code base. I'd rather leave it like this even at the expense of a little extra verbosity.
OpQueue is overkill for mclock based schedulers. The interface doesn't need to externalize the _strict modifiers, the scheduler can figure that out from the item itself. Introduce simpler Scheduler interface and add an adapter for the existing OpQueue based implementations. Signed-off-by: Samuel Just <sjust@redhat.com>
caab2d5 to
3642cb2
Compare
mClockScheduler schedules items based on op_scheduler_class with configured mclock parameters. Items which should be scheduled immediately (op_scheduler_class::immediate) are placed into a single queue as it's not clear that there's a reason to differentiate among them. A subsequent patch will add support for client provided mclock params and dmclock request state. Signed-off-by: Samuel Just <sjust@redhat.com>
The consensus seems to be that PrioritizedQueue is strictly worse than WeightedPriorityQueue. mClockClientQueue and mClockClassQueue are superceded by mClockScheduler. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
3642cb2 to
6eaf324
Compare
mClockPriorityQueue (mClockQueue class) is an older mClock implementation of the OpQueue abstraction. This was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph#30650. The simpler implementation of mClockScheduler is being currently used. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file: test_mclock_priority_queue.cc. Other miscellaneous changes, - Remove the cmake references to the unit test file - Remove the inclusion of the header file in mClockScheduler.h Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
mClockPriorityQueue (mClockQueue class) is an older mClock implementation of the OpQueue abstraction. This was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph/ceph#30650. The simpler implementation of mClockScheduler is being currently used. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file: test_mclock_priority_queue.cc. Other miscellaneous changes, - Remove the cmake references to the unit test file - Remove the inclusion of the header file in mClockScheduler.h Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
mClockPriorityQueue (mClockQueue class) is an older mClock implementation of the OpQueue abstraction. This was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph#30650. The simpler implementation of mClockScheduler is being currently used. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file: test_mclock_priority_queue.cc. Other miscellaneous changes, - Remove the cmake references to the unit test file - Remove the inclusion of the header file in mClockScheduler.h Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com> (cherry picked from commit 28a26f7)
mClockPriorityQueue (mClockQueue class) is an older mClock implementation of the OpQueue abstraction. This was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph#30650. The simpler implementation of mClockScheduler is being currently used. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file: test_mclock_priority_queue.cc. Other miscellaneous changes, - Remove the cmake references to the unit test file - Remove the inclusion of the header file in mClockScheduler.h Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com> (cherry picked from commit 28a26f7)
mClockPriorityQueue (mClockQueue class) is an older mClock implementation of the OpQueue abstraction. This was replaced by a simpler implementation of the OpScheduler abstraction as part of ceph/ceph#30650. The simpler implementation of mClockScheduler is being currently used. This commit removes the unused src/common/mClockPriorityQueue.h along with the associated unit test file: test_mclock_priority_queue.cc. Other miscellaneous changes, - Remove the cmake references to the unit test file - Remove the inclusion of the header file in mClockScheduler.h Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com> (cherry picked from commit 28a26f7) (cherry picked from commit 6dffc43) Resolves: rhbz#2330755
This PR replaces the OpQueue abstraction with a smaller Scheduler abstraction, creates a new, simpler mclock implementation of that abstraction, and removes the old mclock OpQueue implementations as well as the PriorityQueue implementation leaving the new mClockScheduler and wpq.