os/bluestore: make osd shard-thread do oncommits #22739
Conversation
My test host: 4k-8osdthread-1finisher: IOPS=51K. The idea comes from seastar, which runs the callback on the same core. From the results, when the workload is relatively light, the additional finisher thread performs better. But for a heavy workload, this PR performs better (58.6/50.6 = 115%) and removes the additional finisher threads.
@majianpeng Any chance I could get you to try using the gdb wallclock profiler on a before/after test case? You can see it here: Thanks!
```cpp
++slot->num_running;
list<Context *> oncommits;
if (!sdata->context_queue.empty()) {
  sdata->context_queue.swap(oncommits);
```
Why not finish() the contexts here?
At this point it already holds sdata->shard_lock, and:
a) the context callback may take a long time;
b) the context will take the pg lock, which could cause a deadlock.
This is really interesting! A few initial thoughts: any sense of why this is slower with only 8 worker threads?
Force-pushed f5096e7 to 4951286
@liewegas Updated per your suggestion. Please review.
@jdurgin Ping. Thanks!
@jdurgin Do you have time to review this PR? Thanks very much.
I like this idea - agree we need to be careful about deadlocks. `diff --minimal` is a little more readable; the op wq change is simple enough, and the change in indentation level just makes the diff look bigger. @majianpeng can you rebase? Then let's run this through the rados suite and see if any fallout emerges.
Force-pushed 7967e6d to 0e04e66
@liewegas Ping. Thanks
@liewegas Ping.
src/common/Finisher.h
Outdated
```cpp
void swap(list<Context *>& ls) {
  ls.clear();
  std::lock_guard<std::mutex> l(q_mutex);
  if (!q.empty()) {
```
src/os/bluestore/BlueStore.cc
Outdated
```cpp
}
for (auto c : on_applied) {
  finishers[osr->shard]->queue(c);
if (on_applied.size()) {
```
src/osd/OSD.h
Outdated
```cpp
}

void handle_oncommits(list<Context*>& oncommits) {
  if (!oncommits.empty()) {
```
Does this 'if' actually help anything?
liewegas
left a comment
Few nits, but otherwise this looks good! We need to test pretty thoroughly in QA though.
Force-pushed 0e04e66 to 4fe8118
retest this please.
@liewegas Updated per your suggestions, but it can't pass "make check". What does "Pending — Build triggered for merge commit." mean? On my dev machine it passes all tests.
@majianpeng The lab is down, which probably includes jenkins. Should be fixed shortly!
retest this please
@tchaikov Still failed for the same reason: bluestore mkfs failed: (22) Invalid argument. This PR doesn't touch that code. @tchaikov, could you give some suggestions? Thanks!
http://pulpito.ceph.com/kchai-2018-09-18_10:05:03-rados-master-distro-basic-smithi/ I am rerunning the same test on master again; it failed with the error. Rerunning on ubuntu: http://pulpito.ceph.com/kchai-2018-09-18_10:49:01-rados-master-distro-basic-smithi/
Have we repeated the original performance test recently? There have been a lot of changes to the code here and I would like to see that we are still seeing a sufficient benefit!
Force-pushed ce496c9 to d5627af
@majianpeng before we devote more resources to this improvement, could you repeat the benchmark to address Sage's concern?
@tchaikov I'm doing that now, per Sage's request.
@liewegas From the results, I think this PR still deserves to be merged.
Sounds good! Let's proceed
@majianpeng turns out we were, and this test failure was addressed by #24143
tchaikov
left a comment
aside from the nits, lgtm.
src/common/Finisher.h
Outdated
```cpp
void swap(list<Context *>& ls) {
  ls.clear();
  std::lock_guard<std::mutex> l(q_mutex);
```
use scoped_lock if you please.
src/os/bluestore/BlueStore.cc
Outdated
```cpp
std::lock_guard<std::mutex> l(txc->osr->qlock);
txc->state = TransContext::STATE_KV_DONE;
finishers[txc->osr->shard]->queue(txc->oncommits);
if (txc->ch->commit_queue)
```
src/os/bluestore/BlueStore.cc
Outdated
```cpp
for (auto c : on_applied) {
  finishers[osr->shard]->queue(c);
if (!on_applied.empty()) {
  if (c->commit_queue)
```
src/os/bluestore/BlueStore.h
Outdated
```cpp
CollectionHandle open_collection(const coll_t &c) override;
CollectionHandle create_new_collection(const coll_t& cid) override;
void set_collection_commit_queue(const coll_t& cid,
                                 class ContextQueue *commit_queue) override;
```
src/os/bluestore/BlueStore.h
Outdated
```cpp
// pool options
pool_opts_t pool_opts;
class ContextQueue *commit_queue;
```
src/os/filestore/FileStore.h
Outdated
```cpp
CollectionHandle open_collection(const coll_t& c) override;
CollectionHandle create_new_collection(const coll_t& c) override;
void set_collection_commit_queue(const coll_t& cid,
                                 class ContextQueue *commit_queue) override {
```
src/os/kstore/KStore.h
Outdated
```cpp
CollectionHandle open_collection(const coll_t& c) override;
CollectionHandle create_new_collection(const coll_t& c) override;
void set_collection_commit_queue(const coll_t& cid,
                                 class ContextQueue *commit_queue) override {
```
src/os/memstore/MemStore.h
Outdated
```cpp
CollectionHandle create_new_collection(const coll_t& c) override;

void set_collection_commit_queue(const coll_t& cid,
                                 class ContextQueue *commit_queue) override {
```
src/osd/OSD.h
Outdated
```cpp
ceph_assert(sdata);
Mutex::Locker l(sdata->shard_lock);
return sdata->pqueue->empty();
if (thread_index < osd->num_shards)
```
yeah, please close it.
Force-pushed 41dc03c to 6c583fe
@tchaikov Updated per your comments. Thanks very much!
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Wahoo!
Currently the bluestore oncommit callbacks are executed by the osd op threads. If a shard has multiple threads, the oncommits can complete out of order. For example, with threads_per_shard=2:

```
Thread1                          Thread2
swap_oncommits(op1_oncommit)
                                 swap_oncommits(op2_oncommit)
OpQueueItem.run(Op3)
                                 op2_oncommit.complete()
op1_oncommit.complete()
```

This makes the oncommits run out of order. To avoid this, we choose a fixed thread, the one with the smallest thread_index of the shard, to run the oncommit callbacks.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
@majianpeng I have the failure above in http://pulpito.ceph.com/kchai-2018-09-20_12:40:38-rados-wip-kefu-testing-2018-09-19-1821-distro-basic-smithi/3046733/ in ceph/src/os/bluestore/BlueStore.cc Line 6929 in cc424b9, so we had 14 errors not fixed by fsck. Could you help take a look at this?
tracked by http://tracker.ceph.com/issues/36099
@tchaikov I'll check this now.
@rzarzynski started looking at this yesterday too
The benefits:
a) remove the finisher threads;
b) avoid racing on the pg lock.
For some workloads, this can decrease performance.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>