os/bluestore: create the tail when first set FLAG_OMAP#27627
os/bluestore: create the tail when first set FLAG_OMAP#27627tchaikov merged 1 commit intoceph:masterfrom cxytt:fix-add-omap-tail
Conversation
|
issues 36482 maybe the same problem |
|
retest this please |
|
@markhpc @liewegas @gregsfortytwo This is the performance improve approach of list omap, ever discussed in devel mail list. please help to review if you feel it's convenient, thanks. It would be better if there is a performance label.
|
|
I've not done much with the KVStores so I'm not real clear on the impact of doing those deletes on the |
|
Ah, good point! I didn't notice those were in kv/ and not bluestore code. I think the safer path here is to leave |
gregsfortytwo
left a comment
There was a problem hiding this comment.
As noted, needs to not break monitor.
|
@liewegas |
|
retest this please |
src/os/bluestore/BlueStore.cc
Outdated
| RWLock::RLocker l(c->lock); | ||
| bool r = o->onode.has_omap() && it && it->valid() && | ||
| it->raw_key().second <= tail; | ||
| it->raw_key().second < tail && it->raw_key().second != tail; |
There was a problem hiding this comment.
why adding "it->raw_key().second != tail"? Isn't "it->raw_key.second < tail" enough?
There was a problem hiding this comment.
There are scenarios where an error occurs, such as read_log_and_missing,So here's what we did
There was a problem hiding this comment.
@cxytt - I simply mean that last part of this expression (it->raw_key()second != tail) is redundant. True for it->raw_key().second < tail automatically results in true for ... != tail.
src/os/bluestore/BlueStore.cc
Outdated
| get_omap_header(id, &prefix); | ||
| get_omap_tail(id, &tail); | ||
| txc->t->rm_range_keys(omap_prefix, prefix, tail); | ||
| if (db->get(omap_prefix , tail, &tail_value) >= 0) { |
There was a problem hiding this comment.
Do we really need this get() call? May be unconditionally call txc->t->rmkey(omap_prefix, tail) only?
src/os/bluestore/BlueStore.cc
Outdated
| break; | ||
| } else { | ||
| dout(30) << __func__ << " got header/data " | ||
| if (it->key() <= tail) { |
There was a problem hiding this comment.
Shouldn't this be it->key() < tail to avoid duplicate tail set - we've just set tail at line 13030
src/os/bluestore/BlueStore.cc
Outdated
| const string& prefix = | ||
| newo->onode.is_pgmeta_omap() ? PREFIX_PGMETA_OMAP : PREFIX_OMAP; | ||
| dout(20) << __func__ << " copying omap data" << dendl; | ||
| if (!newo->onode.has_omap()) { |
There was a problem hiding this comment.
This check isn't needed as we unconditionally clear omap for newo above
|
This is most probably relevant to http://tracker.ceph.com/issues/36482 |
|
retest this please |
|
Found two more "slow" onodes at troublesome cluster - both are stuck at tail lookup. |
|
fsck needs to be updated: |
|
I also see a crash like this: could the enumeration code be including the tail key in the result? this test you can reproduce locally with |
|
|
Ah, the problem is that the prefix is being set based on the flag that isn't set until a few lines later. this fixes it: |
the omap iterator when listing omap use the tail of '~', when the iterator moved to the last key of the omapswe wanted, we will try to call extra next(), usually this will be another object's omap header(with '-'). IF there are some deleted key or tombstones, rocksdb will fall in the loop of FindNextUserEntryInternal until find a valid key, so it will travels all dead key in mid and read the sst file heavily. Signed-off-by: Tao Ning <ningtao@sangfor.com.cn>
|
retest this please |
|
jenkins retest this please |
|
@liewegas - do you think we can go ahead with this PR? |
|
Yep, let's retest! |
the omap iterator when listing omap use the tail of '~', when the iterator moved to the last key of the omapswe wanted, we will try to call extra next(), usually this will be another object's omap header(with '-'). IF there are some deleted key or tombstones, rocksdb will fall in the loop of
FindNextUserEntryInternaluntil find a valid key, so it will travels all dead key in mid and read the sst file heavily.Signed-off-by: ningtao ningtao@sangfor.com.cn