mds: flush mdlog if locked and still has wanted caps not satisfied #45243
mds: flush mdlog if locked and still has wanted caps not satisfied #45243vshankar merged 3 commits intoceph:masterfrom
Conversation
|
@lxbsz This would need to be backported. |
Sure, I have created one tracker: https://tracker.ceph.com/issues/54463 |
gregsfortytwo
left a comment
There was a problem hiding this comment.
Hmm. This patches the specific case we saw but I'm not sure it's the right solution. An awful lot of the time when there are file size changes, the client doing the change is the only one who needs the information, so we don't need a log flush.
@kotreshhr says you came across this commit 6844bf6 as the likely source of the bug, and building off of that seems to me like a better general approach: identifying cases where cap flushes need to be committed immediately so we can grant caps to other clients. That way, we get the benefit of not adding more piecemeal required flushes to the MDS path that end up with us just flushing constantly all the time, while making sure we don't block cross-client IO.
Does that make sense? Am I missing something?
Yeah, it makes sense. I was also thinking the same thing and I am still reading the Locker related code. I will update the PR once I find that. But IMO this patch here still make sense, because currently only when the size increases will it flush the mdlog, why not when decrease ? I think the Here is a little different, the ifile lock is in stable state |
So, we could just commit to disk on every single client op. We don't do that for all the usual batching reasons: doing individual IOs is wildly less efficient, results in less total throughput, and frequently higher average latencies than batching them up does. So the question is, when do we not batch? And the answer is: we do immediate commits when we block something that a client is waiting on. In this case, a client who asks to reduce the size is only waiting for the metadata commit; when you do a truncate or change the size downwards, it never blocks IO for the client doing the size change.
I haven't looked at the specific states the locks were in for this case; did it not transition into an unstable state when it started recalling the loner caps? That's what I naively would have expected to happen. |
Yeah, that's also what I think it should be.
Okay. Sounds reasonable. So in this case if without the commit (0bca941) we need to figure out whether will there be other cases will cause the same issue except the following case below, i need to check the code more carefully,but not sure I will find them all. I am okay to abandon the commit (0bca941) if you think that will be okay.
The problem here is that before new client requested to open the file, there had only one client was operating this file and it was the loner client, and the loner client would stay in EXCL stable lock state, and then do the journal log writing by holding the And later when a new client came to open this file and when evaling the locks, the And from the log file we can see the relevant logs: My second commit (a591756) will fix this. |
|
jenkins retest this please |
1cb9904 to
8053b52
Compare
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
I have removed this commit and will flush the mdlog when needed later in commit (0bc5fa7). |
gregsfortytwo
left a comment
There was a problem hiding this comment.
I really like this direction! 👍 Just a style guide nit and I think the third patch is no longer necessary?
| int Locker::issue_caps(CInode *in, Capability *only_cap) | ||
| int Locker::get_allowed_caps(CInode *in, Capability *cap, | ||
| int &all_allowed, int &loner_allowed, | ||
| int &xlocker_allowed) |
There was a problem hiding this comment.
I know the Ceph codebase is inconsistent about this for historical reasons, but I believe the style guide says references should be const, and out parameters should all be pointers. We do try and stick to that convention for new and updated code.
src/mds/Locker.cc
Outdated
| simple_sync(lock, need_issue); | ||
| } else { | ||
| dout(10) << " waiting for wrlock to drain" << dendl; | ||
| mds->mdlog->flush(); |
There was a problem hiding this comment.
Isn't this commit made irrelevant by flushing the log on the conflicting caps in previous patches?
There was a problem hiding this comment.
Yeah, I will drop this change.
Yeah, the second one have already cover it. For our issue it's not necessary. So if the second commit looks okay, I will drop this one then. |
Yep sounds good. Should also update the PR description and tracker ticket to be clearer that this is about caps getting stuck, not specifically the size changing to 0. |
Sure. Will do that. Thanks @gregsfortytwo |
lock_state_any is true will ignore the lock state. Signed-off-by: Xiubo Li <xiubli@redhat.com>
In _do_cap_update() if one client is releasing the Fw caps the relevant client range will be erased, and then new_max will be 0. It will skip flushing the mdlog after it submitting a journal log, which will keep holding the wrlock for the filelock. So when a new client is trying to open the file for reading, since the wrlock is locked for the filelock the file_eval() is possibly couldn't changing the lock state and at the same time if the filelock is in stable state, such as in EXECL, MIX. The mds may skip flushing the mdlog in the open related code too. We need to flush the mdlog if there still has any wanted caps couldn't be satisfied and has any lock for the filelock after the file_eval(). Fixes: https://tracker.ceph.com/issues/54463 Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
In _do_cap_update() if one client is releasing the Fw caps the
relevant client range will be erased, and then new_max will be 0.
It will skip flushing the mdlog after it submitting a journal log,
which will keep holding the wrlock for the filelock.
So when a new client is trying to open the file for reading, since
the wrlock is locked for the filelock the file_eval() is possibly
couldn't changing the lock state and at the same time if the
filelock is in stable state, such as in EXECL, MIX. The mds may
skip flushing the mdlog in the open related code too.
We need to flush the mdlog if there still has any wanted caps
couldn't be satisfied and has any lock for the filelock after the
file_eval().
Fixes: https://tracker.ceph.com/issues/54463
Signed-off-by: Xiubo Li xiubli@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox