mds: find a new head for the batch ops when the head is dead#56941
mds: find a new head for the batch ops when the head is dead#56941
Conversation
ca8437a to
e4f14a2
Compare
If the batch head request is already dead and then we need to choose a new batch head anyways and release the reference for the current batch head request. Else it will be reported as slow request. Fixes: https://tracker.ceph.com/issues/65536 Signed-off-by: Xiubo Li <xiubli@redhat.com>
|
I'll run this through tests this week (was on PTO last week) |
Sure, thanks @vshankar |
* refs/pull/56941/head: mds: find a new head for the batch ops when the head is dead Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Dhairya Parmar <dparmar@redhat.com> Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
|
(unfortunately, failed are infra issues related to cephadm - would need a rebuild) |
|
This PR is under test in https://tracker.ceph.com/issues/65882. |
|
jenkins test make check |
|
jenkins test make check arm64 |
| dout(10) << __func__ << ": dead " << *mdr << dendl; | ||
| //if the mdr is a "batch_op" and it has followers, pick a follower as | ||
| //the new "head of the batch ops" and go on processing the new one. | ||
| if (mdr->client_request && mdr->is_batch_head()) { |
There was a problem hiding this comment.
I understand why this works but it is weird not to select a new batch head in ::request_cleanup. I would have preferred that but I think this PR needs to be merged urgently. Maybe in a follow-up cleanup?
There was a problem hiding this comment.
Does it really need ?
2079 void Server::respond_to_request(const MDRequestRef& mdr, int r)
2080 {
2081 mdr->result = r;
2082 if (mdr->client_request) {
2083 if (mdr->is_batch_head()) {
2084 dout(20) << __func__ << ": batch head " << *mdr << dendl;
2085 mdr->release_batch_op()->respond(r);
2086 } else {
2087 reply_client_request(mdr, make_message<MClientReply>(*mdr->client_request, r));
2088 }
2089 } else if (mdr->internal_op > -1) {
2090 dout(10) << __func__ << ": completing with result " << cpp_strerror(r) << " on internal " << *mdr << dendl;
2091 auto c = mdr->internal_op_finish;
2092 if (!c)
2093 ceph_abort_msg("trying to respond to internal op without finisher");
2094 mdcache->request_finish(mdr);
2095 c->complete(r);
2096 }
2097 }
We can see in Line#2085 it will cleanup all the batch requests when the batch head finishes.
There was a problem hiding this comment.
Let me have a check carefully and try to improve the code.
There was a problem hiding this comment.
@batrick please confirm that the if (dead) check only applies to internal requests, and can be moved after the client and peer conditional dispatches
There was a problem hiding this comment.
We can see in Line#2085 it will cleanup all the batch requests when the batch head finishes.
In this case, Server::respond_to_request is not called because the request was killed (that's why dead == true). This logic for batch_head handling in ::respond_to_request and now MDCache::dispatch_request should be uniformly handled in a single location: MDCache::request_cleanup.
There was a problem hiding this comment.
right, this is what @lxbsz had also figured out. So my question then is, shouldn't we just move that new if(dead) check into the else?
There was a problem hiding this comment.
No, I think we need to make a better effort unify handling of requests in the MDS. A dead request should not be dispatched no matter its type.
There was a problem hiding this comment.
@batrick But in the case when the sessions are killed and then all the corresponding requests will be marked killed too. Is that fine if we just remove the queued requests from the finisher queue directly ? If not then we cannot avoid this IMO, because the killed requests maybe still in the finisher queue and not dispatched yet.
But we can just choose a batch head directly when the a batch head request is being killed instead of doing this when the request is being dispatched.
There was a problem hiding this comment.
@batrick But in the case when the sessions are killed and then all the corresponding requests will be marked killed too. Is that fine if we just remove the queued requests from the finisher queue directly ? If not then we cannot avoid this IMO, because the killed requests maybe still in the finisher queue and not dispatched yet.
I understand. I am probably being confusing. It's okay that a dead request is re-dispatched through MDCache::dispatch_request but if it's marked dead, do nothing else and return;.
But we can just choose a batch head directly when the a batch head request is being killed instead of doing this when the request is being dispatched.
Yes!
There was a problem hiding this comment.
Let me try to improve it. Thanks @batrick @leonid-s-usov
https://tracker.ceph.com/issues/65882 (Note @vshankar I like to link to the qa run ticket as it's a SSOT.) |
Isn't that already done by ptl-tool when including a PR in a branch? Why relink it again? (If there are more than one QA trackers due to rebuild, etc., the last QA tracker linked is the SSOT). |
Well, actually, I like to link to the comment in the ticket where I approved the run which usually includes a link to the wiki for the full breakdown of failures. Sometimes a PR may also not have a comment linking to the ticket because it was added to a run after the QA ticket was already created; (Ultimately this is just a nit but I thought I'd share my thinking on it.) |
If the batch head request is already dead and then we need to choose a new batch head anyways and release the reference for the current batch head request. Else it will be reported as slow request.
Fixes: https://tracker.ceph.com/issues/65536
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e