client: flush the caps release in filesystem sync #58896
vshankar
left a comment
Please explain, in the commit message, the idea behind the change.
|
Were there any problems uncovered due to this, @lxbsz? |
|
In the tracker you mentioned "Though the tick thread will do this but some times the tick could be stuck." Is that due to the thread entering sleep here (lines 7119 to 7121 in 2fa0e43)? So maybe, while the thread is sleeping, we reach a situation where a cap flush is needed but can't happen until the thread wakes up. That should be only a slight delay, I guess, and shouldn't be that critical. What are the other scenarios? |
Done. |
Not sure. Previously I hit several cases where the tick thread was blocked for a long time, maybe due to overload or something else. Here we just want to explicitly expose a method to flush the cap releases manually, to get rid of the cap revoke stuck bugs that we hit many times before in https://tracker.ceph.com/issues/57244. |
We have figured out that there is a race between caps revoke and cap release in kclient. We just want to expose a way to flush the cap releases manually to get rid of the bug. |
So this is a temporary fix? |
|
This PR is under test in https://tracker.ceph.com/issues/67368. |
No, the cap releases, as well as the metadata, should be flushed when syncing the whole filesystem. |
|
jenkins test api |
|
jenkins test make check |
dparmar18
left a comment
LGTM. We need a mechanism to flush cap releases manually, and until now we had none. Also, Client::sync_fs does flush the metadata logs via Client::flush_mdlog_sync, but it never flushed the cap releases. This adds the missing bit to the function.
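To make the point above concrete, here is a minimal toy model of the idea, not the actual Ceph client code. `ToyClient`, `ToySession`, `queue_cap_release`, `tick_blocked`, `pending` and `run_example` are hypothetical names invented for illustration. Only `sync_fs` and `flush_mdlog_sync` are named in this thread, and their bodies here are stand-ins. The order of the two calls inside `sync_fs` is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model only: these are NOT the Ceph client's real data structures.
struct ToySession {
  std::vector<int> pending_cap_releases;  // inode numbers queued for cap release
  std::vector<int> sent_cap_releases;     // releases already "sent" to the MDS
};

struct ToyClient {
  std::map<int, ToySession> sessions;  // keyed by a hypothetical MDS rank
  bool tick_blocked = false;           // models a tick thread that is stuck
  std::size_t mdlog_flushes = 0;       // counts stand-in mdlog flushes

  void queue_cap_release(int mds, int ino) {
    sessions[mds].pending_cap_releases.push_back(ino);
  }

  // Send every queued cap release right now (assumed helper for this sketch).
  void flush_cap_releases() {
    for (auto& kv : sessions) {
      ToySession& s = kv.second;
      s.sent_cap_releases.insert(s.sent_cap_releases.end(),
                                 s.pending_cap_releases.begin(),
                                 s.pending_cap_releases.end());
      s.pending_cap_releases.clear();
    }
  }

  // Periodic path: does nothing while the tick thread is blocked.
  void tick() {
    if (!tick_blocked) flush_cap_releases();
  }

  // Stand-in for the metadata log flush that sync already performed.
  void flush_mdlog_sync() { ++mdlog_flushes; }

  // Explicit path: sync flushes the cap releases as well as the mdlog.
  void sync_fs() {
    flush_cap_releases();
    flush_mdlog_sync();
  }

  // Total number of cap releases still queued across all sessions.
  std::size_t pending() const {
    std::size_t n = 0;
    for (const auto& kv : sessions) n += kv.second.pending_cap_releases.size();
    return n;
  }
};

// Returns the number of releases still pending after sync_fs().
std::size_t run_example() {
  ToyClient c;
  c.tick_blocked = true;          // the periodic path is stuck
  c.queue_cap_release(0, 0x100);
  c.queue_cap_release(1, 0x101);
  c.tick();                       // no effect while blocked
  assert(c.pending() == 2);
  c.sync_fs();                    // the explicit flush does not depend on tick
  return c.pending();
}
```

In this model a blocked tick leaves both releases queued, while `sync_fs` drains them regardless of the tick state, which is the behaviour the review comment describes as the missing bit.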
|
jenkins test make check |
|
jenkins test api |
* refs/pull/58896/head: client: flush the caps release in filesystem sync
|
jenkins retest this please |
We have hit a race between cap releases and a cap revoke request that causes check_caps() to miss sending a cap revoke ack to the MDS. The client then depends on the cap release to release the revoking caps, and that release could be delayed for unknown reasons.

In kclient we have figured out the root cause of the race. A way to explicitly trigger the cap release manually could help to get rid of the caps revoke stuck issue.

Fixes: https://tracker.ceph.com/issues/67221
Signed-off-by: Xiubo Li <xiubli@redhat.com>
|
rebased and pushed for getting jenkins tests to pass. |
Fixes: https://tracker.ceph.com/issues/67221