client/mds: revoke the Fx caps to trigger dirty metadatas to be flushed from client#51931
client/mds: revoke the Fx caps to trigger dirty metadatas to be flushed from client#51931
Conversation
|
jenkins test api |
|
jenkins test windows |
|
@lxbsz Thanks for pointing to this. I've noted it my backlog and will to it on Monday (4th Sep). |
|
Hey Xiubo, I left a note on the tracker - https://tracker.ceph.com/issues/61584#note-5 |
Yeah, I replied it. Thanks @vshankar |
So, process_request_cap_release() sends revoke message to a client and starts processing the client request immediately. I wonder why its done that way since ideally it should wait for the revoke ack as its done in path_traverse like your change to acquire parent dir xlock(filelock). |
Circling back on this: adding xlock(filelock) in MDCache::path_traverse is correct, however, it could be the case that the revoke ack from the client (from process_request_cap_release()) is processed after revoking Fx caps for the parent directory. Would the mtime be overwritten in this case too? @lxbsz |
@vshankar Sorry for late to reply this. In the client request it will try to So in the
That doesn't matter, when the first revocation ack back it will wake up the waiters and then could continue and correctly handle the |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
|
Test runs in ~2h - wip-vshankar-testing-20231127.102654 |
|
https://pulpito.ceph.com/?branch=wip-vshankar-testing-20231127.102654 (rhel pkg install failures are a bunch, so, this would need a revalidate) |
|
@lxbsz I think this change is causing the MDS to assert during path traversal and acquiring xlock. /a/vshankar-2023-12-06_12:01:15-fs-wip-vshankar-testing-20231127.102654-r1-distro-default-smithi/7480089 |
Okay, let me have a check. Thanks |
@vshankar I found the root cause for this failure. For example for '/mydir/fileA But for Currently for the 'sm_filelock' it won't allow to do the remote xlock, please see By going through the code I didn't find any reason why we couldn't. Currently only the Locally I just enabled it and now still running the tests, till now everything worked well. @batrick @gregsfortytwo Any thought ? Thanks. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
Teuthology Job Status Job: snap-schedule Job: fs:upgrade all are known issues Job: fs:libcephfs all are known issues Job: snap_schedule Wiki: https://tracker.ceph.com/projects/cephfs/wiki/Main#2024-03-25 |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
@mchangir Sorry for missing this message. BTW, do you mean the |
batrick
left a comment
There was a problem hiding this comment.
The test case was simple, let's add it?
src/mds/MDCache.cc
Outdated
| if (depth > 0 || !mdr->lock_cache) { | ||
| lov.add_wrlock(&cur->filelock); | ||
| if (xlock_dentry_parent) { | ||
| lov.add_xlock(&cur->filelock); |
There was a problem hiding this comment.
This won't be appropriate I think. The filelock is deliberately configured to allow multiple writers (yes, even and especially for directory inodes) otherwise we will serialize every file create. (Please test this to confirm.)
What appears to be missing is that a wrlock on the filelock is not correctly revoking (or waiting for the revoke of) Fx on the directory inode. No?
There was a problem hiding this comment.
This won't be appropriate I think. The
filelockis deliberately configured to allow multiple writers (yes, even and especially for directory inodes) otherwise we will serialize every file create. (Please test this to confirm.)
Yeah, this really will serialize the creates.
What appears to be missing is that a wrlock on the
filelockis not correctly revoking (or waiting for the revoke of)Fxon the directory inode. No?
Yeah, correct. Maybe we can fix it by revoking the Fx when acquiring the wrlock ? Then it should be simple:
diff --git a/src/mds/SimpleLock.h b/src/mds/SimpleLock.h
index 6f1d049ea0a..2562d8c1143 100644
--- a/src/mds/SimpleLock.h
+++ b/src/mds/SimpleLock.h
@@ -358,14 +358,12 @@ public:
bool can_wrlock(client_t client) const {
return get_sm()->states[state].can_wrlock == ANY ||
(get_sm()->states[state].can_wrlock == AUTH && parent->is_auth()) ||
- (get_sm()->states[state].can_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client ||
- get_excl_client() == client));
+ (get_sm()->states[state].can_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client));
}
bool can_force_wrlock(client_t client) const {
return get_sm()->states[state].can_force_wrlock == ANY ||
(get_sm()->states[state].can_force_wrlock == AUTH && parent->is_auth()) ||
- (get_sm()->states[state].can_force_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client ||
- get_excl_client() == client));
+ (get_sm()->states[state].can_force_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client));
}
bool can_xlock(client_t client) const {
return get_sm()->states[state].can_xlock == ANY ||
There was a problem hiding this comment.
Tested the new patch and it worked well.
Sure. |
d2a7040 to
18d3fce
Compare
| (type->type == CEPH_LOCK_IFILE || | ||
| type->type == CEPH_LOCK_IAUTH || | ||
| type->type == CEPH_LOCK_ILINK || | ||
| type->type == CEPH_LOCK_IXATTR))))); |
There was a problem hiding this comment.
Is this problem actually unique to directories? Don't we want to revoke Fx even for regular files?
There was a problem hiding this comment.
Only for parent directory will have this issue.
src/mds/locks.c
Outdated
| CEPH_CAP_GCACHE | | ||
| CEPH_CAP_GBUFFER), | ||
| .can_remote_xlock = 0, | ||
| .can_remote_xlock = 1, |
| ASSERT_EQ(ceph_statx(cmount, c_dir, &stx2, CEPH_STATX_MTIME, 0), 0); | ||
| ASSERT_NE(utime_t(stx1.stx_mtime), utime_t(stx2.stx_mtime)); | ||
|
|
||
| ceph_shutdown(cmount); |
src/mds/MDSCacheObject.h
Outdated
| void state_set(unsigned mask) { state |= mask; } | ||
| void state_reset(unsigned s) { state = s; } | ||
|
|
||
| bool is_dir() const { return state_test(STATE_DIR); } |
There was a problem hiding this comment.
better would be a virtual bool is_dir() const that is overriden here (and defined in MDSCacheObject.
src/client/Client.cc
Outdated
|
|
||
| if (force || (drop & cap.issued)) { | ||
| if (drop & cap.issued) { | ||
| ldout(cct, 25) << "dropping caps " << ccap_string(drop) << dendl; |
There was a problem hiding this comment.
| ldout(cct, 25) << "dropping caps " << ccap_string(drop) << dendl; | |
| ldout(cct, 20) << "dropping caps " << ccap_string(drop) << dendl; |
src/mds/CDir.h
Outdated
|
|
||
| void mark_complete(); | ||
|
|
||
| bool is_dir() const { return true; } |
There was a problem hiding this comment.
| bool is_dir() const { return true; } | |
| bool is_dir() const override { return true; } |
If the 'Fx' caps is issued the client will cache the dirty metadatas, while if a write request, mkdir request for example, comes it will try to update the parents' mtime without trigger to revoke the corresponding caps. This will cause the mtime will be overwrote later after the dirty caps is flushed from clients. So for loner client's requests we need to revoke the 'Fx' caps anyway. Fixes: https://tracker.ceph.com/issues/61584 Signed-off-by: Xiubo Li <xiubli@redhat.com>
MDS need to rdlock directory inode's authlock when handling these
requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.
This is just sync up with kclient commit 222b7f90ba38 ("ceph:
voluntarily drop Ax cap for requests that create new inode")
Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
MDS need to xlock inode's linklock when handling link/rename requests.
Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message.
This is just a sync up with kclient commit d19a0b540182 ("ceph:
voluntarily drop Lx cap for link/rename requests")
Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
For write requests the parent's mtime will be updated correspondingly. And if the 'Xx' caps is issued and when releasing other caps together with the write requests the MDS Locker will try to eval the xattr lock, which need to change the locker state excl --> sync and need to do Xx caps revocation. Just voluntarily dropping CEPH_CAP_XATTR_EXCL caps to avoid a cap revoke message, which could cause the mtime will be overwrote by stale one. Fixes: https://tracker.ceph.com/issues/61584 Signed-off-by: Xiubo Li <xiubli@redhat.com>
Fixes: https://tracker.ceph.com/issues/61584 Signed-off-by: Xiubo Li <xiubli@redhat.com>
|
@mchangir pinging on this since I see your testing tag. |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
If the 'Fx' caps is issued the client will cache the dirty metadatas,
while if a write request, mkdir request for example, comes it will
try to update the parents' mtime without trigger to revoke the
corresponding caps. This will cause the mtime will be overwrote
later after the dirty caps is flushed from clients.
We need to acquire the xlock for the parent inode.
And also we could voluntarily drop Xx/Lx/Ax caps for some requests
to avoid revocation msgs.
Fixes: https://tracker.ceph.com/issues/61584
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows