
client/mds: revoke the Fx caps to trigger dirty metadata to be flushed from the client #51931

Closed
lxbsz wants to merge 5 commits into ceph:main from lxbsz:wip-61584

Conversation

@lxbsz (Member) commented Jun 6, 2023

If the 'Fx' caps are issued, the client will cache dirty metadata, and if a write request (a mkdir request, for example) comes in, it will try to update the parent's mtime without triggering revocation of the corresponding caps. This causes the mtime to be overwritten later when the dirty caps are flushed from the clients.

We need to acquire the xlock for the parent inode.

We could also voluntarily drop the Xx/Lx/Ax caps for some requests to avoid revocation messages.

Fixes: https://tracker.ceph.com/issues/61584

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@lxbsz lxbsz requested a review from a team June 6, 2023 05:19
@github-actions github-actions bot added the cephfs Ceph File System label Jun 6, 2023
@vshankar (Contributor)

jenkins test api

@vshankar (Contributor)

jenkins test windows

@vshankar (Contributor) commented Sep 1, 2023

@lxbsz Thanks for pointing to this. I've noted it in my backlog and will get to it on Monday (4th Sep).

@lxbsz (Member, Author) commented Sep 4, 2023

@lxbsz Thanks for pointing to this. I've noted it in my backlog and will get to it on Monday (4th Sep).

Sure, thanks @vshankar

@vshankar (Contributor)

Hey Xiubo, I left a note on the tracker - https://tracker.ceph.com/issues/61584#note-5

@lxbsz (Member, Author) commented Sep 12, 2023

Hey Xiubo, I left a note on the tracker - https://tracker.ceph.com/issues/61584#note-5

Yeah, I replied to it. Thanks @vshankar

@vshankar (Contributor)

If the 'Fx' caps are issued, the client will cache dirty metadata, and if a write request (a mkdir request, for example) comes in, it will try to update the parent's mtime without triggering revocation of the corresponding caps. This causes the mtime to be overwritten later when the dirty caps are flushed from the clients.

We need to acquire the xlock for the parent inode.

We could also voluntarily drop the Xx/Lx/Ax caps for some requests to avoid revocation messages.

So, process_request_cap_release() sends a revoke message to a client and starts processing the client request immediately. I wonder why it's done that way, since ideally it should wait for the revoke ack, as is done in path_traverse by your change to acquire the parent dir xlock(filelock).

@vshankar (Contributor)

If the 'Fx' caps are issued, the client will cache dirty metadata, and if a write request (a mkdir request, for example) comes in, it will try to update the parent's mtime without triggering revocation of the corresponding caps. This causes the mtime to be overwritten later when the dirty caps are flushed from the clients.
We need to acquire the xlock for the parent inode.
We could also voluntarily drop the Xx/Lx/Ax caps for some requests to avoid revocation messages.

So, process_request_cap_release() sends a revoke message to a client and starts processing the client request immediately. I wonder why it's done that way, since ideally it should wait for the revoke ack, as is done in path_traverse by your change to acquire the parent dir xlock(filelock).

Circling back on this: adding xlock(filelock) in MDCache::path_traverse is correct; however, it could be the case that the revoke ack from the client (from process_request_cap_release()) is processed after revoking the Fx caps for the parent directory. Would the mtime be overwritten in this case too? @lxbsz

@lxbsz (Member, Author) commented Sep 18, 2023

If the 'Fx' caps are issued, the client will cache dirty metadata, and if a write request (a mkdir request, for example) comes in, it will try to update the parent's mtime without triggering revocation of the corresponding caps. This causes the mtime to be overwritten later when the dirty caps are flushed from the clients.
We need to acquire the xlock for the parent inode.
We could also voluntarily drop the Xx/Lx/Ax caps for some requests to avoid revocation messages.

So, process_request_cap_release() sends a revoke message to a client and starts processing the client request immediately. I wonder why it's done that way, since ideally it should wait for the revoke ack, as is done in path_traverse by your change to acquire the parent dir xlock(filelock).

@vshankar Sorry for the late reply.

In the client request the client will volunteer to release some caps that could be used by the MDS side, and it will only volunteer to release caps that are not in use or dirty on the client side.

So process_request_cap_release() just tries to eval the lockers to help speed up the client request handling later, and all the necessary revocations should be done and guaranteed by the client request handling code itself, IMO.

Circling back on this: adding xlock(filelock) in MDCache::path_traverse is correct; however, it could be the case that the revoke ack from the client (from process_request_cap_release()) is processed after revoking the Fx caps for the parent directory. Would the mtime be overwritten in this case too? @lxbsz

That doesn't matter: when the first revocation ack comes back it will wake up the waiters, which can then continue and handle the mtime correctly.

@github-actions (bot)

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Nov 17, 2023
@lxbsz lxbsz removed the stale label Nov 17, 2023
@lxbsz (Member, Author) commented Nov 17, 2023

jenkins test make check

1 similar comment
@vshankar (Contributor)

jenkins test make check

@vshankar (Contributor)

Test runs in ~2h - wip-vshankar-testing-20231127.102654

@vshankar (Contributor) commented Dec 4, 2023

https://pulpito.ceph.com/?branch=wip-vshankar-testing-20231127.102654

(there are a bunch of rhel pkg install failures, so this would need a revalidation)

@vshankar (Contributor) commented Dec 7, 2023

@lxbsz I think this change is causing the MDS to assert during path traversal and acquiring xlock.

2023-12-06T14:40:49.685 INFO:tasks.ceph.mds.b.smithi137.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-22-g9488dd55/rpm/el9/BUILD/ceph-19.0.0-22-g9488dd55/src/mds/Locker.cc: 1963: FAILED ceph_assert(lock->get_sm()->can_remote_xlock)
2023-12-06T14:40:49.687 INFO:tasks.ceph.mds.b.smithi137.stderr: ceph version 19.0.0-22-g9488dd55 (9488dd55a18030b93f338d78f72b42610452f71c) reef (dev)
2023-12-06T14:40:49.687 INFO:tasks.ceph.mds.b.smithi137.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7f3cf856eef8]
2023-12-06T14:40:49.687 INFO:tasks.ceph.mds.b.smithi137.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x16f0b6) [0x7f3cf856f0b6]
2023-12-06T14:40:49.687 INFO:tasks.ceph.mds.b.smithi137.stderr: 3: (Locker::xlock_start(SimpleLock*, boost::intrusive_ptr<MDRequestImpl>&)+0x939) [0x55da81e3b599]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 4: (Locker::acquire_locks(boost::intrusive_ptr<MDRequestImpl>&, MutationImpl::LockOpVec&, CInode*, bool)+0x20f5) [0x55da81e30825]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 5: (MDCache::path_traverse(boost::intrusive_ptr<MDRequestImpl>&, ContextFactory<MDSContext>&, filepath const&, int, std::vector<CDentry*, std::allocator<CDentry*> >*, CInode**)+0x79b) [0x55da81ddf34b]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 6: (Server::rdlock_path_xlock_dentry(boost::intrusive_ptr<MDRequestImpl>&, bool, bool, bool, bool)+0x371) [0x55da81d19681]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 7: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x90) [0x55da81d3b340]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 8: (Server::handle_peer_auth_pin_ack(boost::intrusive_ptr<MDRequestImpl>&, boost::intrusive_ptr<MMDSPeerRequest const> const&)+0x81f) [0x55da81d1602f]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 9: (Server::handle_peer_request_reply(boost::intrusive_ptr<MMDSPeerRequest const> const&)+0x6bc) [0x55da81d1691c]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 10: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x97) [0x55da81d025d7]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 11: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x4eb) [0x55da81c8657b]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 12: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x5c) [0x55da81c86a6c]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 13: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x195) [0x55da81c70595]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 14: (DispatchQueue::entry()+0x53a) [0x7f3cf876040a]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 15: /usr/lib64/ceph/libceph-common.so.2(+0x3f24a1) [0x7f3cf87f24a1]
2023-12-06T14:40:49.688 INFO:tasks.ceph.mds.b.smithi137.stderr: 16: /lib64/libc.so.6(+0x9f802) [0x7f3cf7c9f802]
2023-12-06T14:40:49.689 INFO:tasks.ceph.mds.b.smithi137.stderr: 17: /lib64/libc.so.6(+0x3f450) [0x7f3cf7c3f450]
2023-12-06T14:40:49.689 INFO:tasks.ceph.mds.b.smithi137.stderr:*** Caught signal (Aborted) **

/a/vshankar-2023-12-06_12:01:15-fs-wip-vshankar-testing-20231127.102654-r1-distro-default-smithi/7480089

@lxbsz (Member, Author) commented Dec 7, 2023

@lxbsz I think this change is causing the MDS to assert during path traversal and acquiring xlock.


Okay, let me have a check. Thanks

@lxbsz (Member, Author) commented Dec 27, 2023

@lxbsz I think this change is causing the MDS to assert during path traversal and acquiring xlock.


@vshankar I found the root cause of this failure. For example, for '/mydir/fileA', when deleting 'fileA' the clients will send the unlink request to the auth MDS of the 'fileA' dentry.

But for the 'mydir/' directory the corresponding inode's auth MDS won't always be the same as 'fileA''s. So when we try to get the xlock of 'mydir/' it will trigger the ceph assert above.

Currently the 'sm_filelock' does not allow remote xlocks; see can_remote_xlock:

const struct sm_t sm_filelock = {
        .states = filelock,
        .allowed_ever_auth = (CEPH_CAP_GSHARED |
                              CEPH_CAP_GEXCL |
                              CEPH_CAP_GCACHE |
                              CEPH_CAP_GRD |
                              CEPH_CAP_GWR |
                              CEPH_CAP_GWREXTEND |
                              CEPH_CAP_GBUFFER |
                              CEPH_CAP_GLAZYIO),
        .allowed_ever_replica = (CEPH_CAP_GSHARED |
                                 CEPH_CAP_GCACHE |
                                 CEPH_CAP_GRD |
                                 CEPH_CAP_GLAZYIO),
        .careful = (CEPH_CAP_GSHARED |
                    CEPH_CAP_GEXCL |
                    CEPH_CAP_GCACHE |
                    CEPH_CAP_GBUFFER),
        .can_remote_xlock = 0,
};

By going through the code I didn't find any reason why we couldn't. Currently only sm_simplelock allows it. Are there any thoughts or cautions about enabling it?

Locally I just enabled it and am still running the tests; so far everything has worked well.

@batrick @gregsfortytwo Any thoughts? Thanks.

@vshankar vshankar self-assigned this Jan 19, 2024
@vshankar vshankar requested a review from a team January 19, 2024 07:42
@mchangir mchangir added the wip-mchangir-testing not yet production ready label Mar 18, 2024
@github-actions (bot)

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@mchangir (Contributor)

Teuthology Job Status

Job: snap-schedule
URL: https://pulpito.ceph.com/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/
Status: 11 jobs failed with PG_AVAILABILITY issues, whereas the actual tests passed, as seen below

mchangir:logs$ for log in $(find /teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi -mindepth 2 -maxdepth 2 -name 'teuthology.log'); do grep -H 'cephfs_test_runner:OK' $log; done
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616249/teuthology.log:2024-03-22T12:09:40.070 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616252/teuthology.log:2024-03-22T12:11:43.760 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616255/teuthology.log:2024-03-22T12:10:44.872 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616251/teuthology.log:2024-03-22T12:20:18.371 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616258/teuthology.log:2024-03-22T12:15:45.875 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616254/teuthology.log:2024-03-22T12:20:04.651 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616248/teuthology.log:2024-03-22T12:22:11.607 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616250/teuthology.log:2024-03-22T12:23:09.965 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616256/teuthology.log:2024-03-22T12:22:24.853 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616257/teuthology.log:2024-03-22T12:20:46.343 INFO:tasks.cephfs_test_runner:OK
/teuthology/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/7616253/teuthology.log:2024-03-22T12:22:12.682 INFO:tasks.cephfs_test_runner:OK

Job: fs:upgrade
URL: https://pulpito.ceph.com/mchangir-2024-03-22_09:46:06-fs:upgrade-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/
Status: 5 of 13 failed

all are known issues

Job: fs:libcephfs
URL: https://pulpito.ceph.com/mchangir-2024-03-22_09:48:09-fs:libcephfs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/
Status: 2 fail, 2 pass

all are known issues

Job: snap_schedule
URL: https://pulpito.ceph.com/mchangir-2024-03-22_09:40:19-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/
Status: 12 of 12 passed

Wiki: https://tracker.ceph.com/projects/cephfs/wiki/Main#2024-03-25

@github-actions (bot)

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label May 24, 2024
@lxbsz (Member, Author) commented May 27, 2024

Teuthology Job Status

Job: snap-schedule URL: https://pulpito.ceph.com/mchangir-2024-03-22_09:41:51-fs-wip-mchangir-testing-main-20240318.032620-testing-default-smithi/ Status: 11 jobs failed with PG_AVAILABILITY issues, whereas they actual tests passed as seen below

@mchangir Sorry for missing this message.

BTW, do you mean the PG_AVAILABILITY issue is related to this?

@github-actions github-actions bot removed the stale label May 27, 2024
@batrick (Member) left a comment

The test case was simple, let's add it?

  if (depth > 0 || !mdr->lock_cache) {
    lov.add_wrlock(&cur->filelock);
    if (xlock_dentry_parent) {
      lov.add_xlock(&cur->filelock);
@batrick (Member)

This won't be appropriate I think. The filelock is deliberately configured to allow multiple writers (yes, even and especially for directory inodes) otherwise we will serialize every file create. (Please test this to confirm.)

What appears to be missing is that a wrlock on the filelock is not correctly revoking (or waiting for the revoke of) Fx on the directory inode. No?

@lxbsz (Member, Author) commented Jun 20, 2024

This won't be appropriate I think. The filelock is deliberately configured to allow multiple writers (yes, even and especially for directory inodes) otherwise we will serialize every file create. (Please test this to confirm.)

Yeah, this really will serialize the creates.

What appears to be missing is that a wrlock on the filelock is not correctly revoking (or waiting for the revoke of) Fx on the directory inode. No?

Yeah, correct. Maybe we can fix it by revoking the Fx when acquiring the wrlock? Then it should be simple:

diff --git a/src/mds/SimpleLock.h b/src/mds/SimpleLock.h
index 6f1d049ea0a..2562d8c1143 100644
--- a/src/mds/SimpleLock.h
+++ b/src/mds/SimpleLock.h
@@ -358,14 +358,12 @@ public:
   bool can_wrlock(client_t client) const {
     return get_sm()->states[state].can_wrlock == ANY ||
       (get_sm()->states[state].can_wrlock == AUTH && parent->is_auth()) ||
-      (get_sm()->states[state].can_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client ||
-                                                                   get_excl_client() == client));
+      (get_sm()->states[state].can_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client));
   }
   bool can_force_wrlock(client_t client) const {
     return get_sm()->states[state].can_force_wrlock == ANY ||
       (get_sm()->states[state].can_force_wrlock == AUTH && parent->is_auth()) ||
-      (get_sm()->states[state].can_force_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client ||
-                                                                         get_excl_client() == client));
+      (get_sm()->states[state].can_force_wrlock == XCL && client >= 0 && (get_xlock_by_client() == client));
   }
   bool can_xlock(client_t client) const {
     return get_sm()->states[state].can_xlock == ANY ||

@lxbsz (Member, Author)

Tested the new patch and it worked well.

@lxbsz (Member, Author) commented Jun 20, 2024

Currently only the sm_simplelock allows it

Sure.

@lxbsz lxbsz force-pushed the wip-61584 branch 2 times, most recently from d2a7040 to 18d3fce Compare June 20, 2024 07:43
@github-actions github-actions bot added the tests label Jun 20, 2024
(type->type == CEPH_LOCK_IFILE ||
type->type == CEPH_LOCK_IAUTH ||
type->type == CEPH_LOCK_ILINK ||
type->type == CEPH_LOCK_IXATTR)))));
@batrick (Member)

Is this problem actually unique to directories? Don't we want to revoke Fx even for regular files?

@lxbsz (Member, Author)

Only the parent directory will have this issue.

src/mds/locks.c Outdated
CEPH_CAP_GCACHE |
CEPH_CAP_GBUFFER),
.can_remote_xlock = 0,
.can_remote_xlock = 1,
@batrick (Member)

revert

@lxbsz (Member, Author)

Done.

ASSERT_EQ(ceph_statx(cmount, c_dir, &stx2, CEPH_STATX_MTIME, 0), 0);
ASSERT_NE(utime_t(stx1.stx_mtime), utime_t(stx2.stx_mtime));

ceph_shutdown(cmount);
@batrick (Member)

👍

void state_set(unsigned mask) { state |= mask; }
void state_reset(unsigned s) { state = s; }

bool is_dir() const { return state_test(STATE_DIR); }
@batrick (Member)

Better would be a virtual bool is_dir() const that is overridden here (and defined in MDSCacheObject).

@lxbsz (Member, Author)

Sure.


if (force || (drop & cap.issued)) {
if (drop & cap.issued) {
ldout(cct, 25) << "dropping caps " << ccap_string(drop) << dendl;
@batrick (Member)

Suggested change
ldout(cct, 25) << "dropping caps " << ccap_string(drop) << dendl;
ldout(cct, 20) << "dropping caps " << ccap_string(drop) << dendl;

@lxbsz (Member, Author)

Will fix it.

src/mds/CDir.h Outdated

void mark_complete();

bool is_dir() const { return true; }
@batrick (Member)

Suggested change
bool is_dir() const { return true; }
bool is_dir() const override { return true; }

@lxbsz (Member, Author)

Done.

lxbsz added 5 commits July 3, 2024 10:36
If the 'Fx' caps are issued, the client will cache dirty metadata, and if a write request (a mkdir request, for example) comes in, it will try to update the parent's mtime without triggering revocation of the corresponding caps. This causes the mtime to be overwritten later when the dirty caps are flushed from the clients.

So for a loner client's requests we need to revoke the 'Fx' caps anyway.

Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The MDS needs to rdlock the directory inode's authlock when handling these
requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke
message.

This is just a sync-up with kclient commit 222b7f90ba38 ("ceph:
voluntarily drop Ax cap for requests that create new inode")

Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The MDS needs to xlock the inode's linklock when handling link/rename requests.
Voluntarily dropping CEPH_CAP_LINK_EXCL avoids a cap revoke message.

This is just a sync-up with kclient commit d19a0b540182 ("ceph:
voluntarily drop Lx cap for link/rename requests")

Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
For write requests the parent's mtime will be updated correspondingly.
And if the 'Xx' caps are issued, then when releasing other caps together
with the write requests the MDS Locker will try to eval the xattr lock,
which needs to change the lock state excl --> sync and do an Xx caps
revocation.

Just voluntarily drop the CEPH_CAP_XATTR_EXCL caps to avoid a cap
revoke message, which could cause the mtime to be overwritten by a
stale one.

Fixes: https://tracker.ceph.com/issues/61584
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@vshankar (Contributor)

@mchangir pinging on this since I see your testing tag.

@vshankar (Contributor)

@mchangir pinging on this since I see your testing tag.

ping @mchangir ?

@github-actions (bot)

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 29, 2024
@github-actions (bot)

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Oct 29, 2024

Labels

cephfs Ceph File System stale tests wip-mchangir-testing not yet production ready


4 participants