mds: async dir operation support#27866
Conversation
force-pushed from 28eed40 to ea3ee58
force-pushed from 7bd4181 to 0ae14d0
force-pushed from 31a90ce to f2cb698
kernel patches: https://github.com/ceph/ceph-client/tree/ceph-async-unlink
force-pushed from ffb17f3 to 3a7801b
An old run: http://pulpito.ceph.com/zyan-2019-12-06_02:10:01-fs-wip-mds-locking-order-testing-basic-smithi/. The failed snaptests are because the libc of rhel7 has no syncfs syscall.
src/mds/SimpleLock.h
Outdated

```cpp
  case CEPH_LOCK_IFLOCK: return "iflock";
  case CEPH_LOCK_IPOLICY: return "ipolicy";
  default: ceph_abort(); return std::string_view();
  default: return "unkown";
```

```cpp
  lock->get_parent()->auth_pin(lock);
}

void Locker::xlock_downgrade(SimpleLock *lock, MutationImpl *mut)
```
nit: "instrcuts" => "instructs" in commit message
http://pulpito.ceph.com/zyan-2019-12-10_08:41:40-fs-wip-mds-locking-order-testing-basic-smithi/ The failed snaptests are because the libc of rhel7 has no syncfs syscall. No clue why the libcephfs interface test failed once.
To take locks in top-down order for an MDRequest, we need to first take snap/policy rdlocks on the ancestor inodes of the request's base inode. It's not convenient to use Locker::acquire_locks() for this because the path to the request's base inode can change before all of these locks are rdlocked. This patch introduces Locker::try_rdlock_snap_layout(), which tries taking snap/policy rdlocks on the request's base inode and its ancestors all at the same time. MDCache::path_traverse() calls this function first, then uses Locker::acquire_locks() to take snaplocks on the components of the request's path. This patch also reorders the inode locks, putting snaplock and policylock at the front, because some requests (such as setattr) may xlock other locks after taking snaplock/policylock. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
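The "all at the same time" idea behind Locker::try_rdlock_snap_layout() can be sketched as a try-lock walk with rollback. This is a minimal illustration, not the real MDS code: the `Inode` type, its fields, and `try_rdlock_ancestors` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, much-simplified model of an inode with a try-able rdlock.
struct Inode {
  std::string name;
  bool xlocked = false;   // a conflicting exclusive lock is held elsewhere
  int rdlocks = 0;
  bool try_rdlock() { if (xlocked) return false; ++rdlocks; return true; }
  void put_rdlock() { --rdlocks; }
};

// Try to rdlock every ancestor in one pass; if any attempt fails, drop
// everything taken so far so no partial lock state survives, and let the
// caller wait and retry (the path may have changed meanwhile).
bool try_rdlock_ancestors(std::vector<Inode*>& chain) {
  std::vector<Inode*> taken;
  for (Inode* in : chain) {
    if (!in->try_rdlock()) {
      for (Inode* t : taken) t->put_rdlock();  // roll back on failure
      return false;
    }
    taken.push_back(in);
  }
  return true;
}
```

The rollback is what makes the operation safe to call before the path is pinned: either all ancestor rdlocks are held, or none are.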
Are you sure? syncfs has been around a long time. In fact, looking at my centos7.7 VM: Note that programs that use it do need to define
Introduce the MDS_TRAVERSE_RDLOCK_PATH flag, which instructs MDCache::path_traverse() to lock dentries during path traversal. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Introduce MDS_TRAVERSE_XLOCK_DENTRY, which instructs MDCache::path_traverse() to take appropriate locks (xlock dentry, wrlock directory's filelock/nestlock) for file create/deletion. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The helper function is for requests that operate on two paths. It
ensures that the two paths acquire locks in the proper order. The rules are:
1. Lock directory inodes or dentries according to which trees they
are under. Lock objects under fs root before objects under mdsdir.
2. Lock directory inodes or dentries according to their depth, in
ascending order.
3. Lock directory inodes or dentries according to inode numbers or
dentries' parent inode numbers, in ascending order.
4. Lock dentries in the same directory in order of their keys.
5. Lock non-directory inodes according to inode numbers, in ascending
order.
This patch also makes handle_client_link() and handle_client_rename()
use this helper function.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
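The five ordering rules above amount to a lexicographic comparison. The sketch below is illustrative only: `LockTarget` and its fields are hypothetical simplifications (the real code compares CInode/CDentry objects), and rule 5's non-directory case is folded into the same ino comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical simplified lock target; field names are illustrative.
struct LockTarget {
  bool under_mdsdir;   // rule 1: objects under fs root sort before mdsdir
  int depth;           // rule 2: shallower directories first
  uint64_t ino;        // rules 3/5: inode number (or dentry's parent ino)
  std::string key;     // rule 4: dentry key within the same directory
};

// Lexicographic ordering over (tree, depth, ino, key): sort the lock
// targets of both paths once, then acquire them in that order so two
// requests touching the same objects can never deadlock on each other.
bool lock_order_less(const LockTarget& a, const LockTarget& b) {
  return std::tie(a.under_mdsdir, a.depth, a.ino, a.key) <
         std::tie(b.under_mdsdir, b.depth, b.ino, b.key);
}
```

Because every request sorts its targets with the same total order, lock acquisition is globally consistent, which is the standard way to rule out ABBA deadlocks.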
If an inode's snaplock is locked by an MDRequest, the inode cannot be renamed by another MDRequest. In the rename case, the snaplocks of all inodes in the paths are locked, so the paths are stable while handling a slave rename. It's OK to discover all components in the paths, even if they are xlocked. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The MDS now relies on snaplocks to ensure that paths for slave requests are stable. MDCache::handle_find_ino_reply() may encounter an xlocked dentry during path traversal. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Take snap rdlocks (instead of taking dentry locks) on the subtree's ancestor inodes. The path to the subtree is stable after they are all locked. For dirfragtree locks on the subtree bounds, it's not convenient to rdlock them in top-down order (the paths to them are not stable). The solution is to kick them to the SYNC state and try taking rdlocks on all of them. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The lock cache preserves the locks and authpins required for directory operations. The MDS can create a lock cache once it has acquired all the locks of a directory operation. The lock cache can then be used for later operations of the same type on the same directory. For example, when the MDS has acquired all the locks of an unlink operation, it creates a lock cache, which holds wrlocks on the directory inode's filelock and nestlock, and rdlocks on the ancestor inodes' snaplocks. For later unlink operations on the same directory, the MDS only needs to xlock the dentry to unlink and xlock the linklock of the inode to unlink. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
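The reuse condition described above can be sketched with a toy structure. This is an assumption-laden illustration, not the real MDLockCache: the `LockCache` fields, `op_type` encoding, and `can_reuse` helper are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: after the first unlink has taken the directory-wide
// locks, remember them so later unlinks in the same directory only need
// the per-file locks (dentry xlock, inode linklock xlock).
struct LockCache {
  uint64_t dir_ino;               // directory the cache covers
  int op_type;                    // e.g. 1 = unlink; only same-type ops reuse
  std::vector<std::string> held;  // names of the preserved locks (illustrative)
  bool invalidated = false;       // set when a conflicting state change occurs
};

// A later request may reuse the cache only for the same operation type on
// the same directory, and only while the cache has not been invalidated.
bool can_reuse(const LockCache& lc, uint64_t dir_ino, int op_type) {
  return !lc.invalidated && lc.dir_ino == dir_ino && lc.op_type == op_type;
}
```

The following two commits add the bookkeeping that makes invalidation possible: each SimpleLock and each CDir tracks which caches depend on it.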
Add a list to SimpleLock which tracks the lock caches that hold locks on the SimpleLock itself. When the MDS wants to change the lock's state, it can find those lock caches and invalidate them. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Add a list to CDir which tracks the lock caches that hold auth pins on objects in the CDir. When the MDS wants to freeze the dirfrag or subtree, it can find those lock caches and invalidate them. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
A frozen inode in a directory with lock caches may cause an ABBA deadlock: the request that owns the frozen inode can't get locks because a lock cache is holding conflicting locks, while the request that uses the lock cache can't auth pin the frozen inode. The solution is to not create a lock cache when the directory contains a freezing or frozen inode, and, when the MDS starts freezing an inode, to invalidate all lock caches on its parent directory. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Define cap bits for async dir operations. The lock cache for a given type of dir operation can be delegated to a client through the cap mechanism. As long as the client holds the corresponding cap, dir operation requests from that client can use the lock cache. If the MDS wants to invalidate a lock cache, it needs to first revoke the corresponding cap from the client. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
A later patch will introduce new mask bits. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
When a client has the 'CEPH_CAP_FILE_EXCL|CEPH_CAP_DIR_UNLINK' caps of a directory inode, it can asynchronously unlink a file from the directory if the corresponding dentry is a primary link. When a client has the 'CEPH_CAP_FILE_EXCL|CEPH_CAP_DIR_CREATE' caps of a directory inode, it can asynchronously create a new file in the directory as long as no file with the same name exists. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
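The client-side gate described above is a bitmask check on the directory's caps. A minimal sketch, with made-up bit values (the real CEPH_CAP_* constants are composed differently in the Ceph headers):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative cap bits; values are placeholders, not the real constants.
constexpr uint32_t FILE_EXCL  = 1u << 0;  // stands in for CEPH_CAP_FILE_EXCL
constexpr uint32_t DIR_UNLINK = 1u << 1;  // stands in for CEPH_CAP_DIR_UNLINK
constexpr uint32_t DIR_CREATE = 1u << 2;  // stands in for CEPH_CAP_DIR_CREATE

// Async unlink needs both FILE_EXCL and DIR_UNLINK on the directory inode;
// async create needs FILE_EXCL and DIR_CREATE. Missing either bit means
// the client must fall back to a synchronous MDS request.
bool may_async_unlink(uint32_t dir_caps) {
  return (dir_caps & (FILE_EXCL | DIR_UNLINK)) == (FILE_EXCL | DIR_UNLINK);
}
bool may_async_create(uint32_t dir_caps) {
  return (dir_caps & (FILE_EXCL | DIR_CREATE)) == (FILE_EXCL | DIR_CREATE);
}
```

Because the MDS revokes these caps before invalidating the underlying lock cache, a client that still holds both bits knows the delegated locks are still valid.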
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
The previous commit makes Server::rdlock_path_pin_ref() rdlock the snaplock and policylock. A deadlock happens if we use this function for requests that xlock the snaplock or policylock. Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
force-pushed from 629e5e0 to 120d70d
You are right. I misread the log. The snaptest failures were caused by a bug in this PR.
http://pulpito.ceph.com/zyan-2019-12-18_13:40:19-fs-wip-mds-locking-order-testing-basic-smithi/ I think this is ready to merge.
batrick
left a comment
Great work on this Zheng. I haven't looked at the code in detail yet but let's get this in now so you can continue to monitor nightly testing over the next few weeks.
* refs/pull/27866/head:
  mds: fix deadlock when xlocking policylock
  mds: handle link request with zero depth filepath2
  mds: enable lock cache for openc/unlink requests
  mds: include linkage type in dentry lease
  mds: cleanup Server::set_trace_dist()
  mds: define lease mask bits
  mds: delegete lock cache to client
  mds: suppress frozen inode when locks of dir operation is cached.
  mds: invalidate lock caches when freezing dirfrag/subtree
  mds: invalidate lock caches if they hold conflicting locks
  mds: initial code for lock cache
  mds: adjust locking for subtree migration
  mds: add 'path_locked' flag to MDCache::find_ino_peers()
  mds: change MDCache::discover_path()'s wants_xlocked semantic
  mds: introduce Server::rdlock_two_paths_xlock_destdn()
  mds: make Server::rdlock_path_xlock_dentry take locks
  mds: make Server::rdlock_path_pin_ref() take dentry rdlocks
  mds: take snaplock and policylock during path traverse.
  mds: let Locker::acquire_locks()'s caller choose locking order

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Change the order of taking locks to:
inode numbers, in ascending order.