mds/cache: don't assume non-auth xlocks to be remote locks #57020
|
jenkins test windows |
4386926 to e3f10dc
|
We know we're on the right track because all tests failed after the first fix: https://pulpito.ceph.com/leonidus-2024-04-22_05:52:53-fs-wip-lusov-quiescer-distro-default-smithi/ :D I will run it again soon with the second fix included. |
|
These changes have been validated by https://pulpito.ceph.com/leonidus-2024-04-22_12:36:42-fs-wip-lusov-quiescer-distro-default-smithi/. That run shows a few failures, but unlike the earlier similar runs, none of them show signs of the problem this PR is meant to prevent. To be on the safe side, this PR should still be run separately as part of the fs workload to make sure it doesn't break anything. |
e3f10dc to 6c9a4ad
6c9a4ad to e0784bd
A few places in the code assumed that non-auth xlocks must be remote, which prevented a proper drop lock procedure when those locks turned out to be locallocks. Fixes: https://tracker.ceph.com/issues/65606 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
e0784bd to 3aa055d
|
I'll add this to my next qa run. |
mchangir
left a comment
Since we are talking about locks ...
static const uint64_t WAIT_RD = (1<<0); // to read
static const uint64_t WAIT_WR = (1<<1); // to write
static const uint64_t WAIT_XLOCK = (1<<2); // to xlock (** dup)
static const uint64_t WAIT_STABLE = (1<<2); // for a stable state
static const uint64_t WAIT_REMOTEXLOCK = (1<<3); // for a remote xlock
static const int WAIT_BITS = 4;
static const uint64_t WAIT_ALL = ((1<<WAIT_BITS)-1);
Why do WAIT_XLOCK and WAIT_STABLE occupy the same bit position?
I found this commit, but it gives no explanation for this specific change:
commit a6f5abd95e80a2137c1e3e463fb4bdbcc95e49d2
Author: Sage Weil <sweil@redhat.com>
Date: Tue Jun 19 16:11:50 2007 +0000
* force trim of replicated null dentries that sync to non-null
* fixed authpinnable waits in server (now wait only if frozen; locker->acquire_locks will wait while freezing, and handle auth_pins properly)
git-svn-id: https://ceph.svn.sf.net/svnroot/ceph@1428 29311d96-e01e-0410-9327-a35deaab8ce9
I think it's a combination of history, a limited number of wait bits (until Patrick's change to a 128-bit mask in the context of the quiesce project), and the fact that up until now these two wait bits were never needed as separate wait tags. From the perspective of mirrored locks (SimpleLock, ScatterLock, FileLock), WAIT_STABLE is a superset of the functionality, while locallock, which also supports xlocking, has no notion of "stable". I think that's what keeps this wait bit definition as it is. In a few places around P.S.: this question is not related to the PR |
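To make the aliasing discussed above concrete: because WAIT_XLOCK and WAIT_STABLE are defined with the same value, a wait mask cannot distinguish a waiter registered for one from a waiter registered for the other. The sketch below reuses the constants exactly as quoted in the review comment; add_waiter and has_waiter are hypothetical helpers written for this illustration only and are not Ceph functions.

```cpp
#include <cassert>
#include <cstdint>

// Constants copied from the snippet quoted in the review comment.
static const uint64_t WAIT_RD          = (1<<0);  // to read
static const uint64_t WAIT_WR          = (1<<1);  // to write
static const uint64_t WAIT_XLOCK       = (1<<2);  // to xlock (** dup)
static const uint64_t WAIT_STABLE      = (1<<2);  // for a stable state
static const uint64_t WAIT_REMOTEXLOCK = (1<<3);  // for a remote xlock
static const int      WAIT_BITS        = 4;
static const uint64_t WAIT_ALL         = ((1<<WAIT_BITS)-1);

// Hypothetical helper (assumption, not Ceph API): record a waiter tag in a mask.
constexpr uint64_t add_waiter(uint64_t mask, uint64_t tag) {
  return mask | tag;
}

// Hypothetical helper (assumption, not Ceph API): is any bit of `tag` set in `mask`?
constexpr bool has_waiter(uint64_t mask, uint64_t tag) {
  return (mask & tag) != 0;
}

// The two tags are the same bit, so they are indistinguishable in a mask.
static_assert(WAIT_XLOCK == WAIT_STABLE, "WAIT_XLOCK aliases WAIT_STABLE");
// Four wait bits give a mask covering bits 0..3.
static_assert(WAIT_ALL == 0xF, "WAIT_ALL covers the four low bits");
```

With these definitions, has_waiter(add_waiter(0, WAIT_XLOCK), WAIT_STABLE) evaluates to true, whereas the same check against WAIT_REMOTEXLOCK evaluates to false. That is harmless as long as no caller needs to tell the two tags apart, which matches the explanation given in the reply.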
thanks for answering this |
|
Jenkins test windows |
|
jenkins test windows |
|
jenkins test make check arm64 |
|
This PR is under test in https://tracker.ceph.com/issues/65661. |
* refs/pull/57020/head: mds/cache: don't assume non-auth xlocks to be remote locks
|
jenkins test make check arm64 |
|
This PR is under test in https://tracker.ceph.com/issues/65694. |
A few places in the code assumed that non-auth xlocks must be remote, which prevented a proper drop lock procedure when those locks turned out to be locallocks. Fixes: https://tracker.ceph.com/issues/65710 Original-Issue: https://tracker.ceph.com/issues/65606 Original-PR: #57020 Signed-off-by: Leonid Usov <leonid.usov@ibm.com> (cherry picked from commit 3aa055d)
A few places in the code assumed that non-auth xlocks must be remote, which prevented a proper drop lock procedure when those locks turned out to be locallocks. Fixes: https://tracker.ceph.com/issues/65710 Original-Issue: https://tracker.ceph.com/issues/65606 Original-PR: ceph#57020 Signed-off-by: Leonid Usov <leonid.usov@ibm.com> (cherry picked from commit 3aa055d)
A few places in the code assumed that non-auth xlocks
must be remote, which prevented a proper drop lock procedure
when those locks turned out to be locallocks.
Fixes: https://tracker.ceph.com/issues/65606
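The description above can be illustrated with a toy model. This is only a sketch under stated assumptions and does not reproduce the Ceph code touched by this PR: every type and function name below (ToyLockKind, ToyXlock, ToyDropStats, drop_assuming_remote, drop_checking_kind) is hypothetical. It shows why treating "not auth" as "remote" leaves a locally held xlock in place when that lock is a locallock.

```cpp
#include <cassert>
#include <vector>

// Toy model only. None of these names come from the Ceph source.
enum class ToyLockKind { Mirrored, Local };

struct ToyXlock {
  ToyLockKind kind;  // Local: the lock state lives only on this node
  bool is_auth;      // is this node authoritative for the locked object?
  bool held;         // is the xlock still held on this node?
};

struct ToyDropStats {
  int released_here;    // xlocks released directly on this node
  int deferred_remote;  // xlocks left for the auth node to release
};

// Flawed assumption: "not auth" implies "remote".
inline ToyDropStats drop_assuming_remote(std::vector<ToyXlock>& locks) {
  ToyDropStats s{0, 0};
  for (auto& l : locks) {
    if (!l.is_auth) {
      ++s.deferred_remote;  // local state is left untouched
    } else {
      l.held = false;
      ++s.released_here;
    }
  }
  return s;
}

// Sketch of the corrected decision: a non-auth lock is remote only
// if it is not a local lock.
inline ToyDropStats drop_checking_kind(std::vector<ToyXlock>& locks) {
  ToyDropStats s{0, 0};
  for (auto& l : locks) {
    const bool remote = !l.is_auth && l.kind != ToyLockKind::Local;
    if (remote) {
      ++s.deferred_remote;
    } else {
      l.held = false;
      ++s.released_here;
    }
  }
  return s;
}
```

In this toy model, a ToyXlock with kind Local and is_auth false stays held after drop_assuming_remote but is released by drop_checking_kind, while a Mirrored non-auth lock is deferred in both versions.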