mds: add optimization for replica recall during quiesce#56867
mds: add optimization for replica recall during quiesce#56867
Conversation
ab9fc21 to
04e11ae
Compare
There was a problem hiding this comment.
I really like this approach, it's the simplest possible interface for quiesce, as we discussed here
Then we didn't go for it because we weren't sure how much of a refactoring that would be to make the Locker re-eval all locks and revoke the unwanted caps.
Now I'm a little worried that without causing an explicit transition of locks to known states before we drop the locks may get us into a pool of hard to debug transient issues when unstable locks will suddenly have to re-eval with a different cap set, potentially incompatible with their expectations.
I hope my worries are just that, in which case I would prefer this PR over my variant at #56755, with just one condition: we should avoid using any other locks like `policylock, and have the Locker call us back just because it's seen that we are xlocking the quiesce lock.
It's my concern as well because there may be subtle gotchas. I already found one case where quiesce did not complete with this code but I had forgotten client debugging so I could not diagnose. It's possible with some adjustments this will work fine as-is. Perhaps we can also keep the locks but only use |
04e11ae to
b1ea57a
Compare
I've made this change but I'm actively testing the old approach to see what caused the quiesce to not complete. |
|
That last change makes this less pretty, TBH. Also, please see slack, I have now a different opinion on taking locks on non-auth:
I believe that replica will update the lock states to LOCK_LOCK automatically as the auth will take rd/x locks on the caps-related locks, so the whole thing could be simpler by only taking the quiesce lock on replica and ignore the other locks, see my addition to the xlock PR |
Could it be a policylock-related deadlock? If we have to take the policylock, we should probably call |
b1ea57a to
17a0cf8
Compare
17a0cf8 to
46ce666
Compare
|
I've updated this code as discussed in our meeting. The change is in anticipation of the changes you're working on Leonid in #56755. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
src/mds/MDCache.cc
Outdated
| // as a result of the auth taking the above locks. | ||
|
|
||
| /* kick cap issuance without waiting for auth */ | ||
| if (quiesce_replica_recall && !in->is_auth()) { |
There was a problem hiding this comment.
&& !in->is_auth() is redundant, we are already in the else branch of the inverted test.
There was a problem hiding this comment.
right, this was just the state after a trivial rebase, see now
src/common/options/mds.yaml.in
Outdated
| - mds | ||
| flags: | ||
| - runtime | ||
| - name: mds_cache_quiesce_auth_recall |
There was a problem hiding this comment.
I don't think this one is used anywhere, is it?
leonid-s-usov
left a comment
There was a problem hiding this comment.
I'm not sure if it will make a difference, there should not be any W caps issued by the replicas, if I understand correctly. Otherwise, this looks good.
1f800ae to
ab7d084
Compare
leonid-s-usov
left a comment
There was a problem hiding this comment.
Ok, this looks more to the point now. I still have some comments, see inline
73a86e6 to
ec69f0f
Compare
|
Since we're still taking locks, this doesn't require adjusting tests. It's also off by-default. I may like to use it for some performance measurements. |
Once the quiescelock is held, the replica can proceed with recalling caps so the auth can more rapidly acquire locks when it attempts to quiesce. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
|
@batrick Here we are calling issue caps before attempting the locks. How does this address those edge cases you mentioned, when locks may be caught in some transitory states by this caps revocation? |
This is an itneresting question. The only two ways to go that makes sense to me is:
I'm not sure there is an unconsidered edge case. The code has changed significantly since I first posted this PR. The deadlock may be gone entirely. It may have been something else entirely. I didn't get to dig into it much due to lack of debugging. What I would like is to get this PR in a state where we can test (performance of) both of the above two cases with config changes but those configs are off by default. |
|
I think we'll forgo this. We already mask the caps allows when the quiescelock is xlocked. I don't think this really does anything useful. |
Need to update tests.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e