
mds: add quiesce protocol #54581

Merged
vshankar merged 65 commits into ceph:main from batrick:mds-quiesce on Mar 22, 2024

Conversation

@batrick
Member

@batrick batrick commented Nov 20, 2023

See also: #54485

Currently TODO:

  • New ceph.quiesce.ignore vxattr for files (for use by volumes "meta" file)
  • Code TODOs

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@batrick batrick added the cephfs Ceph File System label Nov 20, 2023
@batrick
Member Author

batrick commented Nov 20, 2023

cc @leonid-s-usov

Contributor

@leonid-s-usov leonid-s-usov left a comment

It looks like we have a working solution here :)

I'm concerned with the need to increase the gather context cardinality. Are we expecting more than 32 bits of individual sub-operations for the quiesce?

Another thing I couldn't spot is how new CInodes that get loaded under the quiescing realm will be immediately locked.

@batrick
Member Author

batrick commented Nov 21, 2023

> I'm concerned with the need to increase the gather context cardinality. Are we expecting more than 32 bits of individual sub-operations for the quiesce?

I would surely hope not. It was just defensive programming.

@batrick
Member Author

batrick commented Nov 21, 2023

> Another thing I couldn't spot is how new CInodes that get loaded under the quiescing realm will be immediately locked.

The xlock on the filelock should prevent lookup/addition of new inodes (in fact it's missing for the root of the subvolume, I'll add that).

@leonid-s-usov
Contributor

leonid-s-usov commented Nov 22, 2023

> The xlock on the filelock should prevent the lookup/addition of new inodes

We may go with this for the first iteration, but do we really want to prevent read-only access to new files while the quiesce is active? It's overly restrictive, isn't it?

@batrick
Member Author

batrick commented Jan 10, 2024

@leonid-s-usov updated with latest code. Still have some tests I'm working on.

@batrick batrick requested a review from a team as a code owner January 10, 2024 22:15
@batrick batrick force-pushed the mds-quiesce branch 2 times, most recently from 398a0b7 to 8dadedc Compare January 11, 2024 03:08
@batrick
Member Author

batrick commented Jan 11, 2024

Added a commit for some initial developer docs on the protocol.

Contributor

@anthonyeleven anthonyeleven left a comment

Interesting stuff. I've left a number of nitpicky documentarian suggestions.

@batrick
Member Author

batrick commented Jan 11, 2024

> Interesting stuff. I've left a number of nitpicky documentarian suggestions.

Thanks for the review, pushed changes.

@batrick
Member Author

batrick commented Jan 11, 2024

I cannot inline reply for some reason so leaving a standalone comment:

> so now that we have the quiescelock, and assuming I understood correctly that other operations are now required to READLOCK it in addition to any other lock they might want, then why do we need to readlock all of those here, wouldn't it be sufficient to only quiescelock exclusively?
>
> Apparently, I'm missing something, so please clarify

Locking the other inodes is also important, as that will recall capabilities, which do not hold the quiescelock. Exclusively locking the quiescelock would only prevent new requests from being processed.

@batrick batrick force-pushed the mds-quiesce branch 2 times, most recently from 56e7db2 to 6138e27 Compare January 12, 2024 19:07
@vshankar vshankar requested a review from a team January 17, 2024 10:05
batrick added 11 commits March 20, 2024 10:56
This avoids waking up all waiters when only WAIT_XLOCK waiters should wake.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The previous scheme invalidated a lock cache and then immediately removed it
from its Capability list. The lock cache would eventually be deleted but a new
one could be constructed shortly after. The main reason for this is that simply
invalidating the lock cache does not drive a state change in the local locks
preventing new writers. This is mostly important for acquiring the quiescelock.

This commit also corrects a bug where a MDLockCache would be created for a
given opcode type (like create) when the capability does not have the issued
cap (CEPH_CAP_DIR_CREATE). The bug would not cause any negative side-effects
but would hold locks unnecessarily when only MDS ops (and not the client
executing ops asynchronously) are acquiring the locks.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is no longer necessary with the change to a LocalLock quiescelock.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This reverts commit 16e50ab.

This flag is no longer necessary as the volumes plugin issues quiesce calls
against the data (i.e. root) directory of the subvolume rather than the
subvolume directory (with its associated .meta file).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Instead of requiring the caller to put the arguments in a list, allow passing
as regular arguments.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Otherwise it gets included in the *args list. This is necessary after commit
`qa: simplify calls to (rank|mds)_(tell|asok)`.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
…round jobs

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
For storing misc. test artifacts.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
- Verify multirank quiesce incl. caps issued.
- Unset splitauth experimental

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
To include changes relating to it now being a local lock that prevents mutable
caps.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@vshankar
Contributor

@batrick Let me know if you want me to verify the fs suite run for this change (also, please link the latest one if that's the case).

@batrick
Member Author

batrick commented Mar 20, 2024

> @batrick Let me know if you want me to verify the fs suite run for this change (also, please link the latest one if that's the case).

I'm planning to run it again with other fixes I have for main. See wip-pdonnell-testing

@batrick
Member Author

batrick commented Mar 21, 2024

https://tracker.ceph.com/projects/cephfs/wiki/Main#2024-03-20

So this has been quite an odyssey. main is currently a mess but none of the failures appear related to this PR. For example, see an older run that I was iterating on to fix QA test failures:

https://pulpito.ceph.com/pdonnell-2024-03-14_15:20:35-fs-wip-batrick-testing-20240310.014948-distro-default-smithi/

I think this is ready for merging. @vshankar please do the honors. Also these:

wip-pdonnell-testing

@batrick
Member Author

batrick commented Mar 21, 2024

retest this please

@batrick
Member Author

batrick commented Mar 21, 2024

jenkins test api

@batrick
Member Author

batrick commented Mar 22, 2024

jenkins test dashboard cephadm

@batrick
Member Author

batrick commented Mar 22, 2024

jenkins test make check arm64

@leonid-s-usov
Contributor

🎉

