mds/MDSDaemon: unlock mds_lock while shutting down Beacon and others #60326

Merged: vshankar merged 1 commit into ceph:main from MaxKellermann:MDSDaemon_suicide_unlock on Aug 4, 2025

Conversation

@MaxKellermann (Member) commented Oct 15, 2024

This fixes a deadlock bug during MDS shutdown:

  • the "signal_handler" thread receives the shutdown signal and invokes
    MDSDaemon::suicide() while holding mds_lock

  • MDSDaemon::suicide() invokes Beacon::send_and_wait() while still
    holding mds_lock

  • meanwhile, all "ms_dispatch" threads get stuck waiting for
    mds_lock, for example in MDCache::upkeep_main() or
    MDSDaemon::ms_dispatch2()

  • Beacon::send_and_wait() waits for a MSG_MDS_BEACON packet to be
    dispatched (via cvar with a timeout)

At this point, even if a MSG_MDS_BEACON packet is received by one of
the worker threads, it will be put in the DispatchQueue, but no
dispatcher thread will be able to handle it because they are all
stuck. The cvar.wait_for() call in Beacon::send_and_wait() will
therefore time out and the MSG_MDS_BEACON will never be processed.

The proper solution is to unlock mds_lock so that the dispatchers do
not get stuck. In general, a lock should be held only for as long as
it is strictly needed, and blocking calls should never be made while
holding one.
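
The deadlock and its fix can be reduced to a small standalone sketch. All names here (mds_lock, beacon_lock, dispatcher(), suicide_and_wait()) are hypothetical simplifications of the ones above, not the actual Ceph code: the shutdown path releases the big lock before its blocking wait, so the dispatcher thread that delivers the acknowledgment can still acquire it.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mds_lock;                // stand-in for the big daemon lock
std::mutex beacon_lock;             // stand-in for Beacon::lock
std::condition_variable beacon_cv;  // stand-in for the cvar in send_and_wait()
bool beacon_acked = false;

// A dispatcher thread must take mds_lock before it can deliver the
// beacon acknowledgment; if the shutdown path still held mds_lock
// here, this thread would block and the ack would never arrive.
void dispatcher() {
  std::lock_guard<std::mutex> l(mds_lock);
  std::lock_guard<std::mutex> b(beacon_lock);
  beacon_acked = true;
  beacon_cv.notify_all();
}

// Models the shutdown path: drop mds_lock *before* the blocking wait
// so the dispatcher can make progress. Returns true if the ack arrived.
bool suicide_and_wait() {
  std::unique_lock<std::mutex> l(mds_lock);
  // ... state changes that need mds_lock happen here ...
  l.unlock();  // the fix: release the big lock before blocking

  std::thread t(dispatcher);
  bool ok;
  {
    std::unique_lock<std::mutex> b(beacon_lock);
    // With mds_lock held across this wait, the timeout branch would be
    // taken every time -- that is the deadlock symptom described above.
    ok = beacon_cv.wait_for(b, std::chrono::seconds(5),
                            [] { return beacon_acked; });
  }
  t.join();
  return ok;
}
```

With the unlock in place the wait completes immediately; commenting out `l.unlock()` reproduces the timeout.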

Fixes: https://tracker.ceph.com/issues/68760

This used to be part of #60320

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions bot added the cephfs (Ceph File System) label on Oct 15, 2024
@batrick (Member) left a comment:

Needs a tracker ticket.


@MaxKellermann force-pushed the MDSDaemon_suicide_unlock branch from 60b532c to 4211071 on October 29, 2024 at 20:38
@MaxKellermann force-pushed the MDSDaemon_suicide_unlock branch from 4211071 to c1df8f3 on October 31, 2024 at 10:09
@MaxKellermann (Member, Author) commented:

> Note: MDSRank::damaged needs adjusted too.

True. What a weird piece of code! (And [[noreturn]] is missing.)
Fixing this one would require passing a unique_lock to it and propagating it through all callers, which is a bigger effort.
(Or just call unlock() manually, which would be dirty but perhaps acceptable for this one method?)

@batrick (Member) commented Nov 4, 2024

> Note: MDSRank::damaged needs adjusted too.

> True. What a weird piece of code! (and [[noreturn]] is missing) Fixing this one would require passing a unique_lock to it, propagating it through all callers. That's a bigger effort. (Or just call unlock() manually. Which would be dirty. But acceptable for this one method?)

I think just calling unlock is fine with an appropriate comment. We don't expect the MDS to continue normally so it's fine.

I am wondering about the wisdom of letting other threads in the MDS (like MDCache::upkeeper) continue; not all threads are gated by the MDSDaemon::ms_dispatch2 _DNE check. Probably all points that acquire mds_lock should also check for some "terminate" flag and sit quietly (until the beacon reply is processed and the MDS exits).

@MaxKellermann (Member, Author) commented:

@batrick, it's been a month. Is this going to be merged?

@MaxKellermann (Member, Author) commented:

@batrick, it's been two months now.

@batrick (Member) commented Jan 5, 2025

> @batrick, it's been a month. Is this going to be merged?

Sorry, I think I was waiting to see if you were going to adjust the code to:

  • only unlock after setting the beacon state to _DNE
  • add a comment with explanation

@MaxKellermann (Member, Author) commented:

> Sorry, I think I was waiting to see if you were going to adjust the code to:
>
> * only unlock after setting the beacon to state to _DNE

I don't think that makes sense, because the beacon state is protected by Beacon::lock, not mds_lock. The proper check you're looking for is the stopping flag, which is protected by mds_lock and checked by ms_dispatch2() right before the beacon/DNE check.

> * add a comment with explanation

I added a comment to the unlock() call.
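
The stopping-flag check described above can be reduced to a standalone sketch; dispatch_accepts_work() and begin_shutdown() are hypothetical stand-ins, not the actual ms_dispatch2()/suicide() code:

```cpp
#include <mutex>

std::mutex mds_lock;    // stand-in for the big daemon lock
bool stopping = false;  // set under mds_lock when shutdown begins

// Models the entry check in the dispatcher: take mds_lock and bail out
// once the stopping flag (also protected by mds_lock) has been set.
bool dispatch_accepts_work() {
  std::lock_guard<std::mutex> l(mds_lock);
  if (stopping)
    return false;  // shutdown in progress: drop the message
  // ... normal message handling would happen here ...
  return true;
}

// Models the shutdown path setting the flag under the same lock, so
// dispatchers never observe a half-finished shutdown.
void begin_shutdown() {
  std::lock_guard<std::mutex> l(mds_lock);
  stopping = true;  // after this, dispatchers reject new work
}
```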

@MaxKellermann (Member, Author) commented:

@batrick, do you agree or do you still want me to move the unlock?

@ceph-jenkins (Collaborator) commented:

Can one of the admins verify this patch?

@vshankar requested a review from batrick on February 10, 2025 at 13:22
@MaxKellermann (Member, Author) commented:

It's been nearly half a year since I submitted this PR, and it fixes a very annoying Ceph bug. I'm running out of time to get my Ceph improvements merged.

@batrick (Member) commented May 22, 2025

Sorry again for the wait.

> Sorry, I think I was waiting to see if you were going to adjust the code to:
>
> * only unlock after setting the beacon to state to _DNE

> I don't think that makes sense because the beacon state is protected by Beacon::lock not mds_lock. The proper check you're looking for is the stopping flag; protected by mds_lock and checked by ms_dispatch2() right before the beacon/DNE check.

The concern I have is the race which would exist between unlocking the mds_lock and acquiring the Beacon::mutex. That would allow dispatch threads to come in, check the beacon state is not _DNE and do some work (like process client requests).

We already have a mds_lock -> Beacon::mutex lock dependency and routinely check the beacon with the mds lock held (see MDSDaemon::ms_dispatch2).

> * add a comment with explanation

> I added a comment to the unlock() call.

Thanks.

If you can please just move the unlock to after `beacon.set_want_state(*mdsmap, MDSMap::STATE_DNE);`, I will be happy to approve this.
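
The requested ordering can be sketched as follows. This is a hypothetical simplification, not the MDSDaemon code: the steps vector merely records the order of operations, the point being that the daemon's desired state is marked DNE while mds_lock is still held, and only then is the lock released before the blocking beacon wait.

```cpp
#include <mutex>
#include <string>
#include <vector>

std::mutex mds_lock;             // stand-in for the big daemon lock
std::vector<std::string> steps;  // records the order of the three steps

void suicide() {
  std::unique_lock<std::mutex> l(mds_lock);
  steps.push_back("set_want_state(STATE_DNE)");  // still under mds_lock:
                                                 // no dispatcher can see a
                                                 // live state after unlock
  l.unlock();
  steps.push_back("unlock(mds_lock)");           // dispatchers may now run
  steps.push_back("send_and_wait()");            // blocking wait, lock free
}
```

This closes the race discussed above: there is no window in which a dispatch thread holds mds_lock, sees a non-DNE state, and starts new work after shutdown has begun.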

@MaxKellermann force-pushed the MDSDaemon_suicide_unlock branch from 0695334 to aed19e4 on May 22, 2025 at 13:00
@MaxKellermann (Member, Author) commented:

> If you can please just move the unlock after the beacon.set_want_state(*mdsmap, MDSMap::STATE_DNE); I will be happy to approve this.

Done.

@batrick removed their assignment on May 22, 2025
@batrick (Member) commented May 22, 2025

jenkins test api

@github-actions bot commented:
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions bot added the stale label on Jul 21, 2025
@batrick removed the stale label on Jul 22, 2025
@batrick (Member) commented Jul 22, 2025

@vshankar @mchangir @kotreshhr for your next batch, please.

@vshankar (Contributor) commented:

This PR is under test in https://tracker.ceph.com/issues/72244.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
@MaxKellermann force-pushed the MDSDaemon_suicide_unlock branch from aed19e4 to c0fedb5 on July 30, 2025 at 06:25
@vshankar (Contributor) commented:

putting this to retest

@vshankar (Contributor) left a review comment.

@vshankar (Contributor) commented Aug 4, 2025

jenkins make check arm64

@vshankar (Contributor) commented Aug 4, 2025

jenkins test make check arm64


Labels: cephfs (Ceph File System), needs-qa

Projects: None yet

4 participants