
mds: fix deadlock between unlinking and linkmerge #52199

Merged
batrick merged 6 commits into ceph:main from lxbsz:wip-61818
Sep 15, 2023

Conversation

@lxbsz
Member

@lxbsz lxbsz commented Jun 27, 2023

If an inode has multiple hardlinks and an unlink request removes the
primary dentry, a new unlink request may arrive just before the
linkmerge is triggered. The new unlink request needs the linkmerge
to finish first.

We should just wait for the linkmerge to finish before handling the
new unlink request for the same inode.

Fixes: https://tracker.ceph.com/issues/58340
Fixes: https://tracker.ceph.com/issues/61818
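The scenario above can be modeled with a toy sketch (illustrative C++ only, not the actual MDS code; the `Dentry`, `Mds`, `handle_unlink`, and `linkmerge_finish` names are assumptions for illustration): a second unlink on the same inode is deferred until the linkmerge triggered by the first unlink completes.

```cpp
#include <string>
#include <vector>

// Toy model of the fix: a dentry whose inode has a linkmerge
// (reintegration rename) in flight defers any new unlink until the
// linkmerge finishes, instead of letting the two ops deadlock.
struct Dentry {
    bool unlinking = false;      // loosely analogous to CDentry::STATE_UNLINKING
    bool reintegrating = false;  // a linkmerge for this inode is in flight
};

struct Mds {
    std::vector<std::string> log;

    void handle_unlink(Dentry& dn) {
        if (dn.reintegrating) {
            // The fix: wait for the linkmerge instead of proceeding.
            log.push_back("unlink deferred: waiting for linkmerge");
            return;
        }
        dn.unlinking = true;
        log.push_back("unlink dispatched");
    }

    void linkmerge_finish(Dentry& dn) {
        dn.reintegrating = false;
        log.push_back("linkmerge finished; retrying waiters");
    }
};
```

In this model the second unlink never races the in-flight rename; it is simply re-driven after the linkmerge completes.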


@lxbsz lxbsz requested review from a team, batrick and vshankar June 27, 2023 06:35
@github-actions github-actions bot added the cephfs Ceph File System label Jun 27, 2023
Member

@batrick batrick left a comment


I think there is a simpler fix here. The core problem is that StrayManager::reintegrate_stray sets the dentry state to REINTEGRATING before the rename op has even been dispatched and the necessary locks acquired. Until that happens, we've simply recorded an intent (the op) without actually changing anything or acquiring any locks.

Had Server::handle_client_rename set the reintegration state after acquiring locks and checking for any pending unlinks, we would not have hit this deadlock, no?
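The ordering being proposed can be sketched as follows (a minimal toy, assuming a hypothetical `Reintegration` type; only StrayManager::reintegrate_stray and Server::handle_client_rename are real code paths): the reintegration state would be published only once the rename has been dispatched and its locks acquired, not at submit time.

```cpp
// Toy sketch of the proposed ordering: submitting the reintegration op
// records intent only; the state flag is set only after lock acquisition.
struct Reintegration {
    bool locks_acquired = false;
    bool state_set = false;       // stands in for STATE_REINTEGRATING

    void submit() {
        // Intent only: nothing changed, no locks held, no state flag yet.
    }

    void dispatch_rename() {
        locks_acquired = true;    // acquire the dentry locks first...
        state_set = true;         // ...then publish the reintegration state
    }
};
```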

@batrick
Member

batrick commented Aug 9, 2023

Had Server::handle_client_rename set the reintegration state after acquiring locks and checking for any pending unlinks, we would not have hit this deadlock, no?

Going further, we know it's a reintegration op whenever it's a linkmerge even if the client issued the op itself. Of course, it may be just two remote dentries being merged (therefore no actual "reintegration") but that detail is not important when we're just trying to block a conflicting op on the linkmerge.

@lxbsz
Member Author

lxbsz commented Aug 10, 2023

I think there is a simpler fix here. The core problem is that StrayManager::reintegrate_stray sets the dentry state to REINTEGRATING before the rename op has even been dispatched and the necessary locks acquired. Until that happens, we've simply recorded an intent (the op) without actually changing anything or acquiring any locks.

Had Server::handle_client_rename set the reintegration state after acquiring locks and checking for any pending unlinks, we would not have hit this deadlock, no?

Not sure I totally get it.

The root cause is that the reintegration is triggered by the unlink request itself. I found a simpler approach: skip the is_unlink_pending() check, or always return true from is_unlink_pending(), if the rename is a linkmerge request.
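That approach can be sketched as a toy predicate (`dispatch_ok` is a hypothetical helper invented for illustration; only is_unlink_pending() is from the real code): a linkmerge rename is itself the continuation of the pending unlink, so it skips the pending-unlink check that ordinary renames must respect.

```cpp
// Toy predicate: should this rename be dispatched now?
bool dispatch_ok(bool is_linkmerge, bool unlink_pending) {
    if (is_linkmerge)
        return true;            // skip the is_unlink_pending() check entirely
    return !unlink_pending;     // ordinary renames still wait on the unlink
}
```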

@lxbsz lxbsz requested review from a team, batrick and dparmar18 August 21, 2023 00:44
@lxbsz
Member Author

lxbsz commented Aug 21, 2023

@batrick I have simplified the code, please take a look again. Thanks.

@lxbsz
Member Author

lxbsz commented Aug 21, 2023

jenkins test api

@batrick
Member

batrick commented Aug 21, 2023

I think there is a simpler fix here. The core problem is that StrayManager::reintegrate_stray sets the dentry state to REINTEGRATING before the rename op has even been dispatched and the necessary locks acquired. Until that happens, we've simply recorded an intent (the op) without actually changing anything or acquiring any locks.
Had Server::handle_client_rename set the reintegration state after acquiring locks and checking for any pending unlinks, we would not have hit this deadlock, no?

Not sure I totally get it.

The root cause is that the reintegration is triggered by the unlink request itself. I found a simpler approach: skip the is_unlink_pending() check, or always return true from is_unlink_pending(), if the rename is a linkmerge request.

Yes, I understood the cause. The problem, as I see it, is that we're setting REINTEGRATING when no reintegration has actually begun. All we've done is submitted a request which could be processed immediately or 100ms from now. The state change should be effected in handle_client_rename when it's a linkmerge and after acquiring the required dentry locks.

@batrick
Member

batrick commented Aug 21, 2023

@lxbsz let's talk more about this after standup if it's not clear. I don't want to lose track of this again.

@batrick
Member

batrick commented Aug 21, 2023

Also, I'm talking about this code:

rdn->state_set(CDentry::STATE_REINTEGRATING);

@lxbsz
Member Author

lxbsz commented Aug 22, 2023

Also, I'm talking about this code:

rdn->state_set(CDentry::STATE_REINTEGRATING);

@batrick I have reverted the previous commit, which introduced STATE_REINTEGRATING, and fixed these two issues in a follow-up commit 9d8f88d, which will always wait for the previous unlink's linkmerge to finish.

Please review again. Thanks!

@lxbsz
Member Author

lxbsz commented Aug 23, 2023

jenkins test make check

@vshankar
Contributor

Tests run in ~2 hours - https://shaman.ceph.com/builds/ceph/wip-vshankar-testing-20230828.044427/

(if teuthology behaves :)

@batrick
Member

batrick commented Sep 9, 2023

Going to run this through testing myself too.

@vshankar
Contributor

Going to run this through testing myself too.

I ran it last week and the (re)runs are in comment #52199 (comment). Failures are known issues, so this is good for merge.

@lxbsz
Copy link
Member Author

lxbsz commented Sep 12, 2023

Going to run this through testing myself too.

I ran it last week and the (re)run are in comment #52199 (comment). Failures are known issues, so this is good for merge.

@vshankar @batrick Could we merge it now ? Thanks.

@batrick
Member

batrick commented Sep 12, 2023

Going to run this through testing myself too.

I ran it last week and the (re)run are in comment #52199 (comment). Failures are known issues, so this is good for merge.

@vshankar @batrick Could we merge it now ? Thanks.

I hope to go through the results tomorrow. I've been busy with a customer case today.

@batrick
Member

batrick commented Sep 12, 2023

Going to run this through testing myself too.

I ran it last week and the (re)run are in comment #52199 (comment). Failures are known issues, so this is good for merge.

@vshankar @batrick Could we merge it now ? Thanks.

I hope to go through the results tomorrow. I've been busy with a customer case today.

Here they are if you want to have an advance look: https://pulpito.ceph.com/pdonnell-2023-09-09_19:00:20-fs-wip-batrick-testing-20230909.145638-distro-default-smithi/

@vshankar
Contributor

Going to run this through testing myself too.

I ran it last week and the (re)run are in comment #52199 (comment). Failures are known issues, so this is good for merge.

@vshankar @batrick Could we merge it now ? Thanks.

I hope to go through the results tomorrow. I've been busy with a customer case today.

Here they are if you want to have an advance look: https://pulpito.ceph.com/pdonnell-2023-09-09_19:00:20-fs-wip-batrick-testing-20230909.145638-distro-default-smithi/

I will have a look today anyway (my test runs show this change is mergeable).

@batrick
Member

batrick commented Sep 12, 2023

2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170653+0000 mds.h (mds.1) 1 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 182.965753 secs
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 journald: cluster
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170658+0000 mds.h (mds.1) 2 : cluster [WRN] slow request 182.965752 seconds old, received at 2023-09-09T21:19:54.204855+0000: client_request(mds.1:81 rename #0x20000000441/c14f #0x61a/2000000087b caller_uid=0, caller_gid=0{}) currently acquired locks
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 journald: cluster
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170661+0000 mds.h (mds.1) 3 : cluster [WRN] slow request 182.965474 seconds old, received at 2023-09-09T21:19:54.205133+0000: client_request(client.15657:25500 unlink #0x20000000441/c14f 2023-09-09T21:19:54.204958+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) currently requesting remote authpins

From: /teuthology/pdonnell-2023-09-09_19:00:20-fs-wip-batrick-testing-20230909.145638-distro-default-smithi/7392394/remote/smithi104/log/df7acbaa-4f54-11ee-9ab7-7b867c8bd7da/ceph-mon.a.log.gz

I'll look more tomorrow but just a heads up @lxbsz

@lxbsz lxbsz requested a review from a team as a code owner September 12, 2023 02:27
@github-actions github-actions bot added the core label Sep 12, 2023
@lxbsz
Member Author

lxbsz commented Sep 12, 2023

2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170653+0000 mds.h (mds.1) 1 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 182.965753 secs
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 journald: cluster
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170658+0000 mds.h (mds.1) 2 : cluster [WRN] slow request 182.965752 seconds old, received at 2023-09-09T21:19:54.204855+0000: client_request(mds.1:81 rename #0x20000000441/c14f #0x61a/2000000087b caller_uid=0, caller_gid=0{}) currently acquired locks
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 journald: cluster
2023-09-09T21:22:58.177+0000 7fc8f05bb700  7 mon.a@0(leader).log v588 update_from_paxos applying incremental log 588 2023-09-09T21:22:57.170661+0000 mds.h (mds.1) 3 : cluster [WRN] slow request 182.965474 seconds old, received at 2023-09-09T21:19:54.205133+0000: client_request(client.15657:25500 unlink #0x20000000441/c14f 2023-09-09T21:19:54.204958+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) currently requesting remote authpins

From: /teuthology/pdonnell-2023-09-09_19:00:20-fs-wip-batrick-testing-20230909.145638-distro-default-smithi/7392394/remote/smithi104/log/df7acbaa-4f54-11ee-9ab7-7b867c8bd7da/ceph-mon.a.log.gz

I'll look more tomorrow but just a heads up @lxbsz

[Edit] This was caused by #46331, and I also reverted the corresponding commits. Since async dirops are disabled in downstream, just reverting these commits is safe; I will fix the old issue again after this.

This reverts commit 48f9a89.

Fixes: https://tracker.ceph.com/issues/61818
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This reverts commit 478db14.

Fixes: https://tracker.ceph.com/issues/61818
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This reverts commit c3b3672.

Fixes: https://tracker.ceph.com/issues/61818
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This reverts commit 417f247.

Fixes: https://tracker.ceph.com/issues/61818
Signed-off-by: Xiubo Li <xiubli@redhat.com>
…me dentries"

This reverts commit d4b9431.

Fixes: https://tracker.ceph.com/issues/61818
Signed-off-by: Xiubo Li <xiubli@redhat.com>
If one inode has more than one hardlink, then after the primary dentry
is unlinked the inode will be located in the stray dir temporarily,
pending reintegration.

If a link request comes just before the linkmerge/migrate is triggered,
it will fail with -EXDEV.

Just skip the failure and continue the linking.

Fixes: https://tracker.ceph.com/issues/56695
Signed-off-by: Xiubo Li <xiubli@redhat.com>
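The behavior change in that last commit can be sketched like this (toy code; `handle_link` and the `apply_fix` flag are illustrative stand-ins, not the real Server::handle_client_link signature):

```cpp
#include <cerrno>

// Before the fix, a link whose target inode sits temporarily in the stray
// dir (pending reintegration) failed with -EXDEV; the fix skips that
// bail-out and continues linking, since the stray placement is transient.
int handle_link(bool target_in_stray, bool apply_fix) {
    if (target_in_stray && !apply_fix)
        return -EXDEV;  // old behavior: reject links to stray-dir inodes
    return 0;           // fixed behavior: continue the linking
}
```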
@vshankar
Contributor

[Edit] This was caused by #46331 and I also reverted the corresponding commits. Since the async dirop is disabled in downstream, and just reverting these commits will be safe, I will fix the old issue again after this.

I misunderstood what you said @lxbsz when you pinged me separately regarding this. I get it now (sorry - too many PRs to look at).

batrick added a commit to batrick/ceph that referenced this pull request Sep 12, 2023
* refs/pull/52199/head:
	mds: continue linking if targeti is temporarily located in stray dir
	Revert "mds: wait unlink to finish to avoid conflict when creating same dentries"
	Revert "mds: clear the STATE_UNLINKING state when the unlink fails"
	Revert "mds: wait reintegrate to finish when unlinking"
	Revert "mds: notify the waiters in replica MDSs"
	Revert "mds: wait the linkmerge/migrate to finish after unlink"
@batrick
Member

batrick commented Sep 15, 2023

@batrick batrick merged commit 01ef9e5 into ceph:main Sep 15, 2023
@batrick
Member

batrick commented Sep 15, 2023

@lxbsz the changes associated with this PR are a bit tangled. Can you please go through all the backports for the relevant reverts and close or update them (noting that they will soon be reverted when this is backported)? Make sure the tracker tickets are coherent and consistent.

batrick@menzoberranzan ~/ceph$ src/script/ptl-tool.py --base main --merge-branch-name main --branch HEAD 52199
Will merge PRs: [52199]
Detaching HEAD onto base: main
Merging PR #52199
[ 5b9df823 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/56695
[ 8e240303 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ 4ec2b03f ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ b0b279ae ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ 84c7da72 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ a9dd5df2 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ https://github.com/ceph/ceph/pull/52199 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/58340
[ https://github.com/ceph/ceph/pull/52199 ] Ceph tracker cited: Fixes: https://tracker.ceph.com/issues/61818
[ https://github.com/ceph/ceph/pull/52199 ] Ceph tracker cited:   - [ ] Affects [Dashboard](https://tracker.ceph.com/projects/dashboard/issues/new), opened tracker ticket
[ https://github.com/ceph/ceph/pull/52199 ] Ceph tracker cited:   - [ ] Affects [Orchestrator](https://tracker.ceph.com/projects/orchestrator/issues/new), opened tracker ticket
[ https://github.com/ceph/ceph/pull/52199#issuecomment-1713762506 ] Ceph tracker cited: @batrick - https://tracker.ceph.com/projects/cephfs/wiki/Main#11-Sep-2023
[ https://github.com/ceph/ceph/pull/52199#issuecomment-1721527079 ] Ceph tracker cited: https://tracker.ceph.com/projects/cephfs/wiki/Main#2023-Sep-12
[ https://github.com/ceph/ceph/pull/52199#discussion_r1309203317 ] Ceph tracker cited: I'm not even sure what the original bug is from https://tracker.ceph.com/issues/58340 because that log has long been garbage collected. Of course, I'm fairly confident you figured it out Xiubo. Do you have a reliable test case to share?
[ https://github.com/ceph/ceph/pull/52199#discussion_r1309203317 ] Ceph tracker cited: https://tracker.ceph.com/issues/56695
[ https://github.com/ceph/ceph/pull/52199#discussion_r1309652906 ] Ceph tracker cited: A detail example please see https://tracker.ceph.com/issues/61818#note-2.
[ https://github.com/ceph/ceph/pull/52199#discussion_r1309652906 ] Ceph tracker cited: > I'm not even sure what the original bug is from https://tracker.ceph.com/issues/58340 because that log has long been garbage collected. Of course, I'm fairly confident you figured it out Xiubo. Do you have a reliable test case to share?
[ https://github.com/ceph/ceph/pull/52199#discussion_r1309652906 ] Ceph tracker cited: > https://tracker.ceph.com/issues/56695
Leaving HEAD detached; no branch anchors your commits

@lxbsz
Member Author

lxbsz commented Sep 18, 2023

@lxbsz the changes associated with this PR are a bit tangled. Can you please go through all the backports for the relevant reverts and close or update them (that they will soon be reverted when this is backported). Make sure the tracker tickets are coherent and consistent.

@batrick Sure, I will do that today. Thanks!
