mds: avoid acquiring the wrlock twice for a single request #58474
Conversation
src/mds/Server.cc (Outdated)
```cpp
}

void Server::handle_conf_change(const std::set<std::string>& changed) {
  if (changed.count("mds_lock_cache")) {
```
This needs to be added to the tracked configs in MDSRankDispatcher::get_tracked_conf_keys()?

This config may be more appropriately named mds_allow_async_dirops.

> This needs to be added to the tracked configs in MDSRankDispatcher::get_tracked_conf_keys()?

Yeah, will fix it.

> This config may be more appropriately named mds_allow_async_dirops.

Sounds better.
@batrick BTW, then we should also stop delegating the inode numbers to the kclient, to comply with the name of mds_allow_async_dirops.

IMO this should be fine, because at least on the MDS side we will have a simple way to disable async dirops, instead of doing it in each kclient mount one by one.

What do you think?
Recently I was reminded of mds_client_delegate_inos_pct, which can be set to 0, effectively also disabling async dirops.

It would be nice if we had a single, clearer config to control async dirops (like a simple well-named bool). I still support having the new name. However, we should observe that even without delegated inodes the MDS may create lock caches. So perhaps this config should keep its current name, although "mds_allow_lock_caching" would be better.

> Recently I was reminded of mds_client_delegate_inos_pct, which can be set to 0, effectively also disabling async dirops.

Yeah, correct. I remember this now.
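The tracked-keys plus handle_conf_change pattern discussed in this thread can be sketched in a self-contained way. This is a simplified model, not the real Ceph md_config_obs_t / MDSRankDispatcher API; the key name mds_allow_async_dirops is the rename proposed above, and apply_change stands in for the config dispatcher:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified sketch (not the real Ceph API) of the observer pattern the
// review refers to: a component lists the config keys it tracks, and a
// change is only delivered to it if the changed key is in that list.
struct ConfObserver {
  bool async_dirops_allowed = true;

  // Analogous to MDSRankDispatcher::get_tracked_conf_keys(): a key that is
  // not listed here never reaches handle_conf_change().
  std::vector<std::string> get_tracked_conf_keys() const {
    return {"mds_allow_async_dirops"};
  }

  void handle_conf_change(const std::map<std::string, std::string>& conf,
                          const std::set<std::string>& changed) {
    if (changed.count("mds_allow_async_dirops"))
      async_dirops_allowed = (conf.at("mds_allow_async_dirops") == "true");
  }
};

// Dispatcher side: filter the changed set down to tracked keys first, so an
// observer that forgets to list a key simply never sees its changes -- which
// is exactly the bug the first review comment points out.
inline void apply_change(ConfObserver& obs,
                         const std::map<std::string, std::string>& conf,
                         const std::set<std::string>& changed) {
  std::set<std::string> relevant;
  for (const auto& k : obs.get_tracked_conf_keys())
    if (changed.count(k))
      relevant.insert(k);
  if (!relevant.empty())
    obs.handle_conf_change(conf, relevant);
}
```

This makes the reviewer's point concrete: adding a new option without also registering it among the tracked keys leaves runtime changes to that option silently ignored.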
src/mds/MDCache.cc (Outdated)
```diff
 } else {
   // force client to flush async dir operation if necessary
-  if (cur->filelock.is_cached())
+  if (!mdr->lock_cache && cur->filelock.is_cached())
```
Should we not simply check that the filelock is not already in the lock_cache?
Force-pushed from d49a096 to 2ade6e2 (compare)
jenkins test windows
```diff
-  if (cur->filelock.is_cached())
+  if (cur->filelock.is_cached() &&
+      !(mdr->lock_cache &&
+        static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
```
Deciding if we need to take a lock by checking if it's already taken is a huge code smell. How is this safe? If it is safe, please comment with some justification.
Is there really not a way to know up front what the expected and needed state is? How do we ensure that the lock state doesn't change, if we don't own the lock?
Once a lock_cache is created, the locks belonging to it won't be dropped until the lock_cache itself is removed. And each mdr that attaches the lock_cache holds a reference to it. So it is safe here to check whether cur->filelock is owned by the mdr's lock_cache or not.
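The lifetime argument above can be modeled in a few lines. This is a toy sketch, not Ceph's MDCache/Locker code: LockCache, Mdr, and maybe_add_wrlock are illustrative names standing in for the real structures, and the condition mirrors the one in the patch:

```cpp
#include <set>
#include <vector>

// Toy model of the invariant: a LockCache pins a set of wrlocks for its
// whole lifetime, and each request (mdr) attached to it takes a reference,
// so a lock found in the cache cannot be dropped underneath the request
// while the request is being processed.
struct Lock { int id; };

struct LockCache {
  std::set<const Lock*> wrlocks;  // pinned until the cache itself is removed
  int ref = 0;                    // one reference per attached mdr
  bool is_wrlocked(const Lock* l) const { return wrlocks.count(l) != 0; }
};

struct Mdr {
  LockCache* lock_cache = nullptr;
  void attach(LockCache* lc) { lock_cache = lc; ++lc->ref; }
};

// Mirrors the fixed condition in the patch: only add the wrlock to the
// lock vector if it is not already held by the request's attached cache.
inline void maybe_add_wrlock(const Mdr& mdr, const Lock* filelock,
                             std::vector<const Lock*>& lov) {
  if (!(mdr.lock_cache && mdr.lock_cache->is_wrlocked(filelock)))
    lov.push_back(filelock);
}
```

The point of the reference counting is that the check is not a race: between is_wrlocked() returning true and the request using the lock, the cache (and thus the lock) cannot go away, because the mdr's own reference keeps it alive.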
```cpp
if (cur->filelock.is_cached() &&
    !(mdr->lock_cache &&
      static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
  lov.add_wrlock(&cur->filelock);
}
```
Hmm, so it's correct that the current design of the lock caching is that we skip adding locks if they are held by the lock_cache.

What confuses me here is that this fix presupposes that the mdr is taking this else path. From what I can see in the log posted on the tracker, the mdr should be taking the if path and not adding the filelock at all.

I do think this patch is fixing a genuine issue, however. I think perhaps the right approach is:
```diff
-if (cur->filelock.is_cached() &&
-    !(mdr->lock_cache &&
-      static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
-  lov.add_wrlock(&cur->filelock);
-}
+if (cur->filelock.is_cached()) {
+  if (mdr->lock_cache)
+    mds->locker->put_lock_cache(mdr->lock_cache);
+  lov.add_wrlock(&cur->filelock);
+}
```
> Hmm, so it's correct that the current design of the lock caching is that we skip adding locks if held by the lock_cache.
>
> What confuses me here is that this fix presupposes that the mdr is taking this else path. From what I can see in the log posted on the tracker, the mdr should be taking the if path and not adding the filelock at all.

This is because (dnl->is_null() || !want_inode) was also false for [inode 0x100244d7fcb [...2,head] /clusteradmin/weiler/deph-test/trash/, and so it went to the else path.
> I do think this patch is fixing a genuine issue however. I think perhaps the right approach is:

I am afraid we would introduce other issues here. Earlier in path_traverse() it attaches the lock_cache to the current mdr and then skips the call to try_rdlock_snap_layout():
```cpp
if (flags & MDS_TRAVERSE_CHECK_LOCKCACHE)
  mds->locker->find_and_attach_lock_cache(mdr, cur);

if (mdr && mdr->lock_cache) {
  if (flags & MDS_TRAVERSE_WANT_DIRLAYOUT)
    mdr->dir_layout = mdr->lock_cache->get_dir_layout();
} else if (rdlock_snap) {
  int n = (flags & MDS_TRAVERSE_RDLOCK_SNAP2) ? 1 : 0;
  if ((n == 0 && !(mdr->locking_state & MutationImpl::SNAP_LOCKED)) ||
      (n == 1 && !(mdr->locking_state & MutationImpl::SNAP2_LOCKED))) {
    bool want_layout = (flags & MDS_TRAVERSE_WANT_DIRLAYOUT);
    if (!mds->locker->try_rdlock_snap_layout(cur, mdr, n, want_layout))
      return 1;
  }
}
```
If we just drop the lock_cache here, we would need to do the above beforehand, or retry path_traverse().
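The ordering problem in this objection can be shown with a toy control flow. This is not the real path_traverse(); TraverseState, traverse_prefix, and drop_cache_and_retry are illustrative names, with snap_locked standing in for the SNAP_LOCKED state that try_rdlock_snap_layout() would establish:

```cpp
// Toy model of the ordering constraint: the snap/layout rdlock step is
// skipped when a lock_cache is attached to the mdr, so if the cache is
// dropped later (at the filelock step, as the suggestion proposes), the
// request would proceed without those rdlocks unless the traversal is
// redone from the top.
struct TraverseState {
  bool lock_cache_attached = false;
  bool snap_locked = false;
};

inline void traverse_prefix(TraverseState& st) {
  // Mirrors the quoted code: only take the snap/layout rdlocks when no
  // lock cache is attached to the mdr.
  if (!st.lock_cache_attached)
    st.snap_locked = true;  // stands in for try_rdlock_snap_layout()
}

// Dropping the cache mid-traverse leaves snap_locked false; retrying the
// traversal from the top is what repairs the state.
inline void drop_cache_and_retry(TraverseState& st) {
  st.lock_cache_attached = false;
  traverse_prefix(st);
}
```

In other words, the put_lock_cache() suggestion is not a local change: it invalidates a decision made earlier in the same traversal, which is why the reply asks for the snap/layout locking to be done beforehand or for the traversal to be retried.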
Also, that would mean the lock_cache feature is effectively always disabled, because we would always drop the lock_cache.
This pull request can no longer be automatically merged: a rebase is needed and conflicts have to be manually resolved
The lock cache is buggy and we need to disable it as a workaround.

Fixes: https://tracker.ceph.com/issues/65607
Signed-off-by: Xiubo Li <xiubli@redhat.com>
If the current request has a lock cache attached, then the lock cache must have already acquired the wrlock on the filelock. So currently path_traverse() will acquire the wrlock twice and could deadlock against itself.

Fixes: https://tracker.ceph.com/issues/65607
Signed-off-by: Xiubo Li <xiubli@redhat.com>
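Why does a second wrlock acquisition deadlock rather than just fail? A minimal illustration (this is not Ceph's SimpleLock; WrLock is a toy with no owner tracking, and try_wrlock() is used so the self-deadlock shows up as a failed acquire instead of an actual hang):

```cpp
// A non-reentrant write lock has no notion of "already owned by me": a
// second acquisition by the same request can never succeed, because the
// holder that would release it is the same chain of execution. In real
// code the request waits forever for itself -- the deadlock described in
// the commit message.
struct WrLock {
  bool held = false;
  bool try_wrlock() {
    if (held) return false;  // second acquire: would block forever
    held = true;
    return true;
  }
  void unlock() { held = false; }
};
```

With the fix, path_traverse() checks whether the attached lock cache already holds the wrlock and skips the second acquisition entirely, so this situation never arises.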
Nothing changed, just rebased and resolved the conflicts.
jenkins retest this please
We can confirm that this PR resolves the MDS locking issue. Is there any plan to take these changes in?
This has to be reviewed and tested again before it can be merged and backported. We will try to get it included soon...

cc @kotreshhr for review (@mchangir please keep an eye on this for QA testing once the change has been reviewed and approved).

@vshankar @kotreshhr can we take this forward? This issue is critical for us.

Bumping this up @kotreshhr - PTAL

@vshankar @kotreshhr I'd like to add that this is a critical issue for us as well, and I am excited to see this fix merged!

@vshankar @kotreshhr just a bump on this item!
I opened a new PR according to our discussion in the CephFS standup, as the author has shifted focus away from this project. I am new to the Ceph community; all comments and suggestions for improvement are welcome. The new PR is here.

Thanks for the patch @Sunnatillo - will have it reviewed.
Superseded by #61250 |