
mds: avoid acquiring the wrlock twice for a single request #58474

Closed
lxbsz wants to merge 2 commits into ceph:main from lxbsz:wip-65607

Conversation

@lxbsz (Member) commented Jul 9, 2024

If the current request has a lock cache attached, then the lock
cache must have already acquired the wrlock on the filelock. So
currently path_traverse() will acquire the wrlock a second time,
possibly deadlocking against itself.

Fixes: https://tracker.ceph.com/issues/65607

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@lxbsz lxbsz requested a review from a team July 9, 2024 08:35
@github-actions github-actions bot added cephfs Ceph File System common labels Jul 9, 2024
}

void Server::handle_conf_change(const std::set<std::string>& changed) {
  if (changed.count("mds_lock_cache")) {
Member:

This needs to be added to the tracked configs in MDSRankDispatcher::get_tracked_conf_keys()?

This config may be more appropriately named mds_allow_async_dirops.

Member Author:

> This needs to be added to the tracked configs in MDSRankDispatcher::get_tracked_conf_keys()?

Yeah, will fix it.

> This config may be more appropriately named mds_allow_async_dirops.

Sounds better.

@lxbsz (Member Author) commented Jul 16, 2024

@batrick BTW, then we should also stop delegating inode numbers to the
kclient, to match the name mds_allow_async_dirops.

IMO this should be fine, because at least on the MDS side we will have a
simple way to disable async dirops, instead of doing it in each kclient
mount one by one.

What do you think?

Member:

Sounds good.

Member:

Recently I was reminded of mds_client_delegate_inos_pct, which can be set to 0, effectively also disabling async dirops.

It would be nice if we had a single, clearer config to control async dirops (like a simple well-named bool). I still support having the new name. However, we should note that even without delegated inodes the MDS may create lock caches. So perhaps this config should keep its current name, although "mds_allow_lock_caching" would be better.

Member Author:

> Recently I was reminded of mds_client_delegate_inos_pct, which can be set to 0, effectively also disabling async dirops.

Yeah, correct. I remember this now.

  } else {
    // force client to flush async dir operation if necessary
-   if (cur->filelock.is_cached())
+   if (!mdr->lock_cache && cur->filelock.is_cached())
Member:

Should we not simply check that the filelock is not already in the lock_cache?

Member Author:

done.

@lxbsz lxbsz force-pushed the wip-65607 branch 2 times, most recently from d49a096 to 2ade6e2 Compare July 17, 2024 01:51
@lxbsz (Member Author) commented Jul 17, 2024

jenkins test windows

-   if (cur->filelock.is_cached())
+   if (cur->filelock.is_cached() &&
+       !(mdr->lock_cache &&
+         static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
Member:

Deciding if we need to take a lock by checking if it's already taken is a huge code smell. How is this safe? If it is safe, please comment with some justification.

Is there really not a way to know up front what the expected and needed state is? How do we ensure that the lock state doesn't change, if we don't own the lock?

Member Author:

Once a lock_cache is created, the locks belonging to it won't be dropped until the lock_cache is removed.

And each mdr holds a reference to its lock_cache. So it is safe here to check whether cur->filelock is owned by the lock_cache or not.

Comment on lines +8518 to +8526

    if (cur->filelock.is_cached() &&
        !(mdr->lock_cache &&
          static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
      lov.add_wrlock(&cur->filelock);
    }
Member:

Hmm, so it's correct that the current design of the lock caching is that we skip adding locks already held by the lock_cache.

What confuses me here is that this fix presupposes that the mdr is taking this else path. From what I can see in the log posted on the tracker, the mdr should be taking the if path and not adding the filelock at all.

I do think this patch is fixing a genuine issue, however. I think perhaps the right approach is:

Suggested change
-   if (cur->filelock.is_cached() &&
-       !(mdr->lock_cache &&
-         static_cast<const MutationImpl*>(mdr->lock_cache)->is_wrlocked(&cur->filelock))) {
-     lov.add_wrlock(&cur->filelock);
-   }
+   if (cur->filelock.is_cached()) {
+     if (mdr->lock_cache)
+       mds->locker->put_lock_cache(mdr->lock_cache);
+     lov.add_wrlock(&cur->filelock);
+   }

Member Author:

> Hmm, so it's correct that the current design of the lock caching is that we skip adding locks already held by the lock_cache.
>
> What confuses me here is that this fix presupposes that the mdr is taking this else path. From what I can see in the log posted on the tracker, the mdr should be taking the if path and not adding the filelock at all.

This is because (dnl->is_null() || !want_inode) was also false for [inode 0x100244d7fcb [...2,head] /clusteradmin/weiler/deph-test/trash/.

And then it went to the else path.

> I do think this patch is fixing a genuine issue, however. I think perhaps the right approach is:

I am afraid we would introduce other issues here.

Because at the beginning of path_traverse() it attaches the lock_cache to the current mdr and then skips the call to try_rdlock_snap_layout():

  if (flags & MDS_TRAVERSE_CHECK_LOCKCACHE)
    mds->locker->find_and_attach_lock_cache(mdr, cur);

  if (mdr && mdr->lock_cache) {
    if (flags & MDS_TRAVERSE_WANT_DIRLAYOUT)
      mdr->dir_layout = mdr->lock_cache->get_dir_layout();
  } else if (rdlock_snap) {
    int n = (flags & MDS_TRAVERSE_RDLOCK_SNAP2) ? 1 : 0;
    if ((n == 0 && !(mdr->locking_state & MutationImpl::SNAP_LOCKED)) ||
        (n == 1 && !(mdr->locking_state & MutationImpl::SNAP2_LOCKED))) {
      bool want_layout = (flags & MDS_TRAVERSE_WANT_DIRLAYOUT);
      if (!mds->locker->try_rdlock_snap_layout(cur, mdr, n, want_layout))
        return 1;
    }
  }

If we just drop the lock_cache here, we would need to do the above beforehand, or retry the path_traverse().

Member Author:

Also, with that change the lock_cache feature would effectively always be disabled, because we would always drop the lock_cache.

@github-actions:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

lxbsz added 2 commits August 28, 2024 15:14
The lock cache is buggy and we need to disable it as a workaround.

Fixes: https://tracker.ceph.com/issues/65607
Signed-off-by: Xiubo Li <xiubli@redhat.com>
If the current request has a lock cache attached, then the lock
cache must have already acquired the wrlock on the filelock. So
currently path_traverse() will acquire the wrlock a second time,
possibly deadlocking against itself.

Fixes: https://tracker.ceph.com/issues/65607
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz (Member Author) commented Aug 28, 2024

Nothing changed, just rebased it and resolved the conflicts.

@vshankar (Contributor):

jenkins retest this please

@vshankar vshankar self-assigned this Sep 30, 2024
@Sunnatillo (Contributor):

Hi @lxbsz @vshankar @batrick.
We tested this PR on our side and it fixes the problem we are facing. I see the progress of this PR has slowed down a bit. What would be the next step?

I also commented our findings on the Ceph tracker issue.

@smoshiur1237:

> Hi @lxbsz @vshankar @batrick. We tested this PR on our side and it fixes the problem we are facing. I see the progress of this PR has slowed down a bit. What would be the next step?
>
> I also commented our findings on the Ceph tracker issue.

We can confirm that the PR resolves the MDS locking issue. Is there any plan to take these changes in?

@vshankar (Contributor):

> We can confirm that the PR resolves the MDS locking issue. Is there any plan to take these changes in?

This has to be reviewed and tested again before it can be merged and backported. We will try to get it included soon...

@vshankar vshankar requested review from a team and batrick November 27, 2024 12:42
@vshankar (Contributor):

cc @kotreshhr for review (@mchangir please keep an eye on this for QA testing once the change has been reviewed and approved).

@kashifest:

@vshankar @kotreshhr can we take this forward? This issue is critical for us.

@vshankar (Contributor) commented Dec 4, 2024

> @vshankar @kotreshhr can we take this forward? This issue is critical for us.

Bumping this up @kotreshhr - PTAL

@erichweiler:

@vshankar @kotreshhr I'd like to add that this is a critical issue for us as well, and I am excited to see this fix merged!

@erichweiler:

@vshankar @kotreshhr just a bump on this item!

@Sunnatillo (Contributor) commented Jan 7, 2025

I opened a new PR, per our discussion in the CephFS standup, as the author has shifted focus from this project.

I am new to the Ceph community; all comments and suggestions for improvements are welcome.

The new PR is here.

@vshankar (Contributor) commented Jan 7, 2025

> I opened a new PR, per our discussion in the CephFS standup, as the author has shifted focus from this project.
>
> I am new to the Ceph community; all comments and suggestions for improvements are welcome.
>
> The new PR is here.

Thanks for the patch @Sunnatillo - will have it reviewed.

@vshankar (Contributor):

Superseded by #61250

@vshankar vshankar closed this Jan 13, 2025

Labels

cephfs Ceph File System common

9 participants