Skip to content

client: wait rename to finish#40787

Merged
rishabh-d-dave merged 1 commit intoceph:mainfrom
lxbsz:rename
Jul 13, 2023
Merged

client: wait rename to finish#40787
rishabh-d-dave merged 1 commit intoceph:mainfrom
lxbsz:rename

Conversation

@lxbsz
Copy link
Member

@lxbsz lxbsz commented Apr 11, 2021

In rare case during the rename if another thread tries to lookup the
dst dentry, it may get an inconsistent result that both src dentry
and dst dentry will link to the same inode at the same time.

Will wait the rename to finish and try it again.

Fixes: https://tracker.ceph.com/issues/49912
Signed-off-by: Xiubo Li xiubli@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@lxbsz lxbsz requested review from batrick and jtlayton April 11, 2021 04:30
@github-actions github-actions bot added the cephfs Ceph File System label Apr 11, 2021
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful job tracking this down @lxbsz ! Do you have a test case for this? I realize it's racy but you could use a config to help trigger the race with barriers (similar to what we do with mds_inject_migrator_session_race)?

@lxbsz
Copy link
Member Author

lxbsz commented Apr 12, 2021

Wonderful job tracking this down @lxbsz ! Do you have a test case for this? I realize it's racy but you could use a config to help trigger the race with barriers (similar to what we do with mds_inject_migrator_session_race)?

Not yet and I will add it.

@lxbsz
Copy link
Member Author

lxbsz commented Apr 12, 2021

 1988 void Client::encode_dentry_release(Dentry *dn, MetaRequest *req,
 1989                            mds_rank_t mds, int drop, int unless)
 1990 {
 1991   ldout(cct, 20) << __func__ << " enter(dn:"
 1992            << *dn << ")" << dendl;
 1993   int released = 0;
 1994   if (dn->dir)
 1995     released = encode_inode_release(dn->dir->parent_inode, req,
 1996                                     mds, drop, unless, 1);
 1997   if (released && dn->lease_mds == mds) {
 1998     ldout(cct, 25) << "preemptively releasing dn to mds" << dendl;
 1999     auto& rel = req->cap_releases.back();
 2000     rel.item.dname_len = dn->name.length();
 2001     rel.item.dname_seq = dn->lease_seq;
 2002     rel.dname = dn->name;
 2003     dn->lease_mds = -1;
 2004   }
 2005   ldout(cct, 25) << __func__ << " exit(dn:"
 2006            << *dn << ")" << dendl;
 2007 }

When doing the rename and in some cases, such as if the client_use_random_mds option is enabled, and if the replies of the rename operation are delayed for some reasons, such as delayed in the network route, then the Line#2003 above won't be executed and the _lookup(new dentry) will succeed. In this case both the new and old dentries will link the same inode at the same time.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link
Contributor

@jtlayton jtlayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lxbsz
Copy link
Member Author

lxbsz commented Apr 13, 2021

Rebased it by resolving the conflict in commom/options.cc

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

ldout(cct, 1) << __func__ << " dir " << *dir
<< " rename is on the way, will wait for dn '"
<< dname << "'" << dendl;
wait_on_list(waiting_for_rename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What protects this condition and wait? It looks like the client_lock. Will that still be the case after your work to break up the client_lock? I assume we'll want to avoid taking that lock in lookup if we can help it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently I am using the client_lock for this and queue the waiters in a global list in the Client instead of each Dentry.

And after the inode lock feature is finished or in that patch set I will switch this to inode lock instead.

@lxbsz
Copy link
Member Author

lxbsz commented Apr 15, 2021

jenkins test api

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions github-actions bot added the stale label Jan 17, 2023
@vshankar vshankar added the wip-vshankar-backlog Backlog of CephFS PRs to pick for testing label Jan 25, 2023
@github-actions github-actions bot removed the stale label Jan 25, 2023
@lxbsz
Copy link
Member Author

lxbsz commented Jan 31, 2023

@lxbsz I'm seeing test failures relating to rename which makes me think if it has do with this change. Sample

Sorry @vshankar , I missed this comment.

Currently I couldn't login the sepia node and couldn't checked the logs in detail, but from the teuthology.logs they looks like a similar issue with this.

I will check it more after I can login to the node.

@rzarzynski
Copy link
Contributor

jenkins test make check

@rishabh-d-dave
Copy link
Contributor

@lxbsz Should I put this through QA? cc @vshankar

@lxbsz
Copy link
Member Author

lxbsz commented Apr 3, 2023

@lxbsz Should I put this through QA? cc @vshankar

Sure, please. Thanks.

@rishabh-d-dave rishabh-d-dave added wip-rishabh-testing Rishabh's testing label and removed wip-vshankar-backlog Backlog of CephFS PRs to pick for testing labels Apr 4, 2023
@github-actions
Copy link

github-actions bot commented Jun 3, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

In rare case during the rename if another thread tries to lookup the
dst dentry, it may get an inconsistent result that both src dentry
and dst dentry will link to the same inode at the same time.

Will wait the rename to finish and try it again.

Fixes: https://tracker.ceph.com/issues/49912
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz
Copy link
Member Author

lxbsz commented Jul 4, 2023

Updated the PR to fix one bug, which is when the rename request fails it will miss waking up the waiters:

11fb16d#diff-7a3052fe46aebfed0382c9d0bb9880ea1328add824e0b10c5d551ddfee282cd1R14464-R14469

Copy link
Contributor

@rishabh-d-dave rishabh-d-dave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this comment just for record. This PR failed integration testing but Xiubo has fixed at least of one of the failures. I'll wait for rest of the issue(s) to be fixed before re-testing this PR.

@rishabh-d-dave rishabh-d-dave removed needs-qa wip-rishabh-testing Rishabh's testing label labels Jul 5, 2023
@rishabh-d-dave
Copy link
Contributor

@lxbsz Please let me know when this PR is ready. As per our chat one failure needs to be fixed.

@lxbsz
Copy link
Member Author

lxbsz commented Jul 6, 2023

@lxbsz Please let me know when this PR is ready. As per our chat one failure needs to be fixed.

Sure, I have gone through most failures, only the pjd ones are related to this PR, all the others are known issues. I will ping you after I finish that.

Thanks.

@rishabh-d-dave rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label Jul 8, 2023
Copy link
Contributor

@rishabh-d-dave rishabh-d-dave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rishabh-d-dave rishabh-d-dave dismissed batrick’s stale review July 13, 2023 14:37

PR author has fixed the issue that was reported for this PR.

@rishabh-d-dave
Copy link
Contributor

jenkins test docs

@rishabh-d-dave
Copy link
Contributor

This PR is ready for merge, I'll merge it as soon as this CI job passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System common tests wip-rishabh-testing Rishabh's testing label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants