mds: wait reintegrate to finish when unlinking #49553

Merged
batrick merged 3 commits into ceph:main from lxbsz:wip-58340 on May 22, 2023
Conversation

@lxbsz
Member

@lxbsz lxbsz commented Dec 23, 2022

Just before unlinking a dentry, another MDS may trigger a reintegrate_stray request, which finally becomes a RENAME client request to the current MDS. The rename and unlink requests will then deadlock.

This fix blocks the unlink request if a reintegrate_stray is in progress for the current dentry.

Fixes: https://tracker.ceph.com/issues/58340
Fixes: https://tracker.ceph.com/issues/59657
Signed-off-by: Xiubo Li <xiubli@redhat.com>


@vshankar
Contributor

vshankar commented Jan 9, 2023

jenkins test api

Contributor

@kotreshhr kotreshhr left a comment


The code looks good, but could you explain in the commit message what the actual deadlock is that's being hit without the fix?

@lxbsz
Member Author

lxbsz commented Jan 11, 2023

1. In one MDS, while evaluating the dentry, a reintegrate rename request is sent to itself.

2. The MDS handles the reintegrate rename request and finds that this dentry is a remote dentry whose CInode is linked into a stray directory on another MDS:

mds.2.locker req remote auth_pin of [inode 0x20000000515 [...2,head] ~mds1/stray0/20000000515 ...

The reintegrate rename request is then added to a waiter list.

Meanwhile, in mds.1 the auth pin is frozen, so no other new requests can get the authpin.

3. A client sends an unlink request for the same dentry, which has to do a remote authpin:

mds.2.locker requesting remote auth_pins from mds.1

The unlink request is then added to the waiter list too, but before that the dentry's state was set to STATE_UNLINKING.

Since in step 2 the authpin for the dentry in mds.1 was already frozen, the unlink waits for the reintegrate rename to finish.

4. When the reintegrate rename request is retried, it finds that the dentry is being unlinked and is added to another wait list.

5. So the unlink request and the reintegrate rename request deadlock.

Contributor

@vshankar vshankar left a comment


FWIW, test run https://pulpito.ceph.com/vshankar-2023-03-08_15:12:36-fs-wip-vshankar-testing-20230308.112059-testing-default-smithi/7197021/ has the same (fsstress.sh) hang, but it's only running a single active mds.

class ScrubStack;
class C_ExecAndReply;

struct MDSMetaRequest {
Contributor

Is this really required? Cannot the MClientRequest (for the rename op to the peer mds) be enough for tracking?

Member Author

In Client.cc, a MetaRequest is used to track the in-flight requests in each session, while the MClientRequest is only used to carry the actual request to the MDS, so we cannot track the request with it alone.

This is why I added MDSMetaRequest on the MDS side.

}

if (client_inst.name.is_mds() && reply->get_op() == CEPH_MDS_OP_RENAME) {
mds->send_message(reply, mdr->client_request->get_connection());
Contributor

Hmmm.. the reintegration (rename) operation was never replied back to the peer (request initiator) mds.

Member Author

Yeah.

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

FWIW, test run https://pulpito.ceph.com/vshankar-2023-03-08_15:12:36-fs-wip-vshankar-testing-20230308.112059-testing-default-smithi/7197021/ has the same (fsstress.sh) hang, but it's only running a single active mds.

This time the fsstress.sh test passed. The mds log files are filled full with getattr Fsr #0x10000005a52 from client.24351

This should be another different issue :

2023-03-08T17:03:50.413+0000 7f34ba077700  1 -- [v2:172.21.15.39:6826/722455264,v1:172.21.15.39:6827/722455264] <== client.24351 v1:192.168.0.1:0/3921107847 12428771 ==== client_request(client.24351:13291264 getattr Fsr #0x10000005a52 2023-03-08T17:03:50.413624+0000 caller_uid=1000, caller_gid=1268{6,36,1000,1268,}) v6 ==== 180+0+0 (unknown 3839491394 0 0) 0x561d523b1b00 con 0x561d524a2400
2023-03-08T17:03:50.413+0000 7f34ba077700  4 mds.0.server handle_client_request client_request(client.24351:13291264 getattr Fsr #0x10000005a52 2023-03-08T17:03:50.413624+0000 caller_uid=1000, caller_gid=1268{6,36,1000,1268,}) v6
...

The kernel client has the same issue as https://tracker.ceph.com/issues/58564; IMO it is caused by the deadlock issue in the kclient.

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

jenkins retest this please

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

Rebased it.

@lxbsz
Member Author

lxbsz commented Apr 21, 2023

Rebased it and updated the commit comments.

@vshankar
Contributor

jenkins test make check

@vshankar
Contributor

jenkins test api

@vshankar
Contributor

jenkins test windows

@vshankar
Contributor

vshankar commented May 8, 2023

@vshankar
Contributor

vshankar commented May 9, 2023

@vshankar
Contributor

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220

Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

I will check it today.

lxbsz added 3 commits May 18, 2023 13:28

mds: notify the waiters in replica MDSs

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
message: make MClientReply inherit MMDSOp

This will allow the MDS to handle MClientReply messages.

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
mds: wait reintegrate to finish when unlinking

When unlinking a remote dentry whose primary dentry has just been unlinked and is being reintegrated (the reintegration of the stray dentry finally becomes a RENAME client request to the current MDS):

If the RENAME request goes to sleep waiting for some resources just after authpinning the CInode, the unlink request will mark the remote dentry with the UNLINKING state and then add itself to a wait list, waiting for the CInode to be authpinned.

When the RENAME request is retried later, it adds itself to a wait list as well, waiting for the UNLINKING state to be cleared. So the RENAME and UNLINK requests deadlock.

This fix blocks the UNLINK request while a reintegrate_stray is going on for the current dentry.

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

The failure was introduced by the new commit cdb2275. I will remove it for now. This is a separate issue and I will fix it in a later PR.

@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

The failure was introduced by the new commit cdb2275. I will remove it for now. This is a separate issue and I will fix it in a later PR.

Sorry, I misread the logs. It should be a new issue. For example, in https://pulpito.ceph.com/pdonnell-2023-05-11_23:23:58-fs-wip-pdonnell-testing-20230511.185220-distro-default-smithi/7272082/ there are some slow requests, and they were caused by recursive stray reintegration.

2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.913131 seconds old, received at 2023-05-12T05:29:25.734362+0000: client_request(mds.4:110 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently acquired locks
2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.910954 seconds old, received at 2023-05-12T05:29:25.736540+0000: client_request(client.4836:17788 unlink #0x20000000f31/l118 2023-05-12T05:29:25.726057+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) currently acquired locks
2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.426230 seconds old, received at 2023-05-12T05:29:26.221263+0000: client_request(mds.4:111 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-12T05:53:27.648+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.311113 seconds old, received at 2023-05-12T05:29:26.336380+0000: client_request(mds.4:112 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-12T05:53:27.648+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.212995 seconds old, received at 2023-05-12T05:29:26.434498+0000: client_request(mds.4:113 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched

Note: the above unlink request is okay and has already released all its resources. Another failure has no unlink request at all but shows the same issue; please see https://pulpito.ceph.com/pdonnell-2023-05-15_12:34:27-fs-wip-pdonnell-testing-20230511.185220-distro-default-smithi/7273788/:

2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 720.504186 seconds old, received at 2023-05-15T13:50:55.136671+0000: client_request(mds.3:74 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 722.411774 seconds old, received at 2023-05-15T13:50:53.229083+0000: client_request(mds.3:71 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently acquired locks
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 720.318309 seconds old, received at 2023-05-15T13:50:55.322548+0000: client_request(mds.0:531 rename #0x625/20000000d04 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 721.572030 seconds old, received at 2023-05-15T13:50:54.068828+0000: client_request(mds.3:72 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 721.089038 seconds old, received at 2023-05-15T13:50:54.551819+0000: client_request(mds.3:73 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched

When the first stray dentry reintegration is triggered and the rename request from the same MDS is received, after rdlock_two_paths_xlock_destdn() succeeds it will reach:

  CInode *oldin = 0;
  if (!destdnl->is_null()) {
    //dout(10) << "dest dn exists " << *destdn << dendl;
    oldin = mdcache->get_dentry_inode(destdn, mdr, true);
    if (!oldin) return;
    dout(10) << " oldin " << *oldin << dendl;

And in mdcache->get_dentry_inode() it will trigger another stray dentry reintegration, the same as the previous one.

Checking the logs, I found:

2023-05-12T05:29:26.219+0000 13511700  7 mds.4.cache get_dentry_inode linking in remote in [inode 0x20000000fb7 [...2,head] ~mds1/stray6/20000000fb7 rep@1.2 symlink='xxxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/x' v1508 snaprealm=0xc85d370 s=1702 n(v0 rc2023-05-12T05:28:58.908577+0000 b1702 1=1+0) (iversion lock) caps={4836=p/-@1} | 
request=1 lock=0 caps=1 remoteparent=1 0x1bc115f0]
2023-05-12T05:29:26.219+0000 13511700 10 mds.4.cache.strays eval_remote [dentry #0x1/client.0/tmp/tmp.SEF9eFwF43/p8/d1/d33/d55/l118 [2,head] auth REMOTE(lnk) (dn xlock x=1 by 0x1bcda410) (dversion lock w=1) v=474 ap=2 ino=0x20000000fb7 state=1610612992|reintegrating | request=2 lock=2 purging=2 dirty=1 waiter=1 authpin=1 0x1c967560]
2023-05-12T05:29:26.219+0000 13511700 20 mds.4.cache.strays _eval_stray_remote [dentry #0x101/stray6/20000000fb7 [2,head] rep@1.1 (dn lock x=1 by 0x1bcda410) (dversion lock) v=0 ino=0x20000000fb7 state=64|bottomlru | request=1 lock=1 inodepin=1 0xc718fb0]

The destdn, which is [dentry #0x1/client.0/tmp/tmp.SEF9eFwF43/p8/d1/d33/d55/l118, is NOT primary, so it triggers a new stray dentry reintegration.

This is what should happen:

1. Create fileA. <== This should be the primary?
2. Hard link fileB to fileA.
3. Remove fileA; it is moved to straydnA.
4. Reintegration of straydnA to fileB is triggered.
5. In the rename code, mdcache->get_dentry_inode(dn for fileB) is called.
6. Since the dn for fileB is not primary, a new reintegration is triggered.

I still have no idea why we didn't see this before, or how this is related to this PR.

@vshankar Could you double check? Thanks.

@lxbsz
Member Author

lxbsz commented May 18, 2023

...

This should happen:

1. Create fileA. <== This should be the primary?
2. Hard link fileB to fileA.
3. Remove fileA; it is moved to straydnA.
4. Reintegration of straydnA to fileB is triggered.
5. In the rename code, mdcache->get_dentry_inode(dn for fileB) is called.
6. Since the dn for fileB is not primary, a new reintegration is triggered.

I still have no idea why we didn't see this before, or how this is related to this PR.

@vshankar Could you double check? Thanks.

I checked this again and it should be a new issue:

In the multi-MDS case, if a dentry in mds.1 has a hard link in mds.4, then after the primary dentry in mds.1 is removed, mds.1 will trigger a migrate_stray to migrate the stray dentry to mds.4. This is why we can see something like:

2023-05-12T05:29:26.221+0000 13511700 20 mds.4.locker  must xlock (ilink sync) [inode 0x20000000fb7 [...2,head] ~mds1/stray6/20000000fb7 rep@1.2 

That is, mds.4 holds mds.1's stray directories.

Then mds.4 triggers a reintegrate_stray to reintegrate ~mds1/stray6/20000000fb7 to the hard link dentry 0x20000000f31/l118, which sends a rename request to mds.4 itself.

But currently the dentry ~mds1/stray6/20000000fb7 is the primary dentry for inode 0x20000000fb7. So in handle_client_rename(destdn = 0x20000000f31/l118, srcdn = ~mds1/stray6/20000000fb7) --> mdcache->get_dentry_inode(destdn, mdr, true) --> destdn->link_remote(dnl, in) --> StrayManager::eval_remote(destdn):

589 void StrayManager::eval_remote(CDentry *remote_dn)
590 {  
...
607   // refers to stray?
608   CDentry *primary_dn = in->get_projected_parent_dn();
609   ceph_assert(primary_dn != NULL);
610   if (primary_dn->get_dir()->get_inode()->is_stray()) {
611     _eval_stray_remote(primary_dn, remote_dn);                                                                                                                                                                                     
612   } else {
613     dout(20) << __func__ << ": inode's primary dn not stray" << dendl;
614   }
615 }  

At line 608 the primary_dn is the dentry of ~mds1/stray6/20000000fb7, and mds.4 is the auth; then in:

631 void StrayManager::_eval_stray_remote(CDentry *stray_dn, CDentry *remote_dn)
632 {
...
664   if (!remote_dn->is_projected()) {
665     if (remote_dn->is_auth()) {
666       if (remote_dn->dir->can_auth_pin()) {
667         reintegrate_stray(stray_dn, remote_dn);
668       } else {
669         remote_dn->dir->add_waiter(CDir::WAIT_UNFREEZE, new C_RetryEvalRemote(this, remote_dn));
670         dout(20) << __func__ << ": not reintegrating (can't authpin remote parent)" << dendl;
671       }
672 
...

It will trigger reintegrate_stray() again, recursively.

This PR doesn't change that logic, though.

@lxbsz
Member Author

lxbsz commented May 18, 2023

It should be a similar issue to cdb2275, the last commit, which didn't fix all cases. I'll just remove it from this PR for now.

I will fix it later in a separate PR.

batrick added a commit to batrick/ceph that referenced this pull request May 18, 2023
* refs/pull/49553/head:
	mds: wait migrate to finish when linking
	mds: wait reintegrate to finish when unlinking
	message: make MClientReply inherit MMDSOp
	mds: notify the waiters in replica MDSs
@batrick
Member

batrick commented May 19, 2023

@vshankar
Contributor

It should be a similar issue with cdb2275, the last commit. And it didn't fix all cases. I just remove it from this PR for now.

And will fix it later in one separate PR.

Looking into this now @lxbsz

(Sorry for the delay)

@vshankar
Contributor

jenkins test make check

@vshankar
Contributor

jenkins test make check arm64

batrick added a commit to batrick/ceph that referenced this pull request May 20, 2023
* refs/pull/49553/head:
	mds: wait reintegrate to finish when unlinking
	message: make MClientReply inherit MMDSOp
	mds: notify the waiters in replica MDSs
@lxbsz
Member Author

lxbsz commented May 22, 2023

@lxbsz fyi, I included your PR in this run: https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/

Thanks @batrick. There are 2 fsstress test failures, but they are not relevant; they were caused by tracker#61148.

The kernel_untar_build.sh failure is a known issue, the same as tracker#59342.

I didn't check the other failures one by one, but they were mostly also caused by tracker#61148, such as the following pjd test failure:

https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/7279676/

Thanks

@lxbsz
Member Author

lxbsz commented May 22, 2023

jenkins test make check

@batrick
Member

batrick commented May 22, 2023

jenkins test make check

@vshankar
Contributor

@lxbsz fyi, I included your PR in this run: https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/

@batrick @lxbsz please go ahead and merge the PRs if the test runs are fine.

@batrick
Member

batrick commented May 22, 2023

jenkins test make check

@vshankar
Contributor

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if it's a new issue or a corner case that is still lurking around.

@lxbsz
Member Author

lxbsz commented Jun 23, 2023

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if it's a new issue or a corner case that is still lurking around.

@vshankar

I just checked one failure, it's not the same issue:

https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

2023-06-15T06:36:24.325+0000 7fde92313700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 181.032430 secs
2023-06-15T06:36:24.325+0000 7fde92313700  0 log_channel(cluster) log [WRN] : slow request 181.032429 seconds old, received at 2023-06-15T06:33:23.298087+0000: client_request(client.4941:17403 setattr size=1406446 #0x3000000025b 2023-06-15T06:33:23.297873+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) currently failed to xlock, waiting

The setattr request was waiting for the xlock. I haven't gone through the log in detail yet.

[EDIT]

The client has dropped the Fb caps:

2023-06-15T06:33:23.297+0000 7fbdd3fff700 10 client.4941 send_cap 0x3000000025b.head(faked_ino=0 nref=20 ll_ref=23 cap_refs={4=0,1024=0,2048=0,4096=0,8192=0} open={1=0,2=0} mode=100666 size=5984600/12582912 nlink=2 btime=2023-06-15T06:33:01.855043+0000 mtime=2023-06-15T06:33:20.586624+0000 ctime=2023-06-15T06:33:20.586624+0000 change_attr=136 caps=pAsLsXsFcb(2=pAsLsXsFcb) objectset[0x3000000025b ts 0/0 objects 2 dirty_or_tx 0] parents=0x3000000024f.head["f28"], 0x30000000255.head["f2c"] 0x7fbdcd494930) mds.2 seq 18 used Fc want - flush - retain pAsLsXsFsc held pAsLsXsFcrb revoking Fr dropping Fb
2023-06-15T06:33:23.297+0000 7fbdd3fff700 15 client.4941 auth cap, reset requested_max_size due to not wanting any file write cap
2023-06-15T06:33:23.297+0000 7fbdd3fff700  1 -- 192.168.0.1:0/1731885576 --> [v2:172.21.15.29:6832/1469037096,v1:172.21.15.29:6833/1469037096] -- client_caps(update ino 0x3000000025b 1 seq 18 caps=pAsLsXsFc dirty=- wanted=- follows 0 size 5984600/12582912 ts 1/18446744073709551615 mtime 2023-06-15T06:33:20.586624+0000 ctime 2023-06-15T06:33:20.586624+0000 change_attr 136 tws 1) v12 -- 0x7fbda4002770 con 0x7fbde402b2b0

But on the MDS side, the locker state wasn't correctly evaluated:

2023-06-15T06:33:23.297+0000 7fde93b16700  1 -- [v2:172.21.15.29:6832/1469037096,v1:172.21.15.29:6833/1469037096] <== client.4941 192.168.0.1:0/1731885576 4980 ==== client_caps(update ino 0x3000000025b 1 seq 18 caps=pAsLsXsFc dirty=- wanted=- follows 0 size 5984600/12582912 ts 1/18446744073709551615 mtime 2023-06-15T06:33:20.586624+0000 ctime 2023-06-15T06:33:20.586624+0000 change_attr 136 tws 1) v12 ==== 260+0+0 (crc 0 0 0) 0x556a8282d180 con 0x556a81cc5c00
2023-06-15T06:33:23.297+0000 7fde93b16700  7 mds.2.locker handle_client_caps  on 0x3000000025b tid 0 follows 0 op update flags 0x0
2023-06-15T06:33:23.297+0000 7fde93b16700 20 mds.2.12 get_session have 0x556a81c89900 client.4941 192.168.0.1:0/1731885576 state open
...
ty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker simple_eval (ipolicy sync) on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile lock->sync) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFc/pAsLsXsFcb/-@19},l=4941(-1) | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker eval done

It was caused by mismatched seqs between the revoke and the cap update requests.
I raised a tracker for this: https://tracker.ceph.com/issues/61782.

@lxbsz
Member Author

lxbsz commented Jun 27, 2023

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if its a new issue or a corner case which is still lurking around.

Another xfstests-dev test case reproduced this again, and this one was introduced by this PR:

https://pulpito.ceph.com/xiubli-2023-06-26_02:38:43-fs:functional-wip-lxb-xlock-20230619-0716-distro-default-smithi/7316098/

I created a tracker for this: https://tracker.ceph.com/issues/61818.
