mds: wait reintegrate to finish when unlinking #49553

Merged
batrick merged 3 commits into ceph:main from lxbsz:wip-58340 on May 22, 2023
Conversation

@lxbsz
Member

@lxbsz lxbsz commented Dec 23, 2022

Just before unlinking a dentry, another MDS may trigger a reintegrate_stray request, which finally becomes a RENAME client request to the current MDS. The rename and unlink requests will then deadlock.

This fix blocks the unlink request if a reintegrate_stray is in progress for the current dentry.

Fixes: https://tracker.ceph.com/issues/58340
Fixes: https://tracker.ceph.com/issues/59657
Signed-off-by: Xiubo Li <xiubli@redhat.com>


@vshankar
Contributor

vshankar commented Jan 9, 2023

jenkins test api

Contributor

@kotreshhr kotreshhr left a comment


The code looks good, but could you explain in the commit message what the actual deadlock is that's being hit without the fix?

@lxbsz
Member Author

lxbsz commented Jan 11, 2023

1. In one MDS, while evaluating the dentry, a reintegrate rename request is sent to itself.

2. The MDS handles the reintegrate rename request and finds that this dentry is a remote dentry whose CInode is linked into a stray directory on another MDS:

mds.2.locker req remote auth_pin of [inode 0x20000000515 [...2,head] ~mds1/stray0/20000000515 ...

The reintegrate rename request is then added to a waiter list.

Meanwhile, in mds.1 the auth pin is frozen, so no other new requests can get the authpin.

3. A client sends an unlink request for the same dentry, which has to do a remote authpin:

mds.2.locker requesting remote auth_pins from mds.1

The unlink request is then added to the waiter list too, but before that the dentry's state was set to STATE_UNLINKING.

Since in step 2 the authpin for the dentry in mds.1 was already frozen, the unlink waits for the reintegrate rename to finish.

4. When the reintegrate rename request is retried, it finds that the dentry is being unlinked and is added to another wait list.

5. So the unlink request and the reintegrate rename request deadlock.

Contributor

@vshankar vshankar left a comment


FWIW, test run https://pulpito.ceph.com/vshankar-2023-03-08_15:12:36-fs-wip-vshankar-testing-20230308.112059-testing-default-smithi/7197021/ has the same (fsstress.sh) hang, but it's only running a single active mds.

class ScrubStack;
class C_ExecAndReply;

struct MDSMetaRequest {
Contributor

Is this really required? Cannot the MClientRequest (for the rename op to the peer mds) be enough for tracking?

Member Author

In Client.cc, a MetaRequest is used to track the in-flight requests in each session, while the MClientRequest is only used to carry the actual request to the MDS, so we cannot track the request with it alone.

This is why I added MDSMetaRequest on the MDS side.

}

if (client_inst.name.is_mds() && reply->get_op() == CEPH_MDS_OP_RENAME) {
mds->send_message(reply, mdr->client_request->get_connection());
Contributor

Hmmm.. the reintegration (rename) operation was never replied back to the peer (request initiator) mds.

Member Author

Yeah.

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

FWIW, test run https://pulpito.ceph.com/vshankar-2023-03-08_15:12:36-fs-wip-vshankar-testing-20230308.112059-testing-default-smithi/7197021/ has the same (fsstress.sh) hang, but it's only running a single active mds.

This time the fsstress.sh test passed. The mds log files are filled full with getattr Fsr #0x10000005a52 from client.24351

This should be another different issue :

2023-03-08T17:03:50.413+0000 7f34ba077700  1 -- [v2:172.21.15.39:6826/722455264,v1:172.21.15.39:6827/722455264] <== client.24351 v1:192.168.0.1:0/3921107847 12428771 ==== client_request(client.24351:13291264 getattr Fsr #0x10000005a52 2023-03-08T17:03:50.413624+0000 caller_uid=1000, caller_gid=1268{6,36,1000,1268,}) v6 ==== 180+0+0 (unknown 3839491394 0 0) 0x561d523b1b00 con 0x561d524a2400
2023-03-08T17:03:50.413+0000 7f34ba077700  4 mds.0.server handle_client_request client_request(client.24351:13291264 getattr Fsr #0x10000005a52 2023-03-08T17:03:50.413624+0000 caller_uid=1000, caller_gid=1268{6,36,1000,1268,}) v6
...

The kernel client has the same issue as https://tracker.ceph.com/issues/58564; IMO it is caused by the deadlock issue in the kclient.

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

jenkins retest this please

@lxbsz
Member Author

lxbsz commented Mar 14, 2023

Rebased it.

@lxbsz
Member Author

lxbsz commented Apr 21, 2023

Rebased it and updated the commit comments.

@vshankar
Contributor

jenkins test make check

@vshankar
Contributor

jenkins test api

@vshankar
Contributor

jenkins test windows

@vshankar
Contributor

vshankar commented May 8, 2023

@vshankar
Contributor

vshankar commented May 9, 2023

@vshankar
Contributor

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:

https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220

Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

I will check it today.

lxbsz added 3 commits May 18, 2023 13:28

mds: notify the waiters in replica MDSs

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
message: make MClientReply inherit MMDSOp

This will allow the MDS to handle MClientReply messages.

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
mds: wait reintegrate to finish when unlinking

When unlinking a remote dentry whose primary dentry has just been unlinked and is being reintegrated (the reintegration of the stray dentry finally becomes a RENAME client request to the current MDS):

If the RENAME request goes to sleep waiting for some resources just after authpinning the CInode, the unlink request will mark the remote dentry with the UNLINKING state and then add itself to a wait list, waiting for the CInode to be authpinned.

When the RENAME request is retried later, it adds itself to a wait list as well, waiting for the UNLINKING state to be cleared. So the RENAME and UNLINK requests deadlock.

This fix blocks the UNLINK request while a reintegrate_stray is going on for the current dentry.

Fixes: https://tracker.ceph.com/issues/58340
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

The failure was introduced by the new commit cdb2275. I will remove it for now. This is a separate issue and I will fix it in a later PR.

@lxbsz
Member Author

lxbsz commented May 18, 2023

@vshankar here was my testing branch that included everything in wip-pdonnell-testing2:
https://pulpito.ceph.com/?branch=wip-pdonnell-testing-20230511.185220
Feel free to have a cursory look at those results but I don't see anything caused by this PR. So this is good to merge from my side.

There are 3 fsstress.sh workunit failures with a couple of them being timeouts (ret: 124). Looking at those logs, the failures do not resemble the hangs that were seen without this change, but we'd need to be 100% sure that nothing else breaks. I'll have a deeper look when I get a bit more time. I would suggest @lxbsz to have a look too.

The failure was introduced by the new commit cdb2275. I will remove it for now. This is a separate issue and I will fix it in a later PR.

Sorry, I misread the logs. It should be a new issue. For example, in https://pulpito.ceph.com/pdonnell-2023-05-11_23:23:58-fs-wip-pdonnell-testing-20230511.185220-distro-default-smithi/7272082/ there are some slow requests, and they were caused by recursive stray reintegration.

2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.913131 seconds old, received at 2023-05-12T05:29:25.734362+0000: client_request(mds.4:110 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently acquired locks
2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.910954 seconds old, received at 2023-05-12T05:29:25.736540+0000: client_request(client.4836:17788 unlink #0x20000000f31/l118 2023-05-12T05:29:25.726057+0000 caller_uid=1000, caller_gid=1265{6,36,1000,1265,}) currently acquired locks
2023-05-12T05:53:27.647+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.426230 seconds old, received at 2023-05-12T05:29:26.221263+0000: client_request(mds.4:111 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-12T05:53:27.648+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.311113 seconds old, received at 2023-05-12T05:29:26.336380+0000: client_request(mds.4:112 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-12T05:53:27.648+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 1441.212995 seconds old, received at 2023-05-12T05:29:26.434498+0000: client_request(mds.4:113 rename #0x20000000f31/l118 #0x610/20000000fb7 caller_uid=0, caller_gid=0{}) currently dispatched

Note: the above unlink request is okay and has already released all its resources. Another failure has no unlink request at all but shows the same issue; please see https://pulpito.ceph.com/pdonnell-2023-05-15_12:34:27-fs-wip-pdonnell-testing-20230511.185220-distro-default-smithi/7273788/:

2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 720.504186 seconds old, received at 2023-05-15T13:50:55.136671+0000: client_request(mds.3:74 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 722.411774 seconds old, received at 2023-05-15T13:50:53.229083+0000: client_request(mds.3:71 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently acquired locks
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 720.318309 seconds old, received at 2023-05-15T13:50:55.322548+0000: client_request(mds.0:531 rename #0x625/20000000d04 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 721.572030 seconds old, received at 2023-05-15T13:50:54.068828+0000: client_request(mds.3:72 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched
2023-05-15T14:02:55.640+0000 14d14700  0 log_channel(cluster) log [WRN] : slow request 721.089038 seconds old, received at 2023-05-15T13:50:54.551819+0000: client_request(mds.3:73 rename #0x20000000d44/c17 #0x607/20000000d04 caller_uid=0, caller_gid=0{}) currently dispatched

When the first stray dentry reintegration is triggered and the rename request from the same MDS is received, after rdlock_two_paths_xlock_destdn() succeeds it will reach:

  CInode *oldin = 0;
  if (!destdnl->is_null()) {
    //dout(10) << "dest dn exists " << *destdn << dendl;
    oldin = mdcache->get_dentry_inode(destdn, mdr, true);
    if (!oldin) return;
    dout(10) << " oldin " << *oldin << dendl;

And in mdcache->get_dentry_inode() it will trigger another stray dentry reintegration, the same as the previous one.

Checking the logs, I found:

2023-05-12T05:29:26.219+0000 13511700  7 mds.4.cache get_dentry_inode linking in remote in [inode 0x20000000fb7 [...2,head] ~mds1/stray6/20000000fb7 rep@1.2 symlink='xxxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/xxxxxxxxx/x' v1508 snaprealm=0xc85d370 s=1702 n(v0 rc2023-05-12T05:28:58.908577+0000 b1702 1=1+0) (iversion lock) caps={4836=p/-@1} | 
request=1 lock=0 caps=1 remoteparent=1 0x1bc115f0]
2023-05-12T05:29:26.219+0000 13511700 10 mds.4.cache.strays eval_remote [dentry #0x1/client.0/tmp/tmp.SEF9eFwF43/p8/d1/d33/d55/l118 [2,head] auth REMOTE(lnk) (dn xlock x=1 by 0x1bcda410) (dversion lock w=1) v=474 ap=2 ino=0x20000000fb7 state=1610612992|reintegrating | request=2 lock=2 purging=2 dirty=1 waiter=1 authpin=1 0x1c967560]
2023-05-12T05:29:26.219+0000 13511700 20 mds.4.cache.strays _eval_stray_remote [dentry #0x101/stray6/20000000fb7 [2,head] rep@1.1 (dn lock x=1 by 0x1bcda410) (dversion lock) v=0 ino=0x20000000fb7 state=64|bottomlru | request=1 lock=1 inodepin=1 0xc718fb0]

The destdn, which is [dentry #0x1/client.0/tmp/tmp.SEF9eFwF43/p8/d1/d33/d55/l118, is NOT primary, so it triggers a new stray dentry reintegration.

This is what should happen:

1. Create fileA. <== This should be the primary?
2. Hard link fileB to fileA.
3. Remove fileA; it is moved to straydnA.
4. Reintegration of straydnA to fileB is triggered.
5. In the rename code, mdcache->get_dentry_inode(dn for fileB) is called.
6. Since the dn for fileB is not primary, a new reintegration is triggered.

I still have no idea why we didn't see this before, or how this is related to this PR.

@vshankar Could you double check? Thanks.

@lxbsz
Member Author

lxbsz commented May 18, 2023

...

This should happen:

1. Create fileA. <== This should be the primary?
2. Hard link fileB to fileA.
3. Remove fileA; it is moved to straydnA.
4. Reintegration of straydnA to fileB is triggered.
5. In the rename code, mdcache->get_dentry_inode(dn for fileB) is called.
6. Since the dn for fileB is not primary, a new reintegration is triggered.

I still have no idea why we didn't see this before, or how this is related to this PR.

@vshankar Could you double check? Thanks.

I checked this again and it should be a new issue:

In the multi-MDS case, if a dentry in mds.1 has a hard link in mds.4, then after the primary dentry in mds.1 is removed, mds.1 will trigger a migrate_stray to migrate the stray dentry to mds.4. This is why we can see something like:

2023-05-12T05:29:26.221+0000 13511700 20 mds.4.locker  must xlock (ilink sync) [inode 0x20000000fb7 [...2,head] ~mds1/stray6/20000000fb7 rep@1.2 

That is, mds.4 holds mds.1's stray directories.

Then mds.4 triggers a reintegrate_stray to reintegrate ~mds1/stray6/20000000fb7 to the hard link dentry 0x20000000f31/l118, which sends a rename request to mds.4 itself.

But currently the dentry ~mds1/stray6/20000000fb7 is the primary dentry for inode 0x20000000fb7. So in handle_client_rename(destdn = 0x20000000f31/l118, srcdn = ~mds1/stray6/20000000fb7) --> mdcache->get_dentry_inode(destdn, mdr, true) --> destdn->link_remote(dnl, in) --> StrayManager::eval_remote(destdn):

589 void StrayManager::eval_remote(CDentry *remote_dn)
590 {  
...
607   // refers to stray?
608   CDentry *primary_dn = in->get_projected_parent_dn();
609   ceph_assert(primary_dn != NULL);
610   if (primary_dn->get_dir()->get_inode()->is_stray()) {
611     _eval_stray_remote(primary_dn, remote_dn);                                                                                                                                                                                     
612   } else {
613     dout(20) << __func__ << ": inode's primary dn not stray" << dendl;
614   }
615 }  

At line 608 the primary_dn is the dentry of ~mds1/stray6/20000000fb7, and mds.4 is the auth; then in:

631 void StrayManager::_eval_stray_remote(CDentry *stray_dn, CDentry *remote_dn)
632 {
...
664   if (!remote_dn->is_projected()) {
665     if (remote_dn->is_auth()) {
666       if (remote_dn->dir->can_auth_pin()) {
667         reintegrate_stray(stray_dn, remote_dn);
668       } else {
669         remote_dn->dir->add_waiter(CDir::WAIT_UNFREEZE, new C_RetryEvalRemote(this, remote_dn));
670         dout(20) << __func__ << ": not reintegrating (can't authpin remote parent)" << dendl;
671       }
672 
...

It will trigger reintegrate_stray() again, recursively.

This PR doesn't change that logic, though.

@lxbsz
Member Author

lxbsz commented May 18, 2023

It should be a similar issue to cdb2275, the last commit, which didn't fix all cases. I'll just remove it from this PR for now.

I will fix it later in a separate PR.

batrick added a commit to batrick/ceph that referenced this pull request May 18, 2023
* refs/pull/49553/head:
	mds: wait migrate to finish when linking
	mds: wait reintegrate to finish when unlinking
	message: make MClientReply inherit MMDSOp
	mds: notify the waiters in replica MDSs
@batrick
Member

batrick commented May 19, 2023

@vshankar
Contributor

It should be a similar issue with cdb2275, the last commit. And it didn't fix all cases. I just remove it from this PR for now.

And will fix it later in one separate PR.

Looking into this now @lxbsz

(Sorry for the delay)

@vshankar
Contributor

jenkins test make check

@vshankar
Contributor

jenkins test make check arm64

batrick added a commit to batrick/ceph that referenced this pull request May 20, 2023
* refs/pull/49553/head:
	mds: wait reintegrate to finish when unlinking
	message: make MClientReply inherit MMDSOp
	mds: notify the waiters in replica MDSs
@lxbsz
Member Author

lxbsz commented May 22, 2023

@lxbsz fyi, I included your PR in this run: https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/

Thanks @batrick. There are 2 fsstress test failures, but they are not relevant; they were caused by tracker#61148.

The kernel_untar_build.sh failure is a known issue, the same as tracker#59342.

I didn't check the other failures one by one, but they were mostly also caused by tracker#61148, such as the following pjd test failure:

https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/7279676/

Thanks

@lxbsz
Member Author

lxbsz commented May 22, 2023

jenkins test make check

@batrick
Member

batrick commented May 22, 2023

jenkins test make check

@vshankar
Contributor

@lxbsz fyi, I included your PR in this run: https://pulpito.ceph.com/pdonnell-2023-05-19_20:26:49-fs-wip-pdonnell-testing-20230519.164114-distro-default-smithi/

@batrick @lxbsz please go ahead and merge the PRs if the test runs are fine.

@batrick
Member

batrick commented May 22, 2023

jenkins test make check

@vshankar
Contributor

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if it's a new issue or a corner case that is still lurking around.

@lxbsz
Member Author

lxbsz commented Jun 23, 2023

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if it's a new issue or a corner case that is still lurking around.

@vshankar

I just checked one failure, it's not the same issue:

https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

2023-06-15T06:36:24.325+0000 7fde92313700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 181.032430 secs
2023-06-15T06:36:24.325+0000 7fde92313700  0 log_channel(cluster) log [WRN] : slow request 181.032429 seconds old, received at 2023-06-15T06:33:23.298087+0000: client_request(client.4941:17403 setattr size=1406446 #0x3000000025b 2023-06-15T06:33:23.297873+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) currently failed to xlock, waiting

The setattr request was waiting for the xlock. I haven't gone through the log in detail yet.

[EDIT]

The client has dropped the Fb caps:

2023-06-15T06:33:23.297+0000 7fbdd3fff700 10 client.4941 send_cap 0x3000000025b.head(faked_ino=0 nref=20 ll_ref=23 cap_refs={4=0,1024=0,2048=0,4096=0,8192=0} open={1=0,2=0} mode=100666 size=5984600/12582912 nlink=2 btime=2023-06-15T06:33:01.855043+0000 mtime=2023-06-15T06:33:20.586624+0000 ctime=2023-06-15T06:33:20.586624+0000 change_attr=136 caps=pAsLsXsFcb(2=pAsLsXsFcb) objectset[0x3000000025b ts 0/0 objects 2 dirty_or_tx 0] parents=0x3000000024f.head["f28"], 0x30000000255.head["f2c"] 0x7fbdcd494930) mds.2 seq 18 used Fc want - flush - retain pAsLsXsFsc held pAsLsXsFcrb revoking Fr dropping Fb
2023-06-15T06:33:23.297+0000 7fbdd3fff700 15 client.4941 auth cap, reset requested_max_size due to not wanting any file write cap
2023-06-15T06:33:23.297+0000 7fbdd3fff700  1 -- 192.168.0.1:0/1731885576 --> [v2:172.21.15.29:6832/1469037096,v1:172.21.15.29:6833/1469037096] -- client_caps(update ino 0x3000000025b 1 seq 18 caps=pAsLsXsFc dirty=- wanted=- follows 0 size 5984600/12582912 ts 1/18446744073709551615 mtime 2023-06-15T06:33:20.586624+0000 ctime 2023-06-15T06:33:20.586624+0000 change_attr 136 tws 1) v12 -- 0x7fbda4002770 con 0x7fbde402b2b0

But on the MDS side, the locker state wasn't correctly evaluated:

2023-06-15T06:33:23.297+0000 7fde93b16700  1 -- [v2:172.21.15.29:6832/1469037096,v1:172.21.15.29:6833/1469037096] <== client.4941 192.168.0.1:0/1731885576 4980 ==== client_caps(update ino 0x3000000025b 1 seq 18 caps=pAsLsXsFc dirty=- wanted=- follows 0 size 5984600/12582912 ts 1/18446744073709551615 mtime 2023-06-15T06:33:20.586624+0000 ctime 2023-06-15T06:33:20.586624+0000 change_attr 136 tws 1) v12 ==== 260+0+0 (crc 0 0 0) 0x556a8282d180 con 0x556a81cc5c00
2023-06-15T06:33:23.297+0000 7fde93b16700  7 mds.2.locker handle_client_caps  on 0x3000000025b tid 0 follows 0 op update flags 0x0
2023-06-15T06:33:23.297+0000 7fde93b16700 20 mds.2.12 get_session have 0x556a81c89900 client.4941 192.168.0.1:0/1731885576 state open
...
ty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker simple_eval (ipolicy sync) on [inode 0x3000000025b [...2,head] /client.0/tmp/tmp.o8Xb5OZu0n/p8/db/d23/f28 auth v128 ap=2 snaprealm=0x556a8355cb40 DIRTYPARENT s=5984600 nl=2 n(v0 rc2023-06-15T06:33:20.586624+0000 b5984600 1=1+0) (ifile lock->sync) (iversion lock) cr={4941=0-12582912@1} caps={4941=pAsLsXsFc/pAsLsXsFcb/-@19},l=4941(-1) | ptrwaiter=0 request=1 lock=1 caps=1 remoteparent=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x556a83664000]
2023-06-15T06:33:23.297+0000 7fde93b16700 10 mds.2.locker eval done

It was caused by mismatched seqs between the revoke and the cap update requests.
I raised a tracker for this: https://tracker.ceph.com/issues/61782.

@lxbsz
Member Author

lxbsz commented Jun 27, 2023

@lxbsz @batrick FYI, I still see fsstress failures with multimds: https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/7305053/

I'm going over the logs to see if its a new issue or a corner case which is still lurking around.

Another xfstests-dev test case reproduced this again, and this one was introduced by this PR:

https://pulpito.ceph.com/xiubli-2023-06-26_02:38:43-fs:functional-wip-lxb-xlock-20230619-0716-distro-default-smithi/7316098/

I created a tracker for this: https://tracker.ceph.com/issues/61818.
