
client: introduce timeout for client shutdown #35496

Merged
batrick merged 1 commit into ceph:master from vshankar:wip-client-shutdown-timeout
Jul 6, 2020

Conversation

@vshankar
Contributor

@vshankar vshankar commented Jun 9, 2020

Client::shutdown() could wait indefinitely when tearing down
MDS sessions if an MDS is unreachable during shutdown but a
valid session exists, i.e., Client::mount() was successful.

These failures were initially observed in mgr/volumes tests in
teuthology and later during rados/mgr selftests, where manager
plugins fail to respawn because the volumes plugin waits for
libcephfs handles to be cleaned up via Client::shutdown().

Fixes: http://tracker.ceph.com/issues/44276
Signed-off-by: Venky Shankar vshankar@redhat.com

@vshankar vshankar added the cephfs Ceph File System label Jun 9, 2020
@vshankar vshankar requested a review from a team June 9, 2020 07:43
.add_tag("client"),

Option("client_shutdown_timeout", Option::TYPE_SECS, Option::LEVEL_ADVANCED)
.set_default(30)
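The option above gates a bounded wait on session close. A minimal sketch of how such a timed wait can be built on std::condition_variable::wait_for (the names here are illustrative, not Ceph's actual internals; a timeout of 0 is assumed to mean "wait indefinitely", i.e. the old behaviour):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative sketch (not Ceph's actual code): wait until all MDS
// sessions are closed, bounded by a configurable timeout. A timeout of
// 0 preserves the old behaviour of blocking indefinitely. Returns false
// if the wait timed out with sessions still open.
bool wait_for_sessions_to_close(std::mutex& m,
                                std::condition_variable& cond,
                                const int& open_sessions,
                                std::chrono::seconds timeout)
{
  std::unique_lock<std::mutex> lock(m);
  auto closed = [&] { return open_sessions == 0; };
  if (timeout.count() == 0) {
    cond.wait(lock, closed);  // old behaviour: block until closed
    return true;
  }
  // Predicate overload handles spurious wakeups; false => timed out.
  return cond.wait_for(lock, timeout, closed);
}
```

On timeout, the caller can log the stale sessions and proceed with teardown instead of hanging forever, which is what the log line "1 mds(s) did not respond to session close -- timing out" later in this thread reflects.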
Contributor

Is 30s enough? Suppose I have a network hiccup and a bunch of outstanding data to write back. Might this lead to data corruption when writeback occurs? Note that the client may not end up blacklisted in this case, as 30s is not long enough for that to occur.

I think the default probably ought to be on the order of the auto-blacklisting timeout in the MDS. IOW, if you're going to end up blacklisted anyway then you might as well just give up. Otherwise, you probably want to wait for the MDS to come back.

Contributor Author

Or just default to how it is now, which is indefinite. Is there a similar config in the kclient?

Before closing sessions, clients should have flushed buffered data and caps and synchronously waited for those flushes to be acked (on unmount). That should be enough to avoid any kind of data corruption.

Contributor

I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably ok for regular files. Could we end up waiting here while there are still async dirops waiting to complete?

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Contributor Author

I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably ok for regular files. Could we end up waiting here while there are still async dirops waiting to complete?

This is called via unmount or by the client calling shutdown directly. Calling shutdown directly without flushing buffered data is surely problematic, and there are no checks in shutdown to catch this. For the normal case of this getting invoked via unmount -- that should have taken care of any async ops, I guess...

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Not often. The sequence that was seen in teuthology tests:

  • Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
  • mds rank 0 goes offline
  • mount timeout kicks in
  • shutdown is initiated -- normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565

(Note that I haven't handled the case when MDS is back up online before timeout)

Contributor

For the normal case of this getting invoked via unmount -- that should have taken care of any async ops, I guess...

Yes, it should. That said, if the MDSs are unresponsive then you'll probably end up stuck before this point and the timeout won't kick in.

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Not often. The sequence that was seen in teuthology tests:

  • Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
  • mds rank 0 goes offline
  • mount timeout kicks in
  • shutdown is initiated -- normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565

(Note that I haven't handled the case when the MDS is back up online before timeout)

Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?

Member

It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.

Contributor Author

Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?

Timeout should be configurable IMO.

Contributor Author

It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.

@batrick I think we should keep the default as 0 (no timeout, current behaviour).

@jtlayton jtlayton requested a review from a team June 9, 2020 11:47

@vshankar vshankar force-pushed the wip-client-shutdown-timeout branch from 7dd9dc1 to 7c716be Compare June 24, 2020 06:00
@batrick batrick (Member) left a comment

Got this test failure:

2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 unmounting
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=6 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 _ll_drop_pins
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.get on 0x7f2360006ac0 0x1.head now 2
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 _ll_put 0x7f2360006ac0 0x1 6 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 remove_cap mds.0 on 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 client.7159 remove_cap last one, closing snaprealm 0x7f23600069e0
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 put_snap_realm 0x1 0x7f23600069e0 1 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode deleting 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=- COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 flush_caps_sync
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 wait_sync_caps want 1 (last is 1, 0 total flushing)
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 trim_cache size 0 max 16384
2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 _close_mds_session mds.0 seq 0
2020-06-25T00:22:10.647+0000 7f23876931c0  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:6834/4194277431,v1:172.21.15.31:6835/4194277431] -- client_session(request_close) v4 -- 0x555661c38c00 con 0x555661c34c00
2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 waiting for 1 mds session(s) to close (timeout: 30s)
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 msgr2=0x555661c2b400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 0x555661c2b400 secure :-1 s=READY pgs=1093 cs=0 l=1 rev1=1 rx=0x7f2368002f30 tx=0x7f2368013000).stop
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] conn(0x555661c1c830 0x7f2350003f30 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x7f2350004490 0x7f2350006920 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] conn(0x7f2350007040 0x7f23500094e0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009c10 con 0x7f2350007040
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009d80 con 0x7f2350004490
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009f40 con 0x555661c1c830
2020-06-25T00:22:40.646+0000 7f23876931c0  1 client.7159 1 mds(s) did not respond to session close -- timing out.

/ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176692/remote/smithi031/log/ceph-client.1.53480.log.gz

Client timed out the close operation within a second. Looks like this wait_for is buggy.

@vshankar
Contributor Author

Client timed out the close operation within a second. Looks like this wait_for is buggy.

How is that a second? The wait started at

2020-06-25T00:22:10.647+0000 7f23876931c0 2 client.7159 waiting for 1 mds session(s) to close (timeout: 30s)

timed out at:

2020-06-25T00:22:40.646+0000 7f23876931c0 1 client.7159 1 mds(s) did not respond to session close -- timing out.

i.e., from 00:22:10.647 to 00:22:40.646 -- that's 30 seconds -- shutdown timeout.
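The arithmetic checks out; a trivial sketch of the same subtraction on the two log timestamps:

```cpp
// Reduce each log timestamp to milliseconds since midnight and subtract:
// 00:22:40.646 - 00:22:10.647 = 29999 ms, i.e. the 30 s shutdown
// timeout, give or take a millisecond.
constexpr long long to_ms(int h, int m, int s, int milli) {
  return (((h * 60LL + m) * 60 + s) * 1000) + milli;
}
constexpr long long elapsed =
    to_ms(0, 22, 40, 646) - to_ms(0, 22, 10, 647);  // 29999 ms
```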

I'll take a look at the test anyway.

@batrick
Member

batrick commented Jun 26, 2020

Oh, I need new glasses! I'll take another look too. Odd.

@batrick
Member

batrick commented Jun 26, 2020

So the next thing the client does after getting a client_reply from the MDS is renew caps:

2020-06-25T00:22:10.391+0000 7f235ffff700 10 client.7159 renew_caps()
2020-06-25T00:22:10.391+0000 7f235ffff700 15 client.7159 renew_caps requesting from mds.0
2020-06-25T00:22:10.391+0000 7f235ffff700 10 client.7159 renew_caps mds.0
2020-06-25T00:22:10.391+0000 7f235ffff700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:6834/4194277431,v1:172.21.15.31:6835/4194277431] -- client_session(request_renewcaps seq 2) v4 -- 0x7f2340001d10 con 0x555661c34c00

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

@vshankar
Contributor Author

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

Ummm... maybe include this PR in your subsequent run to see if it happens again.

@batrick
Member

batrick commented Jun 30, 2020

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

Ummm... maybe include this PR in your subsequent run to see if it happens again.

Sure. I've opened an issue for this now: https://tracker.ceph.com/issues/46282

@batrick
Member

batrick commented Jul 4, 2020

retest this please
