client: introduce timeout for client shutdown #35496
Conversation
Reviewed lines:

        .add_tag("client"),

        Option("client_shutdown_timeout", Option::TYPE_SECS, Option::LEVEL_ADVANCED)
        .set_default(30)
Is 30s enough? Suppose I have a network hiccup and a bunch of outstanding data to write back. Might this lead to data corruption when writeback occurs? Note that the client may not end up blacklisted in this case, as 30s is not long enough for that to occur.
I think the default probably ought to be on the order of the auto-blacklisting timeout in the MDS. IOW, if you're going to end up blacklisted anyway, then you might as well just give up. Otherwise, you probably want to wait for the MDS to come back.
Or just default to how it is now, which is indefinite. Is there a similar config in kclient?
Before closing sessions, clients should have flushed buffered data and caps, and synchronously waited for those flushes to get acked (on unmount). That should be enough to avoid running into any kind of data corruption.
I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably OK for regular files. Could we end up waiting here while we're still waiting on async dirops to complete?
Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send a MDS request prior to this point when umounting.
I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably OK for regular files. Could we end up waiting here while we're still waiting on async dirops to complete?
This is called via unmount or the client calling shutdown directly. Calling shutdown directly without flushing buffered data is surely problematic, and there are no checks in shutdown to catch this. For the normal case of this getting invoked via unmount — that should have taken care of any async ops, I guess...
Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send a MDS request prior to this point when umounting.
Not often. The sequence that was seen in teuthology tests:
- Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
- mds rank 0 goes offline
- mount timeout kicks in
- shutdown is initiated — normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565
(Note that I haven't handled the case when MDS is back up online before timeout)
For the normal case of this getting invoked via unmount — that should have taken care of any async ops, I guess...
Yes, it should. That said, if the MDSs are unresponsive then you'll probably end up stuck before this point and the timeout won't kick in.
Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send a MDS request prior to this point when umounting.
Not often. The sequence that was seen in teuthology tests:
- Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
- mds rank 0 goes offline
- mount timeout kicks in
- shutdown is initiated -- normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565

(Note that I haven't handled the case when the MDS is back up online before the timeout)
Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?
It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.
Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?
Timeout should be configurable IMO.
It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.
@batrick I think we should keep the default as 0 (no timeout, current behaviour).
7dd9dc1 to 7c716be
batrick left a comment
Got this test failure:
2020-06-25T00:22:10.647+0000 7f23876931c0 2 client.7159 unmounting
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=6 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 _ll_drop_pins
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.get on 0x7f2360006ac0 0x1.head now 2
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 _ll_put 0x7f2360006ac0 0x1 6 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 remove_cap mds.0 on 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 client.7159 remove_cap last one, closing snaprealm 0x7f23600069e0
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 put_snap_realm 0x1 0x7f23600069e0 1 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode deleting 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=- COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 flush_caps_sync
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 wait_sync_caps want 1 (last is 1, 0 total flushing)
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 trim_cache size 0 max 16384
2020-06-25T00:22:10.647+0000 7f23876931c0 2 client.7159 _close_mds_session mds.0 seq 0
2020-06-25T00:22:10.647+0000 7f23876931c0 1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:6834/4194277431,v1:172.21.15.31:6835/4194277431] -- client_session(request_close) v4 -- 0x555661c38c00 con 0x555661c34c00
2020-06-25T00:22:10.647+0000 7f23876931c0 2 client.7159 waiting for 1 mds session(s) to close (timeout: 30s)
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 -- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 msgr2=0x555661c2b400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 0x555661c2b400 secure :-1 s=READY pgs=1093 cs=0 l=1 rev1=1 rx=0x7f2368002f30 tx=0x7f2368013000).stop
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] conn(0x555661c1c830 0x7f2350003f30 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x7f2350004490 0x7f2350006920 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] conn(0x7f2350007040 0x7f23500094e0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009c10 con 0x7f2350007040
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009d80 con 0x7f2350004490
2020-06-25T00:22:40.395+0000 7f235d7fa700 1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009f40 con 0x555661c1c830
2020-06-25T00:22:40.646+0000 7f23876931c0 1 client.7159 1 mds(s) did not respond to session close -- timing out.
/ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176692/remote/smithi031/log/ceph-client.1.53480.log.gz
Client timed out the close operation within a second. Looks like this wait_for is buggy.
How is that a second? The wait started at 00:22:10.647 and timed out at 00:22:40.646 — that's 30 seconds, i.e., the shutdown timeout. I'll take a look at the test anyway.
Oh, I need new glasses! I'll take another look too. Odd.
So the next thing the client does after getting a … That message was never received by the MDS. This appears to be a networking issue, possibly caused by …
Ummm... maybe include this PR in your subsequent run to see if it happens again.
Sure. I've filed an issue for this now: https://tracker.ceph.com/issues/46282

retest this please
Client::shutdown() could indefinitely wait when tearing down
MDS sessions if an MDS is unreachable during shutdown, but a
valid session existed, i.e., Client::mount() was successful.
These failures were initially observed in mgr/volumes tests in
teuthology and lately during rados/mgr selftests where manager
plugins do not respawn as volumes plugin waits for cleaning up
libcephfs handles via Client::shutdown().
Fixes: http://tracker.ceph.com/issues/44276
Signed-off-by: Venky Shankar <vshankar@redhat.com>