
client: introduce timeout for client shutdown #35496

Merged
batrick merged 1 commit into ceph:master from vshankar:wip-client-shutdown-timeout
Jul 6, 2020

Conversation

@vshankar
Contributor

@vshankar vshankar commented Jun 9, 2020

Client::shutdown() could wait indefinitely when tearing down
MDS sessions if an MDS is unreachable during shutdown but a
valid session exists, i.e., Client::mount() was successful.

These failures were initially observed in mgr/volumes tests in
teuthology and later during rados/mgr selftests, where manager
plugins fail to respawn because the volumes plugin waits for
libcephfs handles to be cleaned up via Client::shutdown().

Fixes: http://tracker.ceph.com/issues/44276
Signed-off-by: Venky Shankar vshankar@redhat.com

@vshankar vshankar added the cephfs Ceph File System label Jun 9, 2020
@vshankar vshankar requested a review from a team June 9, 2020 07:43
.add_tag("client"),

Option("client_shutdown_timeout", Option::TYPE_SECS, Option::LEVEL_ADVANCED)
.set_default(30)
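The option above gates a bounded wait on session close. A minimal sketch of how such a timed wait can be built on std::condition_variable::wait_for (the names here are illustrative, not Ceph's actual internals; a timeout of 0 is assumed to mean "wait indefinitely", i.e. the old behaviour):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative sketch (not Ceph's actual code): wait until all MDS
// sessions are closed, bounded by a configurable timeout. A timeout of
// 0 preserves the old behaviour of blocking indefinitely. Returns false
// if the wait timed out with sessions still open.
bool wait_for_sessions_to_close(std::mutex& m,
                                std::condition_variable& cond,
                                const int& open_sessions,
                                std::chrono::seconds timeout)
{
  std::unique_lock<std::mutex> lock(m);
  auto closed = [&] { return open_sessions == 0; };
  if (timeout.count() == 0) {
    cond.wait(lock, closed);  // old behaviour: block until closed
    return true;
  }
  // Predicate overload handles spurious wakeups; false => timed out.
  return cond.wait_for(lock, timeout, closed);
}
```

On timeout, the caller can log the stale sessions and proceed with teardown instead of hanging forever, which is what the log line "1 mds(s) did not respond to session close -- timing out" later in this thread reflects.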
Contributor

Is 30s enough? Suppose I have a network hiccup and a bunch of outstanding data to write back. Might this lead to data corruption when writeback occurs? Note that the client may not end up blacklisted in this case, as 30s is not long enough for that to occur.

I think the default probably ought to be on the order of the auto-blacklisting timeout in the MDS. IOW, if you're going to end up blacklisted anyway then you might as well just give up. Otherwise, you probably want to wait for the MDS to come back.

Contributor Author

Or just default to how it is now, which is indefinite. Is there a similar config in the kclient?

Before closing sessions, clients should have flushed buffered data and caps and synchronously waited for those flushes to be acked (on unmount). That should be enough to avoid any kind of data corruption.

Contributor

I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably ok for regular files. Could we end up waiting here while there are still async dirops waiting to complete?

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Contributor Author

I'm not aware of a similar setting in the kclient. If the data got written back, then that's probably ok for regular files. Could we end up waiting here while there are still async dirops waiting to complete?

This is called via unmount or by the client calling shutdown directly. Calling shutdown directly without flushing buffered data is surely problematic, and there are no checks in shutdown to catch this. For the normal case of this getting invoked via unmount -- that should have taken care of any async ops, I guess...

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Not often. The sequence that was seen in teuthology tests:

  • Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
  • mds rank 0 goes offline
  • mount timeout kicks in
  • shutdown is initiated -- normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565

(Note that I haven't handled the case when MDS is back up online before timeout)

Contributor

For the normal case of this getting invoked via unmount -- that should have taken care of any async ops, I guess...

Yes, it should. That said, if the MDSs are unresponsive then you'll probably end up stuck before this point and the timeout won't kick in.

Also, how often do we end up in a situation where the only thing we're hung on is the session close request? I do wonder if we might plumb this in only to find that we end up similarly hung trying to send an MDS request prior to this point when unmounting.

Not often. The sequence that was seen in teuthology tests:

  • Client does a mount -- libcephfs picks up rank 0 from mdsmap and initiates session open
  • mds rank 0 goes offline
  • mount timeout kicks in
  • shutdown is initiated -- normally there is no need for shutdown when the mount did not go through; however, shutdown initiates session close even if the filesystem is not mounted, since sessions might exist to send MDS commands: https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L565

(Note that I haven't handled the case when the MDS is back up online before timeout)

Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?

Member

It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.

Contributor Author

Hmm ok. This sounds reasonably harmless then. If this is unlikely to have any effect on normal operation, maybe we should just dispense with the tunable and hardcode a timeout?

Timeout should be configurable IMO.

Contributor Author

It sounds reasonable to me. In the future, we can use this same config to broaden the scope so that we catch MDS/OSDs becoming unavailable during a umount.

@batrick I think we should keep the default as 0 (no timeout, current behaviour).

@jtlayton jtlayton requested a review from a team June 9, 2020 11:47

@vshankar vshankar force-pushed the wip-client-shutdown-timeout branch from 7dd9dc1 to 7c716be Compare June 24, 2020 06:00
@batrick batrick (Member) left a comment

Got this test failure:

2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 unmounting
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=6 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 _ll_drop_pins
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.get on 0x7f2360006ac0 0x1.head now 2
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 _ll_put 0x7f2360006ac0 0x1 6 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 1
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode on 0x1.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 inode.put on 0x7f2360006ac0 0x1.head now 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 remove_cap mds.0 on 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=pLsXsFs(0=pLsXsFs) COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 15 client.7159 remove_cap last one, closing snaprealm 0x7f23600069e0
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 put_snap_realm 0x1 0x7f23600069e0 1 -> 0
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 put_inode deleting 0x1.head(faked_ino=0 ref=0 ll_ref=0 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2020-06-25T00:21:57.143098+0000 mtime=2020-06-25T00:21:57.143098+0000 ctime=2020-06-25T00:22:10.021482+0000 caps=- COMPLETE has_dir_layout 0x7f2360006ac0)
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 flush_caps_sync
2020-06-25T00:22:10.647+0000 7f23876931c0 10 client.7159 wait_sync_caps want 1 (last is 1, 0 total flushing)
2020-06-25T00:22:10.647+0000 7f23876931c0 20 client.7159 trim_cache size 0 max 16384
2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 _close_mds_session mds.0 seq 0
2020-06-25T00:22:10.647+0000 7f23876931c0  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:6834/4194277431,v1:172.21.15.31:6835/4194277431] -- client_session(request_close) v4 -- 0x555661c38c00 con 0x555661c34c00
2020-06-25T00:22:10.647+0000 7f23876931c0  2 client.7159 waiting for 1 mds session(s) to close (timeout: 30s)
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 msgr2=0x555661c2b400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x555661c1be90 0x555661c2b400 secure :-1 s=READY pgs=1093 cs=0 l=1 rev1=1 rx=0x7f2368002f30 tx=0x7f2368013000).stop
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] conn(0x555661c1c830 0x7f2350003f30 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] conn(0x7f2350004490 0x7f2350006920 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 --2- 192.168.0.2:0/1017137669 >> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] conn(0x7f2350007040 0x7f23500094e0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0).connect
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3300/0,v1:172.21.15.12:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009c10 con 0x7f2350007040
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009d80 con 0x7f2350004490
2020-06-25T00:22:40.395+0000 7f235d7fa700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.12:3301/0,v1:172.21.15.12:6790/0] -- mon_getmap magic: 0 v1 -- 0x7f2350009f40 con 0x555661c1c830
2020-06-25T00:22:40.646+0000 7f23876931c0  1 client.7159 1 mds(s) did not respond to session close -- timing out.

/ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176692/remote/smithi031/log/ceph-client.1.53480.log.gz

Client timed out the close operation within a second. Looks like this wait_for is buggy.

@vshankar
Contributor Author

Client timed out the close operation within a second. Looks like this wait_for is buggy.

How is that a second? The wait started at

2020-06-25T00:22:10.647+0000 7f23876931c0 2 client.7159 waiting for 1 mds session(s) to close (timeout: 30s)

timed out at:

2020-06-25T00:22:40.646+0000 7f23876931c0 1 client.7159 1 mds(s) did not respond to session close -- timing out.

i.e., from 00:22:10.647 to 00:22:40.646 -- that's 30 seconds -- shutdown timeout.
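The arithmetic checks out; a trivial sketch of the same subtraction on the two log timestamps:

```cpp
// Reduce each log timestamp to milliseconds since midnight and subtract:
// 00:22:40.646 - 00:22:10.647 = 29999 ms, i.e. the 30 s shutdown
// timeout, give or take a millisecond.
constexpr long long to_ms(int h, int m, int s, int milli) {
  return (((h * 60LL + m) * 60 + s) * 1000) + milli;
}
constexpr long long elapsed =
    to_ms(0, 22, 40, 646) - to_ms(0, 22, 10, 647);  // 29999 ms
```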

I'll take a look at the test anyway.

@batrick
Member

batrick commented Jun 26, 2020

Oh, I need new glasses! I'll take another look too. Odd.

@batrick
Member

batrick commented Jun 26, 2020

So the next thing the client does after getting a client_reply from the MDS is renew caps:

2020-06-25T00:22:10.391+0000 7f235ffff700 10 client.7159 renew_caps()
2020-06-25T00:22:10.391+0000 7f235ffff700 15 client.7159 renew_caps requesting from mds.0
2020-06-25T00:22:10.391+0000 7f235ffff700 10 client.7159 renew_caps mds.0
2020-06-25T00:22:10.391+0000 7f235ffff700  1 -- 192.168.0.2:0/1017137669 --> [v2:172.21.15.31:6834/4194277431,v1:172.21.15.31:6835/4194277431] -- client_session(request_renewcaps seq 2) v4 -- 0x7f2340001d10 con 0x555661c34c00

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

@vshankar
Contributor Author

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

Ummm... maybe include this PR in your subsequent run to see if it happens again.

@batrick
Member

batrick commented Jun 30, 2020

That message was never received by the MDS. This appears to be a networking issue, possibly caused by mnt.0 getting torn down along with its network bridge. I don't see how that's possible though.

Ummm... maybe include this PR in your subsequent run to see if it happens again.

Sure. I've opened an issue for this now: https://tracker.ceph.com/issues/46282

@batrick
Member

batrick commented Jul 4, 2020

retest this please
