mgr: refresh registered client list to monitor on mgr shutdown/respawn#33272

Closed
vshankar wants to merge 5 commits into ceph:master from vshankar:wip-43943

Conversation

@vshankar
Contributor

No description provided.

self.fs.mount(filesystem_name=self.fs_name.encode('utf-8'))
log.debug("Connection to cephfs '{0}' complete".format(self.fs_name))
self.mgr._ceph_register_client(self.fs.get_addrs())
r = self.mgr._ceph_register_client(self.fs.get_addrs())
Contributor Author


@batrick commit 969b660 adds a guard in _ceph_register_client() to deny registering a client while the manager is waiting for an ack from the mon for the last sent beacon. Between handling this failure and calling disconnect, the ack could arrive and the manager would either shut down (exit(0)) or respawn (active -> standby). For the shutdown case it's not a problem, since sending the last beacon (and waiting for an ack) is executed from signal-handler context; but when respawning, manager modules could still be executing while the manager waits for the last beacon ack.
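The guard described above can be sketched as follows. This is a minimal, hypothetical Python illustration (the real guard lives in the C++ side of ceph-mgr; `ClientRegistry`, `begin_final_beacon`, and `register` are invented names): once the final beacon is being sent, new registrations are refused so the last beacon's client list stays authoritative.

```python
import threading

class ClientRegistry:
    """Hypothetical sketch: once the manager starts waiting for its final
    beacon ack, new client registrations are refused; the caller must
    disconnect the freshly created handle instead."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clients = []
        self._finalizing = False  # set when the last beacon is sent

    def begin_final_beacon(self):
        """Freeze the registry and return the addrs for the last beacon."""
        with self._lock:
            self._finalizing = True
            return list(self._clients)

    def register(self, addrs):
        """Mirrors the guarded _ceph_register_client(): returns False
        (registration denied) while the final beacon is in flight."""
        with self._lock:
            if self._finalizing:
                return False
            self._clients.append(addrs)
            return True
```

The race in the comment above is exactly the window between `register()` returning False and the caller disconnecting: the beacon ack may arrive in that window.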

Contributor Author


There is an issue even when shutting down from a signal handler -- a module could instantiate a cephfs connection (init, mount) and, before it registers the handle's addrs, a termination signal could arrive; the final beacon would then never include the just-instantiated client addrs, resulting in the same problem. As a workaround/fix, the whole fs init/mount and register sequence should happen with signals blocked.

For the respawn case (comment above) -- mgr could send a SIGTERM to itself and we could execv() from the signal handler (execv is an async-signal-safe syscall).
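The "block signals around init/mount/register" idea can be sketched in Python with `signal.pthread_sigmask`. This is illustrative only (the actual mgr module plumbing differs); the context manager simply defers SIGTERM/SIGINT delivery until the critical section is done, so a connection cannot be created after the final client list has already been sent.

```python
import signal
from contextlib import contextmanager

@contextmanager
def signals_blocked(sigs=(signal.SIGTERM, signal.SIGINT)):
    """Block delivery of termination signals for the duration of the
    critical section; pending signals are delivered once the original
    mask is restored."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, sigs)
    try:
        yield
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)

# Hypothetical usage inside a mgr module (fs/mgr names are stand-ins):
#
# with signals_blocked():
#     fs.init()
#     fs.mount(filesystem_name=fs_name.encode('utf-8'))
#     mgr._ceph_register_client(fs.get_addrs())
```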

Member


I don't think closing this race is possible without looking at this a different way. I see in 4bfa4fc that you removed a lot of the tear down code in MgrStandby::shutdown. This was de facto dead code due to 3363a10, by @liewegas?

Using the shutdown hook is the most attractive way to resolve this race because modules should stop all operations so the Mgr can then fire off the final beacon including all of the latest clients. Unfortunately I think that bridge has burned down with all of the havoc it caused with the dashboard. In particular, tearing down the Python interpreter "correctly" is just not worth the effort.

So, I would suggest we add another hook named stop which generically just says: "stop the module and tear down all your state". Or alternatively, we could call all the shutdown hooks but not do any other cleanup. What do you think @liewegas @tchaikov @LenzGr
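The proposed `stop` hook contract might look like the following sketch. All names here are hypothetical (this is not the actual MgrModule API): `stop` only asks the module to halt operations and release its state, without the full interpreter teardown that `shutdown` implied.

```python
class MgrModuleSketch:
    """Hypothetical base class showing the proposed `stop` contract:
    'stop the module and tear down all your state', nothing more."""

    def __init__(self):
        self.stopping = False

    def stop(self):
        self.stopping = True


class VolumesModuleSketch(MgrModuleSketch):
    """Illustrative subclass: a module with helper-thread work queues
    (e.g. purge/cloner) drains them when stopped."""

    def __init__(self):
        super().__init__()
        self.worker_queues = ["purge", "cloner"]

    def stop(self):
        super().stop()
        # Helper threads observe self.stopping and exit; queued work
        # is dropped rather than processed.
        self.worker_queues.clear()
```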

Member

@batrick batrick left a comment


Thought I posted this review but found it pending. Sorry for the delay...


@vshankar
Contributor Author

vshankar commented Mar 2, 2020

@batrick -- apologies for delay -- I'll get to this in a day or two...

@gregsfortytwo
Member

@vshankar looks like this still needs some work?

@vshankar
Contributor Author

@vshankar looks like this still needs some work?

right -- testing now -- will push when done.

@vshankar vshankar changed the title [WIP] mgr: refresh registered client list to monitor on mgr shutdown/respawn mgr: refresh registered client list to monitor on mgr shutdown/respawn Mar 30, 2020
@vshankar
Contributor Author

vshankar commented Apr 6, 2020

@gregsfortytwo @tchaikov please take a look

@vshankar
Contributor Author

vshankar commented Apr 8, 2020

A minor update addressing Patrick's comment to reorganize the header file.

vshankar added 5 commits May 5, 2020 02:02
The `stop` service request will be invoked by the manager during termination
or respawn. It is currently designed to stop dispatching commands
to modules, returning the -ESHUTDOWN error code instead.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Clean up helper threads (purge and async cloner) when the stop service
request is invoked.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Beacon messages from the manager to the monitor carry a sequence
number which the monitor piggybacks in the beacon reply
messages. This machinery is required to send and wait
on a sequence number when the manager restarts or respawns and
has to transmit an updated client list to the monitor for
blacklisting.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
.. and wait for beacon ack from monitor. The last beacon message
sends the updated client list to monitor for potential blacklisting.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
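The beacon sequence-number machinery described in the commits above can be sketched as follows. This is an illustrative Python model (the real implementation is C++ in ceph-mgr; `BeaconSeq` and its methods are invented names): the sender stamps each beacon, the monitor echoes the number back, and shutdown/respawn waits until the last stamped beacon is acked.

```python
import threading

class BeaconSeq:
    """Sketch of the seq-number handshake: send_beacon() stamps a beacon,
    handle_ack() records the seq echoed back by the monitor, and
    wait_for_ack() blocks until a given beacon has been acknowledged."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_sent = 0
        self._last_acked = 0

    def send_beacon(self):
        with self._cond:
            self._last_sent += 1
            return self._last_sent  # seq carried in the beacon message

    def handle_ack(self, seq):
        with self._cond:
            self._last_acked = max(self._last_acked, seq)
            self._cond.notify_all()

    def wait_for_ack(self, seq, timeout=None):
        """Returns True once seq is acked, False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._last_acked >= seq,
                                       timeout=timeout)
```

On shutdown/respawn the manager would send the final beacon (carrying the updated client list) and call `wait_for_ack()` on its sequence number before proceeding.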
@vshankar
Contributor Author

vshankar commented May 5, 2020

@batrick please take a look

EDIT: update is just a rebase

@batrick batrick self-requested a review May 5, 2020 21:49
@vshankar
Contributor Author

Need to resolve the test failures.

taking a look

@batrick
Member

batrick commented May 22, 2020

this pr went through the cephfs suites without issue

@vshankar
Contributor Author

vshankar commented May 25, 2020

this pr went through the cephfs suites without issue

The issue is seen when running the rados:mgr suite, which loads the selftest mgr plugin, leading to a mgr restart. ceph-mgr does not restart as expected due to the following sequence (seen on a vstart cluster via vstart_runner):

  • vstart_runner kills ceph mds
  • vstart_runner triggers mgr shutdown
  • mgr/volumes threads (purge/cloner) try to fetch entries for processing (requires instantiating cephfs filehandle)
  • Client::mount() tries to initiate connection with MDS
2020-05-25T09:28:33.560-0400 7f6d3a4f4700  0 [volumes DEBUG mgr_util] CephFS mounting...
2020-05-25T09:28:33.560-0400 7f6d334e6700 10 client.0 ms_handle_connect on v2:192.168.0.5:40653/0
2020-05-25T09:28:33.564-0400 7f6d3a4f4700 10 client.464142 fetch_fsmap learned FSMap version 247
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 fetch_fsmap finished waiting for FSMap version 247
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 Subscribing to map 'mdsmap.1'
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 20 client.464142 trim_cache size 0 max 16384
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 20 client.464142 populate_metadata read hostname 'cephalopod-octo'
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 did not get mds through better means, so chose random mds -1
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 20 client.464142 mds is -1
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142  target mds.-1 not active, waiting for new mdsmap
2020-05-25T09:28:33.568-0400 7f6d334e6700  1 client.464142 handle_mds_map epoch 247
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 did not get mds through better means, so chose random mds 0
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 20 client.464142 mds is 0
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 _open_mds_session mds.0
2020-05-25T09:28:33.568-0400 7f6d3a4f4700 10 client.464142 waiting for session to mds.0 to open

  • At a later point, mount() times out (note the "make_request=" debug log). For this test, client_mount_timeout was set to a lower value (~5 seconds); the default for this config is 300 seconds.
2020-05-25T09:28:37.568-0400 7f6d34ce9700 20 client.464142 trim_cache size 0 max 16384
2020-05-25T09:28:38.568-0400 7f6d34ce9700 20 client.464142 trim_cache size 0 max 16384
2020-05-25T09:28:38.568-0400 7f6d3a4f4700 10 client.464142 make_request=-110
2020-05-25T09:28:38.568-0400 7f6d3a4f4700  1 client.464142 shutdown
2020-05-25T09:28:38.568-0400 7f6d3a4f4700  2 client.464142 _close_mds_session mds.0 seq 0
2020-05-25T09:28:38.568-0400 7f6d3a4f4700  2 client.464142 waiting for 1 mds sessions to close
  • Client::shutdown initiates an MDS session close which probably does not have a timeout (unless there is a msgr timeout that kicks in, but the test starts failing at this point).
2020-05-25T09:28:38.568-0400 7f6d3a4f4700  2 client.464142 _close_mds_session mds.0 seq 0
2020-05-25T09:28:38.568-0400 7f6d3a4f4700  2 client.464142 waiting for 1 mds sessions to close
  • The mgr soft shutdown kicks in, but it is blocked on a lock that is held while mounting.
2020-05-25T09:28:38.644-0400 7f6d5cfea700  0 [volumes INFO volumes.fs.volume] shutting down
2020-05-25T09:28:38.644-0400 7f6d5cfea700  0 [volumes INFO mgr_util] shutting down

The shutdown part in Client makes me recall this tracker: https://tracker.ceph.com/issues/44276

Need to see if Client::shutdown can be effectively handled. Will post when I get something.

EDIT: Note that Client::shutdown() gets called after Client::mount() times out (i.e., the call is not initiated by mgr/volumes).

@batrick
Member

batrick commented May 26, 2020

Need to see if Client::shutdown can be effectively handled. Will post when I get something.

Is it as simple as just adding timeouts for the Client shutdown?

@vshankar
Contributor Author

Is it as simple as just adding timeouts for the Client shutdown?

That's probably the safest way to go about this, or alternatively some messenger config option that times out when no reply is received.
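A deadline-based session close can be sketched like this. It is a hypothetical model of the idea, not Client.cc code (`sessions` and `cond` are illustrative stand-ins for the Client's session map and its wait condition): instead of waiting indefinitely for every MDS session to close, give up after a timeout so a stuck MDS cannot wedge mgr shutdown.

```python
import threading

def close_sessions_with_timeout(sessions, cond, timeout):
    """Wait until all MDS sessions have closed, but only up to `timeout`
    seconds. Returns True if all sessions closed in time, False if the
    deadline expired (caller proceeds with forced teardown)."""
    with cond:
        closed_in_time = cond.wait_for(lambda: len(sessions) == 0,
                                       timeout=timeout)
    return closed_in_time
```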

@sebastian-philipp
Contributor

(just note that the MGR modules won't be cleaned up when the MGR is shutting down. They'll be killed instead)

@vshankar
Contributor Author

vshankar commented Jun 5, 2020

(just note that the MGR modules won't be cleaned up when the MGR is shutting down. They'll be killed instead)

right -- this pr implements "soft shutdown" for plugins to perform minimal cleanup. Right now, even soft shutdown is pretty delicate. I'll post an update w/ the fixes.

@vshankar
Contributor Author

vshankar commented Jun 8, 2020

A couple of issues with the current pr:

  • "soft shutdown" (calling ->stop() for each plugin) was leading to each module logging "failed to load " in the cluster log. Fixing this is straightforward and involves not setting a couple of variables that hint to the manager that the module has failed (note that the shutdown call sets these variables; soft shutdown, however, need not set them).

  • mgr respawns when a module is enabled -- the respawn happens while handling the mgrmap (dispatcher). Before respawning we need to send the updated client list to the monitor (for blacklisting). The client list is sent via a beacon to the monitor. However, the mgr waits for the beacon reply before it can respawn, but the beacon reply message sits in the dispatch queue since the earlier message (mgrmap) is still being processed. We could have the beacon handled by a separate dispatcher, but I guess that would not fix this either since there is a single dispatcher thread. We could respawn asynchronously (via a finisher), but that might just open up races elsewhere.

@batrick @badone -- any thoughts?
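The "respawn asynchronously via a finisher" option amounts to handing the respawn off to a separate completion thread so the dispatcher is free to process the beacon reply. A minimal Python model of a finisher queue (illustrative only; Ceph's Finisher is a C++ class):

```python
import queue
import threading

class Finisher:
    """Toy finisher: a single worker thread that runs queued callbacks,
    so work deferred from the dispatch thread (e.g. a respawn) does not
    block message processing."""

    def __init__(self):
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            fn = self._q.get()
            if fn is None:  # sentinel: stop the worker
                break
            fn()

    def queue(self, fn):
        self._q.put(fn)

    def stop(self):
        self._q.put(None)
        self._thread.join()
```

In this model the mgrmap handler would `queue()` the respawn and return, letting the dispatcher deliver the beacon ack; as the comment above notes, this may simply move the races elsewhere rather than eliminate them.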

@batrick
Member

batrick commented Jun 8, 2020

* mgr respawns when a module is enabled -- the respawn happens while handling the mgrmap (dispatcher). Before respawning we need to send the updated client list to the monitor (for blacklisting). The client list is sent via a beacon to the monitor. However, the mgr waits for the beacon reply before it can respawn, but the beacon reply message sits in the dispatch queue since the earlier message (mgrmap) is still being processed. We could have the beacon handled by a separate dispatcher, but I guess that would not fix this either since there is a single dispatcher thread. We could respawn asynchronously (via a finisher), but that might just open up races elsewhere.

Ya, I don't think this is a situation we can resolve in a satisfying way. We have a similar but unresolvable issue with the mgr getting removed via ceph mgr fail. The mgr cannot send a beacon the mon will read because it's been kicked out. I think it's acceptable to just fall back to normal timeout-based eviction for these scenarios.

However, that means we'll still see these warnings in QA. Maybe it's time to throw in the towel and whitelist the warning.

@vshankar
Contributor Author

vshankar commented Jun 9, 2020

Right -- I'll close this pr for now and whitelist the warning. There are some (general) fixes in mgr/volumes though. I'll push a separate pr for that.

@vshankar vshankar closed this Jun 9, 2020
@badone
Contributor

badone commented Jun 9, 2020

Maybe we could open these issues up to a broader discussion so we have maximum chance of getting helpful feedback? Could one of you send an email to the dev list outlining the issues and the problems with currently known solutions? At the least this will help to make the issue more broadly known and that, in itself, may lead to a more satisfying resolution sooner.

@vshankar
Contributor Author

vshankar commented Jun 9, 2020

Maybe we could open these issues up to a broader discussion so we have maximum chance of getting helpful feedback? Could one of you send an email to the dev list outlining the issues and the problems with currently known solutions? At the least this will help to make the issue more broadly known and that, in itself, may lead to a more satisfying resolution sooner.

ACK -- I'll initiate a discussion in devel list.

vshankar added a commit to vshankar/ceph that referenced this pull request Jun 11, 2020
…mgr client getting backlisted""

This reverts commit a7994a0.

Failed attempt at solving the issue is in PR ceph#33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
smithfarm pushed a commit to smithfarm/ceph that referenced this pull request Jul 11, 2020
…mgr client getting backlisted""

This reverts commit a7994a0.

Failed attempt at solving the issue is in PR ceph#33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 60b8f7a)
smithfarm pushed a commit to smithfarm/ceph that referenced this pull request Jul 18, 2020
…mgr client getting backlisted""

This reverts commit a7994a0.

Failed attempt at solving the issue is in PR ceph#33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 60b8f7a)
ideepika pushed a commit to ideepika/ceph that referenced this pull request Aug 7, 2020
…mgr client getting backlisted""

This reverts commit a7994a0.

Failed attempt at solving the issue is in PR ceph#33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>
ideepika pushed a commit to ideepika/ceph that referenced this pull request Sep 21, 2020
…mgr client getting backlisted""

This reverts commit a7994a0.

Failed attempt at solving the issue is in PR ceph#33272. Until we
find a clean solution for this, whitelisting the warning is
probably the best thing for now.

Fixes: http://tracker.ceph.com/issues/43943
Signed-off-by: Venky Shankar <vshankar@redhat.com>

Labels

bug-fix cephfs Ceph File System mgr

6 participants