Conversation
959fe99 to
0cbce01
Compare
sloretz
left a comment
There was a problem hiding this comment.
I like the change to install signal handlers in rclpy.init() and uninstall them in rclpy.shutdown().
Something odd is this makes rclpy nodes not exit on receiving SIGTERM. Before, sending SIGTERM will cause the process to exit like this:
$ kill -SIGTERM 431136
$ ros2 run demo_nodes_py talker
[INFO] [1633734719.062607437] [talker]: Publishing: "Hello World: 0"
[INFO] [1633734720.050625060] [talker]: Publishing: "Hello World: 1"
...
[INFO] [1633734731.050637979] [talker]: Publishing: "Hello World: 12"
[ros2run]: Terminated
With this PR, sending SIGTERM does nothing. The talker executor keeps running.
There's no Python signal handler for SIGTERM, so no exception gets raised. Triggering the sigint/sigterm guard condition appears to only wake the executor; so rclpy.spin(...) keeps running. Maybe SIGTERM could additionally trigger a shutdown call on all of the Context instances?
rclpy/src/rclpy/signal_handler.cpp
Outdated
| "Stop triggering a guard condition when SIGINT occurs." | ||
| }, | ||
| {NULL, NULL, 0, NULL} /* sentinel */ | ||
| /// This should be updated to match signals.SignalHandlerOptions in python. |
There was a problem hiding this comment.
Nit, to avoid duplicating these values I would recommend defining the enum in Pybind11 like here:
rclpy/rclpy/src/rclpy/_rclpy_pybind11.cpp
Lines 55 to 59 in 6dd9540
And exposing that to Python like here:
Line 20 in 6dd9540
There was a problem hiding this comment.
Alternatively, pybind11 wrapping register_sigint_signal_handler and register_sigterm_signal_handler and doing the logic of void install_signal_handlers(int options) in the python method def install_signal_handlers(options) sounds like another good option as it would remove the need for a C version of this enum entirely
There was a problem hiding this comment.
sounds good, I will apply this
Thanks for noticing that, I didn't think about that issue.
Yes, that's an option. We could maybe have |
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
9be1a1d to
d69d0f2
Compare
|
Now when running the demo_nodes_py listener demo and sending sigterm the following exception is thrown: which seems like a reasonable error. I tried to use the python Maybe, we should be using an approach similar to rclcpp: having an extra thread waiting on a semaphore, trigger it in a signal safe fashion, and from that thread trigger guard conditions and shutdown contexts. |
That's interesting - the intended behavior is to release the GIL while the waitset is waiting. Is there another spot blocking? rclpy/rclpy/src/rclpy/wait_set.cpp Line 246 in 6f4e42c Another downside is the Pure python signal handler can only be unregistered from the same thread that registered it (#728 (comment)). Maybe that's less of a problem with this PR registering and unregistering it in |
Sorry, the GIL is not blocked. But the waitset is blocked waiting on a condition variable. In this case, it seems we really need a C signal handler to at least trigger a condition variable. I've used a python sigterm handler only for shutting down contexts, which is safer than doing that from C, though it adds more complexity. |
sloretz
left a comment
There was a problem hiding this comment.
demo_nodes_py talker segfaults when receiving SIGTERM. Is that a sign of an issue here?
$ /home/osrf/ws/ros2/install/demo_nodes_py/lib/demo_nodes_py/talker
[INFO] [1634342485.847740003] [talker]: Publishing: "Hello World: 0"
[INFO] [1634342486.836984592] [talker]: Publishing: "Hello World: 1"
[INFO] [1634342487.837171725] [talker]: Publishing: "Hello World: 2"
[INFO] [1634342488.836918431] [talker]: Publishing: "Hello World: 3"
[INFO] [1634342489.837115967] [talker]: Publishing: "Hello World: 4"
[INFO] [1634342490.837159863] [talker]: Publishing: "Hello World: 5"
[INFO] [1634342491.836732510] [talker]: Publishing: "Hello World: 6"
[INFO] [1634342492.836944020] [talker]: Publishing: "Hello World: 7"
Segmentation fault (core dumped)$ kill -SIGTERM 856298
Yes, I will test that. |
It seems that when I tested on Friday I was sending the signal to the ros2cli process launching the node instead that to the node itself. |
This reverts commit 515f0a3. Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
|
Force-pushed to fix DCO |
| if (g_signal_handler_installed.load()) { | ||
| trigger_guard_conditions(); | ||
| if (g_is_sigterm.exchange(false)) { | ||
| rclpy::shutdown_contexts(); |
There was a problem hiding this comment.
@ivanpauno why the difference in behavior between SIGINT and SIGTERM? AFAICS rclcpp treats them equally.
There was a problem hiding this comment.
SIGINT raises keyboard interrupt, so you don't need to asynchronously shutdown.
For SIGTERM, I need to do something else from the signal handler, the options are shutting down or raising an exception from there (which can safely be done from the python signal handler).
edit: I also didn't want to modify what we currently do when SIGINT is send.
Using only the python signal module is also not possible, see #830 (comment) and #830 (comment).
There was a problem hiding this comment.
SIGINT raises keyboard interrupt, so you don't need to asynchronously shutdown.
Does that ensure contexts are shutdown? That's what itches me. SIGTERM for sure does, SIGINT, I don't know.
There was a problem hiding this comment.
Does that ensure contexts are shutdown? That's what itches me. SIGTERM for sure does, SIGINT, I don't know.
No, it does not.
Context are not shutdown when SIGINT is signaled in rclpy, that was also the behavior before this PR.
We could maybe change that, shutting down contexts in both cases will probably not change behavior.
There was a problem hiding this comment.
Context are not shutdown when SIGINT is signaled in rclpy, that was also the behavior before this PR.
I see. I don't mind if we change it here or in a follow-up PR, but the difference in signal handling behavior between rclpy and rclcpp seems confusing to me for no obvious reason. WDYT?
There was a problem hiding this comment.
don't mind if we change it here or in a follow-up PR, but the difference in signal handling behavior between rclpy and rclcpp seems confusing to me for no obvious reason. WDYT?
Sounds good, I will try that in a follow-up PR, for the moment I will move forward with this one.
| rcl_ret_t ret = rcl_shutdown(c); | ||
| (void)ret; | ||
| } | ||
| } |
There was a problem hiding this comment.
@ivanpauno nit: any reason not to move both function and variables into the Context class (via static specifiers)?
There was a problem hiding this comment.
not really, it doesn't change anything though, so I will ignore the comment for the moment
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
|
As you hit on in the conversation above, I have one concern with the behavior of this PR, and it's that you can't handle SIGINT and SIGTERM the same within your Python application. If I make a test as follows: If I run this program, and send a SIGINT to the process, then the following happens: However if you send a SIGTERM, then the following unclean shutdown happens: I don't think it makes sense to require different shutdown logic between these signals - did you come to a resolution for which one of the behaviors it should be? |
I'm honestly not sure. Python raises a We could also raise a Maybe we could raise a different exception when sigterm happens, which would be pretty similar to the sigint case (e.g. The following code should work cleanly in both cases: import rclpy
from demo_nodes_py.topics.talker import Talker
def main():
rclpy.init()
node = Talker()
try:
rclpy.spin(node)
except KeyboardInterrupt:
print('Got SIGINT')
rclpy.shutdown()
except rclpy.executors.ExternalShutdownException:
print('Got SIGTERM')
print('Destroying')
node.destroy_node()
if __name__ == '__main__':
main() |
|
I think that the other alternative is to also shutdown asynchronously when sigint happens, so you don't really need to call rclpy.shutdown() in either case. IIRC @hidmic preferred this alternative. |
|
I think it makes sense for the signals that were configured as a shutdown event upon |
Yeah, this also makes sense to me. Then, in practice would we say a ROS 2 Python |
Exactly. |
Depends on #814.Equivalent to ros2/rclcpp#1771.