Signal propagation by the ros2 run verb #895
Description
Bug report
Required Info:
- Operating System:
- Ubuntu 22.04 Jammy
- Installation type:
- ROS 2 Humble binaries installed from http://packages.ros.org/ros2/ubuntu
- Version or commit hash:
$ dpkg -l | grep ros2cli
ii ros-humble-ros2cli 0.18.9-1jammy.20240217.070501 amd64 Framework for ROS 2 command line tools.
ii ros-humble-ros2cli-common-extensions 0.1.1-4jammy.20240217.081520 amd64 Meta package for ros2cli common extensions
- DDS implementation:
- Fast-RTPS and Cyclone
- Client library (if applicable):
- N/A
Steps to reproduce issue
#!/bin/bash
# Launch a child process (e.g. in a shell script) with ros2 run and remember its PID.
ros2 run demo_nodes_cpp talker &
PID=$!
# Do something else, start more processes, or just wait a bit.
sleep 5
# Try to kill the background node and wait.
kill $PID # same as kill -TERM $PID
wait
Expected behavior
The talker process should terminate cleanly, as if the signal had been sent directly to the child process. Some processes may have to perform cleanup work first, such as closing files or writing log messages.
If I do not use ros2 run and launch the node executable directly, everything works as expected:
#!/bin/bash
# Launch a child process (e.g. in a shell script) by invoking the executable directly and remember its PID.
/opt/ros/humble/lib/demo_nodes_cpp/talker &
PID=$!
# Do something else, start more processes, or just wait a bit.
sleep 5
# Try to kill the background node and wait.
kill $PID # same as kill -TERM $PID
wait
With this variant, the talker process terminates cleanly, then wait returns.
The shell script is only an example. There are other cases where a supervisor process monitors its children and expects them to terminate after it sends a TERM signal, e.g. systemd or the Docker daemon. Other signals should also be propagated and may be needed by some processes, for example SIGPIPE, SIGWINCH, or the user signals SIGUSR1 and SIGUSR2.
Actual behavior
The SIGTERM signal is handled by the ros2 run Python process in the default way: it terminates and detaches from its child talker, but the signal is not propagated, so the node keeps running and talking. I did not expect that ros2 run does not replace itself with the launched process, as rosrun in ROS 1 did. Apparently it stays alive only to log when the child process terminates and what its exit status was.
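The effect can be reproduced without ROS. The following is a minimal, POSIX-only sketch: the wrapper script is a hypothetical stand-in for ros2 run (it is not the ros2cli code), a sleeping Python process stands in for the talker, and all names are illustrative.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for "ros2 run": start a child, report its PID, wait for it.
WRAPPER_SOURCE = r"""
import subprocess, sys
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print(child.pid, flush=True)
child.wait()
"""


def child_survives_wrapper_sigterm():
    """Send SIGTERM to the wrapper only; report its exit status and whether its child lives on."""
    wrapper = subprocess.Popen(
        [sys.executable, "-c", WRAPPER_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(wrapper.stdout.readline())  # PID of the stand-in "talker"
    wrapper.send_signal(signal.SIGTERM)         # same as: kill $PID
    returncode = wrapper.wait()
    wrapper.stdout.close()
    try:
        os.kill(child_pid, 0)                   # signal 0 only checks for existence
        alive = True
    except ProcessLookupError:
        alive = False
    if alive:
        os.kill(child_pid, signal.SIGKILL)      # clean up the orphaned child
    return returncode, alive
```

The wrapper installs no SIGTERM handler, so it dies from the signal while its child, which never received anything, keeps running.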
It would be possible to work around this by sending the signal to all child processes of $PID (e.g. pkill -P $PID), or by other techniques that deliver the signal to the actual node process instead of the shell's direct child (ros2 run). Another option is to use ROS 2 launch.
Pressing Ctrl-C in the same terminal before the ros2 run process has terminated works, because the terminal then sends SIGINT to every process in the foreground process group, including the talker. Once the talker has been detached and the script has finished, Ctrl-C no longer has any effect on the running node process.
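The reason Ctrl-C reaches the talker is that a child started without a new session or process group stays in its parent's process group. A small POSIX-only sketch of that check, again with a sleeping Python process as a stand-in for the talker and an illustrative function name:

```python
import os
import subprocess
import sys


def child_shares_process_group():
    """Return True if a child started with default Popen settings is in our process group."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        # Compare the child's process group ID with our own.
        return os.getpgid(child.pid) == os.getpgrp()
    finally:
        child.kill()
        child.wait()
```

A signal addressed to a single PID, as with kill $PID, does not benefit from this group membership.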
A similar (undesired) effect when running in Docker:
$ docker run --rm -d osrf/ros:humble-desktop ros2 run demo_nodes_cpp talker
4f6054bf2eea3a32d9883d2717286ad4821411e96f19819923d051bfeaf8466d
$ docker stop 4f6054bf2eea3a32d9883d2717286ad4821411e96f19819923d051bfeaf8466d
[...]
# The container processes are forcefully killed only after 10 seconds by default.
# The initial SIGTERM signal sent by the docker daemon was not propagated to talker by ros2 run.
By contrast, this works fine and docker stop returns after 1-2 seconds:
$ docker run --rm -d osrf/ros:humble-desktop /opt/ros/humble/lib/demo_nodes_cpp/talker
25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
$ docker stop 25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
$
Feature request
Feature description
- ros2 run should propagate all signals that can be handled to its child process, or at least SIGINT and SIGTERM.
  - Alternatively, it could use os.execve() and its variants to replace the current process with the child process, like rosrun in ROS 1 did.
- The ros2 run process should never terminate and leave its child running detached.
- In any case, the two shell scripts above are expected to behave in exactly the same way.
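The first option could look roughly like the sketch below. It is not the ros2cli implementation; run_and_forward and FORWARDED_SIGNALS are hypothetical names, and only SIGINT and SIGTERM are relayed here, although the tuple could be extended to other catchable signals.

```python
import signal
import subprocess
import sys

# Assumption: the minimal set from the feature request; extend as needed.
FORWARDED_SIGNALS = (signal.SIGINT, signal.SIGTERM)


def run_and_forward(cmd):
    """Run cmd as a child, relay selected signals to it, and return its exit code."""
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Relay the signal unless the child is already known to have exited.
        if child.poll() is None:
            try:
                child.send_signal(signum)
            except ProcessLookupError:
                pass  # child exited between the check and the kill

    # Install the forwarding handler and remember what was there before.
    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED_SIGNALS}
    try:
        # The parent stays alive until the child is gone, so it never detaches.
        return child.wait()
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)


if __name__ == "__main__":
    sys.exit(run_and_forward(sys.argv[1:]))
```

With this approach the parent outlives the child and can still log its exit status. One thing to consider: when Ctrl-C is pressed in a terminal, the child already receives SIGINT through the process group, so forwarding would deliver it a second time.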
Implementation considerations
I am not sure how other platforms, like Windows, deal with this and whether replacing the process with its child would work there.
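One way to hedge against the platform question is to replace the process only on POSIX and fall back to a child process elsewhere. This is a sketch under that assumption, with a hypothetical function name. Python's documentation for the os.exec* family only promises that the new program keeps the caller's process id on Unix, which is why the exec branch is limited to POSIX here.

```python
import os
import subprocess
import sys


def exec_or_spawn(cmd):
    """Replace this process with cmd on POSIX; elsewhere run cmd as a child."""
    if os.name == "posix":
        # exec does not flush open file objects, so flush buffered output first.
        sys.stdout.flush()
        sys.stderr.flush()
        os.execvp(cmd[0], cmd)  # does not return on success
    # Assumption: on other platforms keep the parent alive and mirror the exit code.
    return subprocess.call(cmd)
```

After a successful exec there is no parent left to log the child's exit status, which is the trade-off against the forwarding approach.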