Signal propagation by the ros2 run verb #895

@meyerj

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04 Jammy
  • Installation type:
  • Version or commit hash:
    $ dpkg -l | grep ros2cli
    ii  ros-humble-ros2cli                                0.18.9-1jammy.20240217.070501               amd64        Framework for ROS 2 command line tools.
    ii  ros-humble-ros2cli-common-extensions              0.1.1-4jammy.20240217.081520                amd64        Meta package for ros2cli common extensions
    
  • DDS implementation:
    • Fast-RTPS and Cyclone
  • Client library (if applicable):
    • N/A

Steps to reproduce issue

#!/bin/bash
# Launch a child process (e.g. in a shell script) with ros2 run and remember its PID.
ros2 run demo_nodes_cpp talker &
PID=$!

# Do something else, start more processes, or just wait a bit.
sleep 5

# Try to kill the background node and wait.
kill $PID        # same as kill -TERM $PID
wait

Expected behavior

The talker process should terminate cleanly, as if the signal had been sent directly to the child process. Some processes may have to perform cleanup work first, such as closing files or writing a log message.

If I do not use ros2 run and launch the node executable directly, everything works as expected:

#!/bin/bash
# Launch a child process (e.g. in a shell script) by invoking the executable directly and remember its PID.
/opt/ros/humble/lib/demo_nodes_cpp/talker &
PID=$!

# Do something else, start more processes, or just wait a bit.
sleep 5

# Try to kill the background node and wait.
kill $PID        # same as kill -TERM $PID
wait

With this variant, the talker process terminates cleanly, then wait returns.

The shell script is only an example. There are other cases where a supervisor process monitors its children and expects them to terminate after it sends a TERM signal, e.g. systemd or the Docker daemon. Other signals should also be propagated because some processes may need them, for example SIGPIPE, SIGWINCH, or the user signals SIGUSR1 and SIGUSR2.

Actual behavior

The ros2 run Python process handles SIGTERM in the default way: it terminates and detaches from its child talker, but it does not propagate the signal, so the node keeps running and talking. I did not expect this. Unlike rosrun in ROS 1, ros2 run does not replace itself with the launched process, apparently only so that it can log when the child process terminates and report its exit status.

A possible workaround is to send the signal to all child processes of $PID (e.g. pkill -P $PID), or to use some other technique that delivers the signal to the actual node process instead of the direct child of the script. Another option is to use ROS 2 launch.
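The pkill -P workaround could look roughly like the sketch below. The helper name signal_children is hypothetical and not part of ros2cli, and a plain bash wrapper around sleep stands in for ros2 run and the talker so that the script runs without a ROS 2 installation. It assumes pkill from procps is available.

```shell
#!/bin/bash
# Hypothetical helper (not part of ros2cli): send a signal to the direct
# children of a wrapper process such as `ros2 run`.
# Usage: signal_children <wrapper_pid> [signal_name]
signal_children() {
    local wrapper_pid=$1
    local sig=${2:-TERM}
    # pkill -P selects the processes whose parent PID is $wrapper_pid.
    pkill "-$sig" -P "$wrapper_pid"
}

# Stand-in for `ros2 run demo_nodes_cpp talker &`: a wrapper shell that
# starts a child process and waits for it.
bash -c 'sleep 30; true' &
PID=$!
sleep 1                       # give the wrapper time to start its child

signal_children "$PID" TERM   # signal the child instead of the wrapper
wait "$PID"                   # the wrapper returns once its child has exited
echo "wrapper exited with status $?"
```

The children are signalled while the wrapper is still alive on purpose: once the wrapper has terminated, its children are reparented and pkill -P no longer finds them.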

Pressing Ctrl-C in the same terminal before the ros2 run process has terminated works, because the terminal then delivers SIGINT to all processes in the foreground process group, including the talker. Once the talker has been detached and the script has finished, Ctrl-C no longer has any effect on the running node process.

A similar (undesired) effect when running in Docker:

$ docker run --rm -d osrf/ros:humble-desktop ros2 run demo_nodes_cpp talker
4f6054bf2eea3a32d9883d2717286ad4821411e96f19819923d051bfeaf8466d
$ docker stop 4f6054bf2eea3a32d9883d2717286ad4821411e96f19819923d051bfeaf8466d
[...]
# By default, the container processes are forcefully killed only after 10 seconds.
# ros2 run did not propagate the initial SIGTERM sent by the Docker daemon to talker.

By contrast, this works fine, and docker stop returns after 1-2 seconds:

$ docker run --rm -d osrf/ros:humble-desktop /opt/ros/humble/lib/demo_nodes_cpp/talker
25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
$ docker stop 25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
25c535f175c6d976897e1e12507af9c94bddcb9fbca732d0dcaf65e31edac1cb
$ 

Feature request

Feature description

  • ros2 run should propagate all signals that can be handled to its child process, or at least SIGINT and SIGTERM.
  • The ros2 run process should never terminate and leave its child running detached.
  • In any case, the two shell scripts above should behave in exactly the same way.
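To illustrate the behavior requested above, here is a minimal shell sketch of a wrapper that relays signals to its child and only returns after the child has exited. It is not how ros2 run is implemented (that is a Python process). The function name run_and_forward, the variable forwarded, and the chosen signal list are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper (illustration only): start a command as a child,
# relay selected signals to it, and return the child's exit status.
run_and_forward() {
    "$@" &
    local child=$!
    local sig status

    # Note: a non-interactive shell starts background jobs with SIGINT
    # ignored, so forwarding INT only helps children that install their
    # own SIGINT handler.
    for sig in INT TERM HUP USR1 USR2; do
        trap "forwarded=1; kill -$sig $child 2>/dev/null || true" "$sig"
    done

    while :; do
        forwarded=0
        status=0
        wait "$child" || status=$?
        # A trapped signal interrupts wait before the child has exited.
        # In that case wait again instead of leaving the child behind.
        [ "$forwarded" -eq 1 ] || break
    done

    trap - INT TERM HUP USR1 USR2
    return "$status"
}
```

With such a wrapper, `run_and_forward /opt/ros/humble/lib/demo_nodes_cpp/talker &` followed by `kill $!` would reach the talker, and the wrapper would not exit before the talker has finished its cleanup.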

Implementation considerations

I am not sure how other platforms, like Windows, deal with this, or whether replacing the process with its child would work there.
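On POSIX systems, replacing the process with its child corresponds to exec. The sketch below shows the idea in shell. The function name launch_with_exec is hypothetical, and the snippet only demonstrates that the PID is preserved, which is why a signal sent to the launcher's PID would reach the launched program directly.

```shell
#!/bin/bash
# Hypothetical launcher (illustration only): do any setup, then replace
# the current process with the target program.
launch_with_exec() {
    # Setup such as resolving the executable path would happen here.
    # exec replaces the process image and keeps the PID, so nothing is
    # left in between to swallow signals.
    exec "$@"
}

# Run the demonstration in a subshell so the calling shell survives.
# Both lines print the same PID.
(
    echo "launcher PID: $BASHPID"
    launch_with_exec bash -c 'echo "program PID:  $$"'
)
```

The trade-off is the one mentioned under "Actual behavior": after exec there is no parent left that could log the child's termination and exit status.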

Labels

help wanted
