Skip to content

Add ignore signal exit handler#27

Merged
jacquelinekay merged 12 commits intomasterfrom
ignore_signal
Apr 29, 2016
Merged

Add ignore signal exit handler#27
jacquelinekay merged 12 commits intomasterfrom
ignore_signal

Conversation

@jacquelinekay
Copy link
Copy Markdown
Contributor

This is needed for the C services tests. To test remote services, the service process must stick around until the client receives a response from the service. Rather than introducing signal handling in the rcl tests, we will instead make the service process block forever and use the exit handler introduced by this PR to ignore only SIGINT or SIGTERM in the service process when it is interrupted. I opted not to use the ignore all return codes exit handler in case there are other errors in the service handler.

@jacquelinekay jacquelinekay added the in progress Actively being worked on (Kanban column) label Apr 20, 2016
@jacquelinekay jacquelinekay self-assigned this Apr 20, 2016
@jacquelinekay jacquelinekay added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 20, 2016
@jacquelinekay
Copy link
Copy Markdown
Contributor Author

@dirk-thomas I think I've implemented your suggestions but with one thing that seems like a bit of a hack--when the process is killed from SIGINT, task_state.returncode is -2, while signal.SIGINT is defined as 2. So when I check that the task_state.returncode is equal to SIGINT I check the absolute value. Not sure if there's a better way.

@dirk-thomas
Copy link
Copy Markdown
Member

For subprocesses the minus will come from here: https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.returncode

I don't think the comparison should use abs because then signals as well as user defined return codes are not distinguishable. We should come up with a consistent conventions what external programs are supposed to return as a user return code to ensure that the signals are distinguishable.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

I'm not sure we can do much better than checking the return codes defined in the signal module and making sure that our code and user code doesn't define return codes that overlap with system-defined return codes (e.g. SIGINT and SIGKILL). I guess that in launch the processes will always be child processes, so in Unix the return codes for tasks will always be negative. Are you suggesting that I improve the logic of the ignore_signal_exit_handler or are you implying there should be a larger change?

https://github.com/ros2/launch/pull/27/files#diff-c102f8d6ec360dfe1b5780f069b1231cR100

@dirk-thomas
Copy link
Copy Markdown
Member

I am arguing that we need to clarify what a return code can / should be. What values are the executables supposed to return in case of user error and signals. At least in the case where we want to test for signals we need to distinguish them.

Based on your comment ("on Unix the return codes for tasks will always be negative") there should be no need for using abs in this patch.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

I've updated the ignore_signal exit handler so that the return code is ignored if the process gets -SIGTERM on Windows or -SIGINT on Linux.

The new exit handler is needed in ros2/rcl#48 to make the services test stable in rcl. I can also make a unit test for the exit handler in test_launch if it is needed to get this merged.

@dhood
Copy link
Copy Markdown
Member

dhood commented Apr 25, 2016

I wanted to point out that test_communication is using a return code of 2 here which might conflict with use of signal.SIGINT

at the moment there's no problem since it doesn't use this exit handler, but if we decide it's best practice not to use return codes in a particular range, that's an example of a test that we should remember to update

@dirk-thomas
Copy link
Copy Markdown
Member

Why should SIGTERM be only checked for on Windows? The code doesn't distinguish the platform anywhere.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

According to several places in the code, sending SIGINT to a process is not supported on Windows. So processes that are terminated with SIGINT on Unix end up receiving SIGTERM on Windows instead.

https://github.com/ros2/launch/blob/master/launch/launch/launcher.py#L211-L222

@dirk-thomas
Copy link
Copy Markdown
Member

The comment states that SIGINT is not supported on Windows and only send to processes on non-Windows.

But SIGTERM is later send to all remaining subprocesses independent of the platform:

# sending SIGTERM to remaining processes
for index in all_futures.values():
p = self.task_descriptors[index]
if 'protocol' in dir(p):
if not p.protocol.exit_future.done():
self._process_message(p, 'signal SIGTERM')
try:
p.transport.send_signal(signal.SIGTERM)
except ProcessLookupError:
pass

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

I thought about it some more and I think ignoring SIGTERM is OK. Feels like I'm running in circles... f318973

@tfoote
Copy link
Copy Markdown
Contributor

tfoote commented Apr 26, 2016

From discussion: It would be valuable to extend the API such that the handler will have additional information about whether the launcher sent a signal to the process so that if the SIGINT was sent from the launcher, it can be ignored.

It would also be valuable to potentially change the RMW and RCL error enum values to deduplicate return/exit codes for the future. Pick number over at least the standard 15-20. Possibly reserve specific ranges for user return/exit codes.

@dhood
Copy link
Copy Markdown
Member

dhood commented Apr 26, 2016

for reference, this is what I was referring to about there already being something checking if the process was in teardown in the ros2/examples rclcpp_examples tests: https://github.com/ros2/launch/blob/master/launch/launch/exit_handler.py#L30

@dirk-thomas
Copy link
Copy Markdown
Member

dirk-thomas commented Apr 26, 2016

The default_exit_handler lets the first exiting process (which uses this exit handler) determine the return code of launch.

With the additional information mentioned above it will be possible to more granular decide if the return code of the other processes should override the return code of launch.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

jacquelinekay commented Apr 26, 2016

eaf87de is a start for making the logic of this exit handler a bit smarter, adding a terminated_from_launch boolean to TaskState which is true if launch sent SIGINT or SIGKILL to the task on teardown.

To make it even more granular I could record which signal was sent to terminate the process (e.g. SIGINT or SIGKILL).

@dirk-thomas
Copy link
Copy Markdown
Member

The boolean flag doesn't allow to filter the return code in the error handler yet. Please record a list of signals which have been sent to the process instead.

if task_state.terminated_from_launch:
sigint_received = signal.SIGINT in context.task_state.signals_received
sigterm_received = signal.SIGTERM in context.task_state.signals_received
if sigint_received or sigterm_received:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not replace the three lines with if p.task_state.signals_received:?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I wrote it like this to be extra specific in case we end up sending more signals to tasks in the future. But I guess it's okay to say that the ignore_signal_exit_handler is for ignoring all kinds of signals sent to the task by launch.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

ok, in 7882939 the new field is changed to signals_received, a list that accumulates the enums defined by the signal module when send_signal is called in launcher.py.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

I made a branch in rcl with the same name as this one that is identical to ros2/rcl#48 and launched CI:

http://ci.ros2.org/job/ci_linux/1238/
http://ci.ros2.org/job/ci_osx/992/
http://ci.ros2.org/job/ci_windows/1276

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

The pyflakes failure in CI is fixed in 65a957c.

@jacquelinekay
Copy link
Copy Markdown
Contributor Author

Any thoughts on the current state of ignore_signal_exit_handler?

@dirk-thomas
Copy link
Copy Markdown
Member

lgtm

@jacquelinekay jacquelinekay merged commit 0274104 into master Apr 29, 2016
@jacquelinekay jacquelinekay deleted the ignore_signal branch April 29, 2016 22:45
@jacquelinekay jacquelinekay removed the in review Waiting for review (Kanban column) label Apr 29, 2016
wjwwood pushed a commit that referenced this pull request Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants