Conversation
|
@ivanpauno may I ask for a review of this one, especially since changes to handle sigterm like sigint are probably coming? |
ivanpauno
left a comment
There was a problem hiding this comment.
LGTM!
I'm worried about signal-safety of some parts of the code, e.g. I don't think that triggering a guard condition is necessarily signal-safe, but that's preexistent code.
| rcl_guard_condition_t ** old_gcs = g_guard_conditions.exchange(new_gcs); | ||
| if (NULL != old_gcs) { | ||
| allocator.deallocate(old_gcs, allocator.state); | ||
| } |
There was a problem hiding this comment.
I know this is preexistent code, but why are we using atomics here?
If this is for thread-safety, it doesn't look okay.
We're maybe alraedy taking a mutex in the python code, if that's the case this is okay.
If not we should be taking a global mutex here.
There was a problem hiding this comment.
We're maybe alraedy taking a mutex in the python code, if that's the case this is okay.
There is a global mutex in the Python code; it's holding onto the GIL. I think the atomics are to make it safe to be interrupted by the signal handler.
There was a problem hiding this comment.
There is a global mutex in the Python code; it's holding onto the GIL.
Yes, but the signal handler might get executed without the GIL locked, and that might be an issue here (main thread executing the signal handler and another thread deallocating the array).
I think we should be using the python signal module instead of C signal handlers, which only flags the main thread to run the signal handler later, avoiding most of this issues.
I would left this comment unaddressed here, as it's unrelated to this PR and I want to avoid conflicts with #830.
There was a problem hiding this comment.
Yes, but the signal handler might get executed without the GIL locked, and that might be an issue here (main thread executing the signal handler and another thread deallocating the array)
Ah, I didn't realize other threads kept running when a signal is received.
|
@sloretz friendly ping |
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@openrobotics.org>
959fe99 to
0cbce01
Compare
|
Windows failures re not in the most recent nightly Windows repeated job, so investigating these. OSX Failures are also all in the most recent nightlty OSX repeated job, so it's unlikely they're caused by this PR: https://ci.ros2.org/view/nightly/job/nightly_osx_repeated/2493/#showFailuresLink |
|
Windows CI failure is definitely not due to this PR. It's due to a client not getting a service response. I don't know why that is, but I opened ros2/launch_ros#273 to improve how the test responds to that case. |
|
@ivanpauno if this still looks good to you after the new commit, I think CI is good enough to merge. |
|
🕵️♂️ I think is causing test lots of test regressions in windows CI. DetailsIt seems that Windows can't find the correct path to Can I ask you to take a quick look? @sloretz You may have more context about what's going on. |
This uses pybind11 for the signal handler APIs. It replaces
rcutilsatomics with withstd::atomicbecause C and C++ atomics headers can't be mixed (until C++23?). Finally this deletes the now unused code for handle, pycapsule, and rclpy_common.Replaces #728
Closes #665 (finally 🎉)