[Humble] Revert "Add diagnostics (backport #820) (#922)"#1015
[Humble] Revert "Add diagnostics (backport #820) (#922)"#1015bmagyar merged 1 commit intoros-controls:humblefrom
Conversation
This reverts commit dca10f4. Signed-off-by: Joe Schornak <joe.schornak@gmail.com>
|
If this is just temporary, then we should just add a protector for this. This changes the user's interface and we should do this twice. Is there another solution? |
|
Given that things seem to be moving fairly fast on the referenced rclcpp PR, I think the simplest for us is to sit this one out. Syncs on the buildfarm don't happen overnight unless it's literally the day before the sync ;) |
While the rclcpp folks have started reviewing ros2/rclcpp#2183, they've also said that there isn't a firm timeline on when they'll be able to get it in because they're focusing on the Iron release for the next few weeks. The issue this PR resolves is a critical failure and I think it's important to get some kind of fix out even if it means users have to build the package from source. Without this fix there is a random chance of the controller manager segfaulting while the robot is running, and as the system runs longer and longer the chance that the segfault will happen eventually gets higher and higher. For the robots in our lab we see segfaults about once per day per robot without this fix.
Tracking down every place where the controller state is accessed and wrapping it in a mutex lock would be much more time-intensive and invasive than reverting the problematic commit. More broadly, I think the diagnostic publisher feature should not have been backported to Humble, since it both breaks ABI and exposes a thread race condition that causes a segfault. I don't think it's worth keeping the diagnostic publisher if it causes my robots to crash every day. |
|
Points taken. Thanks for the through feedback! We do intend to have some breaking changes in humble from time to time since we'd really like it to be a feature-solid LTS so let's not rule diagnostics out completely :) |
Thank you!
Understood and definitely reasonable -- I was talking to some of my colleagues today about how to achieve this type of balance. |
|
@bmagyar have you seen my post on how MoveIt approaches the LTS Humble? https://github.com/orgs/ros-planning/discussions/2190 I know in the past this project has been much more willing to break API on humble than MoveIt has. I wonder if we included ros2_control in our apt repo if it would make sense for this project to be more stable on humble and encourage users who want the latest features to use that instead? Then you wouldn't be releasing breaking changes into the official buildfarm and going against the guidelines of ROS for stability. Also, @schornakj, amazing work here getting this issue sorted from all the different angles. |
|
FYI @tonynajjar |
|
Humble release without diagnostics: ros/rosdistro#37294 |
|
Ugh, apologies for introducing this bug with the diagnostics PR, it felt to me like a harmless small improvement back then. I see you guys have this already figured out, if there is some way I can still help, please let me know. |
|
Forgot to mention that I put the release out straight away that day: ros/rosdistro#37294 |
This reverts commit dca10f4. Signed-off-by: Joe Schornak <joe.schornak@gmail.com>
This fixes the most frequent cause of the segfault described in #979.
In Humble accessing the state of a lifecycle node is not thread-safe, and because the diagnostic updater that was added in #922 gets the node state within a timer callback separately from the main
update()cycle, it eventually causes the controller manager to crash.Removing the diagnostic updater prevents this situation from happening. We can add it back in once this PR to resolve the underlying thread safety issue in Humble rclcpp has been merged.
This reverts commit dca10f4.