Store graph listener inside the context instead of the node graph#2952
Conversation
start_if_not_started in GraphListenerstart_if_not_started in GraphListener
|
Seems that the one failing test |
|
Why the new interface, would it not be sufficient to move the lock down in start_if_not_started() ? |
My original thinking was that the behavior I factored out seemed like it could be its own functionality and that If others agree that the check for |
start_if_not_started in GraphListenershutdown_mutex_ in GraphListener::start_if_not_started
|
I guess the shutdown handler needs to be registered after the start. If not, it could be called (In the super race from hell) before the listener thread is started and you'll end up in a weird state. |
|
ok, after speaking with @wjwwood at today's client library WG meeting, I think we've got a solid idea for how to move forward. Rather than having the |
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
06b4b29 to
0440c31
Compare
shutdown_mutex_ in GraphListener::start_if_not_started|
Along with the known flaky test, the only other failing test in CI appears to be another extremely rare flake. from my clean rolling workspace: |
|
Pulls: #2952 |
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
|
@skyegalaxy thanks for fixing this up. lgtm with just a nit, which is just comment block. i do not think we need to re-run the CI after the comment fix. |
Done! 0cbaecf |
Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
7e70733 to
0cbaecf
Compare
|
@fujitatomoya thanks for reviewing and merging my PR! Since we caught this in Jazzy, Is this worth backporting to older distros as well? |
|
@Mergifyio backport kilted jazzy |
✅ Backports have been createdDetails
|
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> (cherry picked from commit 715a983) # Conflicts: # rclcpp/src/rclcpp/context.cpp
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> (cherry picked from commit 715a983) # Conflicts: # rclcpp/src/rclcpp/context.cpp
…s2#2952) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
) * factor shutdown hook registration into its own method Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * run ament_uncrustify Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * consistent use of is_shutdown in graph-listener Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * this doesn't actually need to be a separate function Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * whitespace Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * fix typo Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * move graph listener into the context Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * make cpplint happy Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> * Update rclcpp/src/rclcpp/context.cpp Signed-off-by: Skyler Medeiros <smedeiros@irobot.com> --------- Signed-off-by: Skyler Medeiros <smedeiros@irobot.com>
Description
This PR fixes a lock order inversion bug detected by Thread Sanitizer between
GraphListener::shutdown_mutex_andContext::on_shutdown_callbacks_mutex_, by storing and instantiating theGraphListenerinside of theContextinstead ofNodeGraph. With this change, we avoid locking the context weak pointer and inverting the order of lock acquisition on startup and shutdown.Usage of
is_shutdown_->load()was also replaced with the existingis_shutdown()method inGraphListenerfor consistency.Fixes # (2946)
Is this user-facing behavior change?
no
Did you use Generative AI?
no
Additional Information
This code appears to have remained unchanged for a long time, and I was able to reproduce this tsan error on iron, jazzy and rolling with this example. Tsan identified a lock order inversion as well as one potential data race in other parts of the code. With this change, the lock order inversion warning is gone, no additional tsan warnings were introduced, and all unit tests passed locally when these changes were built against both
jazzyandrolling.