-
Notifications
You must be signed in to change notification settings - Fork 491
Closed
Labels
testsFailing or missing testsFailing or missing tests
Description
The executors unit test is being expanded in #1222, and running it consecutively hundreds of times shows that about 5-6% it will exit with a segfault across several different executors in several different functions. Unfortunately, simple tracing with gdb is not yielding much info.
Copying the results of the investigation here:
Here were all the instances of segfaults with duplicates.
TestExecutorsStable/SingleThreadedExecutor.addTemporaryNode
TestExecutorsStable/SingleThreadedExecutor.spinSome
TestExecutorsStable/SingleThreadedExecutor.spinSome
TestExecutorsStable/SingleThreadedExecutor.spinAll
TestExecutorsStable/SingleThreadedExecutor.spinAll
TestExecutorsStable/SingleThreadedExecutor.spinAll
TestExecutors/MultiThreadedExecutor.testSpinUntilSharedFutureComplete
TestExecutorsStable/MultiThreadedExecutor.addTemporaryNode
TestExecutorsStable/MultiThreadedExecutor.spinSome
TestExecutorsStable/MultiThreadedExecutor.spinSome
TestExecutorsStable/MultiThreadedExecutor.spinSome
TestExecutorsStable/MultiThreadedExecutor.spinAll
TestExecutorsStable/MultiThreadedExecutor.spinAll
TestExecutorsStable/MultiThreadedExecutor.spinAll
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilFutureComplete
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilFutureComplete
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilFutureCompleteNoTimeout
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilFutureCompleteNoTimeout
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilFutureCompleteWithTimeout
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilSharedFutureComplete
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilSharedFutureComplete
TestExecutors/StaticSingleThreadedExecutor.testSpinUntilSharedFutureComplete
Running valgrind on this test executable results in:
==2896250==
==2896250== HEAP SUMMARY:
==2896250== in use at exit: 12,390,587 bytes in 26,540 blocks
==2896250== total heap usage: 1,762,920 allocs, 1,736,380 frees, 178,520,908 bytes allocated
==2896250==
==2896250== LEAK SUMMARY:
==2896250== definitely lost: 5,496 bytes in 189 blocks
==2896250== indirectly lost: 155,231 bytes in 636 blocks
==2896250== possibly lost: 2,204,381 bytes in 2,155 blocks
==2896250== still reachable: 10,025,479 bytes in 23,560 blocks
==2896250== of which reachable via heuristic:
==2896250== multipleinheritance: 1,728 bytes in 9 blocks
==2896250== suppressed: 0 bytes in 0 blocks
==2896250== Rerun with --leak-check=full to see details of leaked memory
==2896250==
==2896250== For lists of detected and suppressed errors, rerun with: -s
==2896250== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
And a read error
==2896250== Thread 19:
==2896250== Invalid read of size 8
==2896250== at 0x6529FBB: node_listener(rmw_context_t*) (listener_thread.cpp:150)
==2896250== by 0x652C110: void std::__invoke_impl<void, void (*)(rmw_context_t*), rmw_context_t*>(std::__invoke_other, void (*&&)(rmw_context_t*), rmw_context_t*&&) (invoke.h:60)
==2896250== by 0x652BF86: std::__invoke_result<void (*)(rmw_context_t*), rmw_context_t*>::type std::__invoke<void (*)(rmw_context_t*), rmw_context_t*>(void (*&&)(rmw_context_t*), rmw_context_t*&&) (invoke.h:95)
==2896250== by 0x652BE0E: void std::thread::_Invoker<std::tuple<void (*)(rmw_context_t*), rmw_context_t*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (thread:244)
==2896250== by 0x652BD24: std::thread::_Invoker<std::tuple<void (*)(rmw_context_t*), rmw_context_t*> >::operator()() (thread:251)
==2896250== by 0x652BCD1: std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(rmw_context_t*), rmw_context_t*> > >::_M_run() (thread:195)
==2896250== by 0x5827CB3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==2896250== by 0x5956608: start_thread (pthread_create.c:477)
==2896250== by 0x5A92102: clone (clone.S:95)
==2896250== Address 0x609cba8 is 72 bytes inside a block of size 192 free'd
==2896250== at 0x4842A3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2896250== by 0x56DD14E: __default_deallocate (allocator.c:42)
==2896250== by 0x5658538: __cleanup_context (context.c:155)
==2896250== by 0x5657DA5: rcl_context_fini (context.c:59)
==2896250== by 0x4FB597A: __delete_context(rcl_context_t*) (context.cpp:179)
==2896250== by 0x4FC9AFD: std::_Sp_counted_deleter<rcl_context_t*, void (*)(rcl_context_t*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==2896250== by 0x27C502: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:155)
==2896250== by 0x2750E2: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==2896250== by 0x2743C0: std::__shared_ptr<rcl_context_t, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==2896250== by 0x4FBBAB8: std::__shared_ptr<rcl_context_t, (__gnu_cxx::_Lock_policy)2>::reset() (shared_ptr_base.h:1287)
==2896250== by 0x4FB8EFE: rclcpp::Context::clean_up() (context.cpp:452)
==2896250== by 0x4FB6074: rclcpp::Context::init(int, char const* const*, rclcpp::InitOptions const&) (context.cpp:201)
==2896250== Block was alloc'd at
==2896250== at 0x4843D99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2896250== by 0x56DD1CD: __default_zero_allocate (allocator.c:56)
==2896250== by 0x565D282: rcl_init (init.c:88)
==2896250== by 0x4FB61B8: rclcpp::Context::init(int, char const* const*, rclcpp::InitOptions const&) (context.cpp:204)
==2896250== by 0x51EA08B: rclcpp::init(int, char const* const*, rclcpp::InitOptions const&) (utilities.cpp:34)
==2896250== by 0x29135F: TestExecutors<rclcpp::executors::StaticSingleThreadedExecutor>::SetUp() (test_executors.cpp:46)
==2896250== by 0x3C5D6B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2447)
==2896250== by 0x3B8176: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2483)
==2896250== by 0x36CA2D: testing::Test::Run() (gtest.cc:2517)
==2896250== by 0x36E12D: testing::TestInfo::Run() (gtest.cc:2703)
==2896250== by 0x36EE64: testing::TestCase::Run() (gtest.cc:2825)
==2896250== by 0x3883F8: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5216)
==2896250==
Unrelated to the segfaults, these stderr messages are also outputted.
/home/brawner/workspace/ros2_master/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/listener_thread.cpp":__function__:157"__rmw_take failed"
Metadata
Metadata
Assignees
Labels
testsFailing or missing testsFailing or missing tests