Fix intermittent issue in scheduler_tests #6171
Conversation
Don't clear `stopRequested` and `stopWhenEmpty` at the top of `serviceQueue`, as this results in a race condition: on systems under heavy load, some of the threads only get scheduled on the CPU when the other threads have already finished their work. This causes the flags to be cleared post-hoc and thus those threads to wait forever. The potential drawback of this change is that the scheduler cannot be restarted after being stopped (an explicit reset would be needed), but we don't use this functionality anyway.
e5ea8a0 to bdcf5de
utACK. That's what I get for trying to be too clever...
bdcf5de Fix intermittent hang issue in scheduler_tests (Wladimir J. van der Laan)
FYI, I'm still seeing intermittent deadlocks in
Hmmm, I'm only seeing occasional deadlocks in

Here's what I found:

% eu-strip -f .../bin/test_bitcoin.debug .../bin/test_bitcoin
% # Run test_bitcoin in another shell and wait for it to hang
% gdb -s .../bin/test_bitcoin.debug .../bin/test_bitcoin <PID>
...
Reading symbols from .../bin/test_bitcoin...Reading symbols from .../bin/test_bitcoin.debug...done.
done.
Attaching to program: .../bin/test_bitcoin, process <PID>
...
(gdb) bt
#0 0x00007fa0334e308f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fa0359f8a9b in boost::condition_variable::wait (this=0x7fa036fde358, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
#2 0x00007fa034e455ac in boost::thread::join_noexcept (this=this@entry=0x7fa036fd9180) at libs/thread/src/pthread/thread.cpp:312
#3 0x00007fa0359f93e6 in join (this=0x7fa036fd9180) at /usr/include/boost/thread/detail/thread.hpp:756
#4 boost::thread_group::join_all (this=this@entry=0x7ffe608fbf90) at /usr/include/boost/thread/detail/thread_group.hpp:118
#5 0x00007fa0359f6c7e in scheduler_tests::manythreads::test_method (this=this@entry=0x7ffe608fcc87) at test/scheduler_tests.cpp:109
#6 0x00007fa0359f70ce in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:43
#7 0x00007fa035970bf7 in invoke<void (*)()> (this=<optimized out>, f=<optimized out>) at /usr/include/boost/test/utils/callback.hpp:56
#8 boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=<optimized out>)
at /usr/include/boost/test/utils/callback.hpp:89
#9 0x00007fa0349ed8f1 in operator() (this=<optimized out>) at ./boost/test/utils/callback.hpp:118
#10 operator() (this=<optimized out>) at ./boost/test/impl/unit_test_monitor.ipp:41
#11 invoke<boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<> > > (this=<optimized out>, f=...)
at ./boost/test/utils/callback.hpp:42
#12 boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke (this=<optimized out>) at ./boost/test/utils/callback.hpp:89
#13 0x00007fa0349c46de in operator() (this=0x7ffe608fd7d0) at ./boost/test/utils/callback.hpp:118
#14 do_invoke<boost::scoped_ptr<boost::detail::translate_exception_base>, boost::unit_test::callback0<int> > (F=..., tr=...)
at ./boost/test/impl/execution_monitor.ipp:281
#15 boost::execution_monitor::catch_signals (
this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
at ./boost/test/impl/execution_monitor.ipp:885
#16 0x00007fa0349c4f33 in boost::execution_monitor::execute (
this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
at ./boost/test/impl/execution_monitor.ipp:1211
#17 0x00007fa0349eda05 in boost::unit_test::unit_test_monitor_t::execute_and_translate (
this=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, tc=...)
at ./boost/test/impl/unit_test_monitor.ipp:69
#18 0x00007fa0349d44af in boost::unit_test::framework_impl::visit (this=0x7fa034c2d8c0 <boost::unit_test::(anonymous namespace)::s_frk_impl()::the_inst>,
tc=...) at ./boost/test/impl/framework.ipp:156
#19 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
#20 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
#21 0x00007fa0349cf9fa in boost::unit_test::framework::run (id=1, id@entry=4294967295, continue_test=continue_test@entry=true)
at ./boost/test/impl/framework.ipp:442
#22 0x00007fa0349eb697 in boost::unit_test::unit_test_main (init_func=<optimized out>, argc=<optimized out>, argv=<optimized out>)
at ./boost/test/impl/unit_test_main.ipp:185
#23 0x00007fa03314fb45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x00007fa035957b7e in _start ()
(gdb) generate-core-file
warning: Memory read failed for corefile section, 8192 bytes at 0x7ffe6097a000.
Saved corefile core.<PID>
...

It looks like it's hanging on a join, but I don't know why. See core file (dumped from
Yes, it could be that there is another deadlock. I haven't noticed one myself yet, though, not even on Travis.