Bug #64184
test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed
Description
2024-01-25T03:01:00.979 INFO:tasks.notification_tests:Running bucket-notifications-tests...
2024-01-25T03:01:00.979 DEBUG:teuthology.orchestra.run.smithi046:bucket notification tests against different endpoints> BNTESTS_CONF=/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/bn-tests.client.0.conf /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/bin/python -m nose -s /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py -v -a kafka_test
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:*** Caught signal (Aborted) **
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: in thread 7f69ab974640 thread_name:kafka_manager
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: ceph version 19.0.0-814-g1a8bb77b (1a8bb77be00267ce596e60f3b1141a4463aab767) squid (dev)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7f6add654db0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 2: /lib64/libc.so.6(+0xa154c) [0x7f6add6a154c]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 3: raise()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 4: abort()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 5: /lib64/libc.so.6(+0x29130) [0x7f6add629130]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 6: /lib64/libc.so.6(+0x4daf7) [0x7f6add64daf7]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 7: /lib64/libc.so.6(+0xa7d18) [0x7f6add6a7d18]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send_to_completion_async(ceph::common::CephContext*, rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x558c16fa84e5]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0xef) [0x558c170cc1ff]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f6ade4776ef]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f6ade4ad862]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 12: rd_kafka_poll()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 13: (rgw::kafka::Manager::run()+0x350) [0x558c170ce5a0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 14: /lib64/libstdc++.so.6(+0xdb924) [0x7f6addadb924]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 15: /lib64/libc.so.6(+0x9f802) [0x7f6add69f802]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 16: /lib64/libc.so.6(+0x3f450) [0x7f6add63f450]
Updated by Casey Bodley about 2 years ago
- Related to Bug #63314: kafka crashed during message callback in teuthology added
Updated by Casey Bodley about 2 years ago
@Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?
Updated by Yuval Lifshitz about 2 years ago
Another crash trace from the kafka test:
#0 0x00007f45788a154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f4578854d06 in raise () from /lib64/libc.so.6
#2 0x00007f45788287f3 in abort () from /lib64/libc.so.6
#3 0x00007f457ba161f4 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) [clone .cold] () from /lib64/libtcmalloc.so.4
#4 0x00007f457ba1a7e3 in (anonymous namespace)::InvalidFree(void*) () from /lib64/libtcmalloc.so.4
#5 0x000055b1f503a2d2 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55b1f80cdf10) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#6 0x000055b1f53ef08b in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#7 std::__shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#8 std::shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const>::~shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122
#9 std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::~basic_regex (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/regex.h:535
#10 rgw::parse_url_authority (url=..., host="localhost", user="", password="") at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_url.cc:33
#11 0x000055b1f553e8d9 in rgw::kafka::Manager::connect (this=0x55b1f71623c0, broker="localhost", url=..., use_ssl=<optimized out>, verify_ssl=<optimized out>, ca_location=..., mechanism=...)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:558
#12 0x000055b1f541b763 in rgw::kafka::connect (mechanism=..., ca_location=..., verify_ssl=true, use_ssl=<optimized out>, url="kafka://localhost", broker="localhost")
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:692
#13 RGWPubSubKafkaEndpoint::RGWPubSubKafkaEndpoint (_cct=<optimized out>, args=..., _topic=..., _endpoint="kafka://localhost", this=0x55b203d51c80)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:306
#14 RGWPubSubEndpoint::create (endpoint="kafka://localhost", topic=..., args=..., cct=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:394
#15 0x000055b1f540c0dc in rgw::notify::publish_commit (obj=0x55b1fe586f00, size=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="", event_type=rgw::notify::ObjectRemovedDelete, res=..., dpp=<optimized out>)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_notify.cc:1129
#16 0x000055b1f54b85ef in rgw::sal::RadosNotification::publish_commit (this=this@entry=0x55b204a14360, dpp=dpp@entry=0x55b200d52480, size=size@entry=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="")
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_sal_rados.cc:2850
#17 0x000055b1f529250f in RGWDeleteObj::execute (this=<optimized out>, y=...) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_op.cc:5398
#18 0x000055b1f512bca2 in rgw_process_authenticated (handler=<optimized out>, op=@0x7f4445fb93e0: 0x55b200d52480, req=<optimized out>, s=<optimized out>, y=..., driver=0x55b1f7f91c40, skip_retarget=false)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:255
#19 0x000055b1f512ec4d in process_request (penv=..., req=0x7f4445fba4a0, frontend_prefix=..., client_io=0x7f4445fba550, yield=..., scheduler=0x55b1f82a2458, user=0x7f4445fba870, latency=0x7f4445fba478, http_ret=0x7f4445fba474)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:389
#20 0x000055b1f59b9280 in (anonymous namespace)::handle_connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >(boost::asio::io_context&, RGWProcessEnv&, boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>&, rgw::basic_timeout_timer<ceph::coarse_mono_clock, boost::asio::any_io_executor, (anonymous namespace)::Connection>&, unsigned long, boost::beast::flat_static_buffer<65536ul>&, bool, ceph::async::SharedMutex<boost::asio::any_io_executor>&, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::system::error_code&, spawn::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::any_io_executor> >) [clone .constprop.0] (context=..., env=..., stream=..., timeout=..., header_limit=16384, buffer=..., pause_mutex=..., scheduler=0x55b1f82a2458, uri_prefix="", ec=..., yield=...,
is_ssl=false) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:290
#21 0x000055b1f5091fc4 in operator() (yield=..., __closure=0x55b20394c458) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:1061
#22 operator() (c=..., __closure=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/spawn/include/spawn/impl/spawn.hpp:390
#23 std::__invoke_impl<boost::context::continuation, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#24 std::__invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/bits/invoke.h:97
#25 std::invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/functional:98
#26 boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> >::run (fctx=<optimized out>, this=<optimized out>)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:160
#27 boost::context::detail::context_entry<boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> > >(boost::context::detail::transfer_t) (t=...)
at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:97
#28 0x000055b1f5a2d52f in make_fcontext ()
#29 0x0000000000000000 in ?? ()
Updated by Yuval Lifshitz about 2 years ago
Casey Bodley wrote:
@Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?
amqp also has some unexplained failures. I have this PR: https://github.com/ceph/ceph/pull/55666
It runs the http and basic tests before the kafka and amqp tests, so that when we hit the expected failures we know the rest of the tests passed.
Updated by Yuval Lifshitz about 2 years ago
Similar crash, but with "Attempt to free invalid pointer" reported by tcmalloc:
2024-03-13T12:46:17.456 INFO:tasks.rgw.client.0.smithi007.stdout:src/tcmalloc.cc:333] Attempt to free invalid pointer 0x564200000000
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:*** Caught signal (Aborted) **
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout: in thread 7f9fb6d43640 thread_name:io_context_pool
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
Updated by J. Eric Ivancich over 1 year ago
Apparently this is still occurring. Any thoughts, Yuval?
Updated by Yuval Lifshitz over 1 year ago
J. Eric Ivancich wrote in #note-7:
Apparently this is still occurring. Any thoughts, Yuval?
Could you please link the run that has the crash?
Updated by Casey Bodley over 1 year ago
I'll start sharing links every time I see this happen. With more data, hopefully we can narrow it down to specific distros/versions.
Updated by Casey Bodley over 1 year ago
Spotted on quincy, CentOS 9 with librdkafka-1.6.1-102.el9.x86_64.rpm.
Updated by Yuval Lifshitz over 1 year ago
In the latest run, the failing tests were the amqp tests, and the backtrace shows a crash in amqp (not kafka):
/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/bin/python -m nose -s /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py -v -a amqp_test
2024-07-18T19:06:25.904 INFO:tasks.rgw.client.0.smithi105.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
2024-07-18T19:06:25.904 INFO:tasks.rgw.client.0.smithi105.stdout:*** Caught signal (Aborted) **
2024-07-18T19:06:25.904 INFO:tasks.rgw.client.0.smithi105.stdout: in thread 7f53b4ff9640 thread_name:amqp_manager
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: ceph version 17.2.7-1102-g9a202024 (9a2020246f85463d4fdb115c94aa40b463caa1d2) quincy (stable)
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 1: /lib64/libc.so.6(+0x3e6f0) [0x7f54b443e6f0]
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 2: /lib64/libc.so.6(+0x8b94c) [0x7f54b448b94c]
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 3: raise()
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 4: abort()
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 5: /lib64/libc.so.6(+0x29130) [0x7f54b4429130]
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 6: /lib64/libc.so.6(+0x371d7) [0x7f54b44371d7]
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 7: /lib64/libc.so.6(+0x92128) [0x7f54b4492128]
2024-07-18T19:06:25.905 INFO:tasks.rgw.client.0.smithi105.stdout: 8: (std::_Function_handler<void (int), RGWPubSubAMQPEndpoint::send_to_completion_async(ceph::common::CephContext*, rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0xae) [0x7f54b533a2be]
2024-07-18T19:06:25.906 INFO:tasks.rgw.client.0.smithi105.stdout: 9: (rgw::amqp::Manager::run()+0xbf6) [0x7f54b55cf0f6]
2024-07-18T19:06:25.906 INFO:tasks.rgw.client.0.smithi105.stdout: 10: /lib64/libstdc++.so.6(+0xdbad4) [0x7f54b48dbad4]
2024-07-18T19:06:25.906 INFO:tasks.rgw.client.0.smithi105.stdout: 11: /lib64/libc.so.6(+0x89c02) [0x7f54b4489c02]
2024-07-18T19:06:25.906 INFO:tasks.rgw.client.0.smithi105.stdout: 12: /lib64/libc.so.6(+0x10ec40) [0x7f54b450ec40]
2024-07-18T19:06:25.906 INFO:tasks.rgw.client.0.smithi105.stdout:2024-07-18T19:06:25.904+0000 7f53b4ff9640 -1 *** Caught signal (Aborted) **
Also, it looks like the crash happens in the amqp thread, when it invokes the callback that should unblock the waiter on the frontend thread.
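One plausible reading of that observation is a lifetime bug: if the frontend thread owns the waiter (a mutex plus condition variable) and stops waiting before the manager thread fires the callback, the callback locks a mutex that no longer exists. That is undefined behavior; garbage in the mutex's type field could send glibc down the priority-protect path and trip the tpp.c assertion, and stray frees of reused memory would also fit the tcmalloc InvalidFree traces. The sketch below is hypothetical and not taken from the RGW source or from the eventual fix: `Waiter`, `make_callback` and `demo` are illustrative names. It assumes C++17 and shows the general pattern of sharing ownership of the waiter with the callback via `std::shared_ptr`, so a late completion always hits live memory.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical waiter state shared between the frontend (waiting) thread
// and the manager thread that delivers the broker acknowledgement.
struct Waiter {
  std::mutex lock;
  std::condition_variable cond;
  std::optional<int> result;  // set exactly once by the callback

  void complete(int r) {
    std::lock_guard<std::mutex> l(lock);
    assert(!result.has_value());  // completion must happen only once
    result = r;
    cond.notify_all();
  }

  // Returns the reply, or std::nullopt if the timeout expires first.
  std::optional<int> wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock);
    cond.wait_for(l, timeout, [this] { return result.has_value(); });
    return result;
  }
};

using Callback = std::function<void(int)>;

// Capturing the shared_ptr by value keeps the Waiter alive until the
// callback itself is destroyed, even if the frontend already gave up.
// The hazardous variant captures a reference to a stack object:
//   Waiter w; Callback cb = [&w](int r) { w.complete(r); };
// If w.wait_for() times out and the frontend returns, a later cb(r)
// locks a destroyed mutex.
Callback make_callback(std::shared_ptr<Waiter> w) {
  return [w = std::move(w)](int reply) { w->complete(reply); };
}

// Simulates one request: the frontend waits up to frontend_timeout while a
// stand-in manager thread delivers the ack after ack_delay.
int demo(std::chrono::milliseconds frontend_timeout,
         std::chrono::milliseconds ack_delay) {
  auto waiter = std::make_shared<Waiter>();
  Callback cb = make_callback(waiter);

  std::thread manager([cb = std::move(cb), ack_delay] {
    std::this_thread::sleep_for(ack_delay);
    cb(0);  // safe even after the frontend has stopped waiting
  });

  std::optional<int> r = waiter->wait_for(frontend_timeout);
  waiter.reset();  // frontend drops its reference; the callback still holds one
  manager.join();
  return r.value_or(-ETIMEDOUT);
}
```

For example, `demo(std::chrono::milliseconds(10), std::chrono::milliseconds(100))` should return `-ETIMEDOUT`, and the late callback still completes against a live `Waiter`. With the reference-capturing variant, the same schedule would touch freed stack memory. Sharing ownership trades a small allocation per request for never having to reason about which thread outlives the other.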
Updated by Yuval Lifshitz over 1 year ago
Also note that the fix for https://tracker.ceph.com/issues/63314 was not backported to quincy, so it is hard to tell the root cause there.
Updated by Yuval Lifshitz over 1 year ago
- Status changed from New to In Progress
- Pull request ID set to 58765
Updated by J. Eric Ivancich over 1 year ago
- Status changed from In Progress to Pending Backport
- Backport set to reef, squid
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67309: squid: test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67310: reef: test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 202258faa3524928397817236727cc5040297ff6
- Fixed In set to v19.3.0-3907-g202258faa35
- Upkeep Timestamp set to 2025-07-09T14:05:32+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3907-g202258faa35 to v19.3.0-3907-g202258faa3
- Upkeep Timestamp changed from 2025-07-09T14:05:32+00:00 to 2025-07-14T17:41:26+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2321
- Upkeep Timestamp changed from 2025-07-14T17:41:26+00:00 to 2025-11-01T00:58:12+00:00