Conversation
When multiple backports are concentrated into a single backport PR (a common practice), the script would only resolve the first one, which was an annoying shortcoming. In addition to implementing the feature, take the opportunity to pay some technical debt (put functions in alphabetical order, move code units into dedicated functions/methods, etc.). Signed-off-by: Nathan Cutler <ncutler@suse.com>
Fixes: https://tracker.ceph.com/issues/41738 Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Refuse to work on a backport that is * not in "New" or "Need More Info" status * targeting a release that is not among the GitHub active milestones Signed-off-by: Nathan Cutler <ncutler@suse.com>
When --milestones is provided, the script will: 1. determine the set of active milestones by querying GitHub API 2. for each active milestone, get all PRs targeting the milestone, regardless of actual milestone setting 3. for each PR targeting an active milestone, check that actual milestone setting matches the base branch 4. if a problem is detected, fix it and "flag" the PR 5. at the end, print a report summarizing any problems that were detected Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Before, the script was satisfied if the variable github_token was set to *any* value. With this commit, it is only satisfied if the GitHub API recognizes it as a valid token. Signed-off-by: Nathan Cutler <ncutler@suse.com>
When the script is run from outside the git clone, that's a sign that the user may be a new user who has not yet completed setup. Signed-off-by: Nathan Cutler <ncutler@suse.com>
It's very common for folks to be logged into their local machine as something other than their GitHub username. When github_user is not set, assume the user has not yet done setup: emit an error and explain how to get setup advice. Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Two of the variables used were not declared as local. Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
And really stop after cherry-pick phase. Signed-off-by: Nathan Cutler <ncutler@suse.com>
Added some small improvements Fixes: https://tracker.ceph.com/issues/42489 Signed-off-by: Laura Paduano <lpaduano@suse.com>
Fixes: https://tracker.ceph.com/issues/42507 Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
error: ‘class rocksdb::Env’ has no member named ‘SetAllowNonOwnerAccess’
opt.env->SetAllowNonOwnerAccess(false);
^~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ilsoo Byun <ilsoobyun@linecorp.com>
Otherwise, we get errors like 2019-10-26T17:29:42.004 INFO:tasks.workunit.client.0.mira109.stderr:+ sudo /usr/sbin/ceph-daemon shell -- ceph -v 2019-10-26T17:29:42.004 INFO:tasks.workunit.client.0.mira109.stderr:+ grep 'ceph version' 2019-10-26T17:29:42.149 INFO:tasks.workunit.client.0.mira109.stderr:the input device is not a TTY Fixes: https://tracker.ceph.com/issues/42499 Signed-off-by: Sage Weil <sage@redhat.com>
5b98503 to
c7ff770
Compare
* refs/pull/31181/head: ceph-daemon: only pass podman -it if need an interactive shell Reviewed-by: Sebastian Wagner <swagner@suse.com> Reviewed-by: Michael Fritch <mfritch@suse.com>
* refs/pull/31201/head: doc: document rank option for journal reset Reviewed-by: Venky Shankar <vshankar@redhat.com>
* refs/pull/30369/head: doc/cephfs: add doc for cephfs io path Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
cmake: require RocksDB 5.14 or higher Reviewed-by: Kefu Chai <kchai@redhat.com>
…lestone ceph-backport.sh: implement --milestones feature and more-careful vetting Reviewed-by: Jan Fajerski <jfajerski@suse.com> Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>
doc: Small update of SubmittingPatches-backports Reviewed-by: Nathan Cutler <ncutler@suse.com>
backport-resolve-issue: resolve multiple backport issues
mgr/dashboard: update dashboard CODEOWNERShip Reviewed-by: Lenz Grimmer <lgrimmer@suse.com> Reviewed-by: Nathan Cutler <ncutler@suse.com>
And really stop after cherry-pick phase. Signed-off-by: Nathan Cutler <ncutler@suse.com>
Instead of offering the user setup advice in the form of documentation,
provide an interactive setup routine that can be run any number of times.
This commit also deprecates the ~/bin/backport_common.sh file in favor
of a set of files:
~/.github_user
~/.github_token
~/.redmine_user_id
~/.redmine_key
(NOTE: The user need not care about these files. Their purpose is to save the
results of the interactive setup routine and make those available to future runs
of the script.)
Signed-off-by: Nathan Cutler <ncutler@suse.com>
This allows us to do things like verify that a backport cherry-picked cleanly:
ceph-backport.sh --cherry-pick-only --force [BACKPORT_TRACKER_ID]
(Currently, i.e. without --force, this fails because the backport is already in
progress.)
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Signed-off-by: Nathan Cutler <ncutler@suse.com>
smithfarm
pushed a commit
that referenced
this pull request
Feb 20, 2020
Accordingly to cppreference.com [1]: "If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)" [1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy looking content but damaged control block: ``` [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))] (gdb) bt #0 0x0000559cb81c3ea0 in ?? () #1 0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 #3 0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167 #4 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167 #5 std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103 #6 OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053 #7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665 #8 0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96 #9 0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342 ceph#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677 ceph#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311 ceph#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706 ceph#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0 ceph#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6 ceph#15 0x0000000000000000 in ?? () (gdb) frame 7 #7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665 9665 in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc (gdb) print osdmap $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000} (gdb) print *osdmap # pretty sane OSDMap (gdb) print sizeof(osdmap) $26 = 16 (gdb) x/2a &osdmap 0x559ca22acef0: 0x559cba028000 0x559cba0ec900 (gdb) frame 2 #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 148 /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory. (gdb) disassemble Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(): ... 0x0000559c97675b1e <+62>: mov (%rdi),%rax 0x0000559c97675b21 <+65>: mov %rdi,%rbx 0x0000559c97675b24 <+68>: callq *0x10(%rax) => 0x0000559c97675b27 <+71>: test %rbp,%rbp ... End of assembler dump. (gdb) info registers rdi rbx rax rdi 0x559cba0ec900 94131624790272 rbx 0x559cba0ec900 94131624790272 rax 0x559cba0ec8a0 94131624790176 (gdb) x/a 0x559cba0ec8a0 + 0x10 0x559cba0ec8b0: 0x559cb81c3ea0 (gdb) bt #0 0x0000559cb81c3ea0 in ?? () ... (gdb) p $_siginfo._sifields._sigfault.si_addr $27 = (void *) 0x559cb81c3ea0 ``` Helgrind seems to agree: ``` ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90 ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8 ==00:00:02:54.519 510301== at 0x7218DD: operator= (shared_ptr_base.h:1078) ==00:00:02:54.519 510301== by 0x7218DD: operator= (shared_ptr.h:103) ==00:00:02:54.519 510301== by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116) ==00:00:02:54.519 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:02:54.519 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:02:54.519 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:02:54.519 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ==00:00:02:54.519 510301== ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117 ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0 ==00:00:02:54.519 510301== at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165) ==00:00:02:54.519 510301== by 0x6B5842: shared_ptr (shared_ptr.h:129) ==00:00:02:54.519 510301== by 0x6B5842: get_osdmap (OSD.h:1700) ==00:00:02:54.519 510301== by 0x6B5842: OSD::create_context() (OSD.cc:9053) ==00:00:02:54.519 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665) ==00:00:02:54.519 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701) ==00:00:02:54.519 510301== by 0x70E62E: run (OpSchedulerItem.h:148) ==00:00:02:54.519 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677) ==00:00:02:54.519 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311) ==00:00:02:54.519 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706) ==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd ==00:00:02:54.519 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433) ==00:00:02:54.519 510301== by 0x66F766: main (ceph_osd.cc:688) ==00:00:02:54.519 510301== Block was alloc'd by thread #1 ``` Actually there is plenty of similar issues reported like: ``` ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119 ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0 ==00:00:05:04.903 510301== at 0x753165: clear (hashtable.h:2051) ==00:00:05:04.903 510301== by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369) ==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102) ==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350) ==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100) ==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr _base.h:471) ==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155) ==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148) ==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728) ==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167) ==00:00:05:04.903 510301== by 0x6B58A9: ~shared_ptr (shared_ptr.h:103) ==00:00:05:04.903 510301== by 0x6B58A9: OSD::create_context() (OSD.cc:9053) ==00:00:05:04.903 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665) ==00:00:05:04.903 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701) ==00:00:05:04.903 510301== by 0x70E62E: run (OpSchedulerItem.h:148) ==00:00:05:04.903 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677) ==00:00:05:04.903 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311) ==00:00:05:04.903 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706) ==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ==00:00:05:04.903 510301== ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90 ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8 ==00:00:05:04.903 510301== at 0x7531E1: clear (hashtable.h:2054) ==00:00:05:04.903 510301== by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369) ==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102) ==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350) ==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100) ==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471) ==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155) ==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:747) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:1078) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr.h:103) ==00:00:05:04.903 510301== by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116) ==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd ==00:00:05:04.903 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433) ==00:00:05:04.903 510301== by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606) ==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:699) ==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:1732) ==00:00:05:04.903 510301== by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076) ==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ``` Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
smithfarm
pushed a commit
that referenced
this pull request
Mar 23, 2020
Accordingly to cppreference.com [1]: "If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)" [1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy looking content but damaged control block: ``` [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))] (gdb) bt #0 0x0000559cb81c3ea0 in ?? () #1 0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 #3 0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167 #4 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167 #5 std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103 #6 OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053 #7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665 #8 0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96 #9 0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342 ceph#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677 ceph#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311 ceph#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706 ceph#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0 ceph#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6 ceph#15 0x0000000000000000 in ?? () (gdb) frame 7 #7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665 9665 in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc (gdb) print osdmap $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000} (gdb) print *osdmap # pretty sane OSDMap (gdb) print sizeof(osdmap) $26 = 16 (gdb) x/2a &osdmap 0x559ca22acef0: 0x559cba028000 0x559cba0ec900 (gdb) frame 2 #2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148 148 /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory. (gdb) disassemble Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release(): ... 0x0000559c97675b1e <+62>: mov (%rdi),%rax 0x0000559c97675b21 <+65>: mov %rdi,%rbx 0x0000559c97675b24 <+68>: callq *0x10(%rax) => 0x0000559c97675b27 <+71>: test %rbp,%rbp ... End of assembler dump. (gdb) info registers rdi rbx rax rdi 0x559cba0ec900 94131624790272 rbx 0x559cba0ec900 94131624790272 rax 0x559cba0ec8a0 94131624790176 (gdb) x/a 0x559cba0ec8a0 + 0x10 0x559cba0ec8b0: 0x559cb81c3ea0 (gdb) bt #0 0x0000559cb81c3ea0 in ?? () ... (gdb) p $_siginfo._sifields._sigfault.si_addr $27 = (void *) 0x559cb81c3ea0 ``` Helgrind seems to agree: ``` ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90 ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8 ==00:00:02:54.519 510301== at 0x7218DD: operator= (shared_ptr_base.h:1078) ==00:00:02:54.519 510301== by 0x7218DD: operator= (shared_ptr.h:103) ==00:00:02:54.519 510301== by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116) ==00:00:02:54.519 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:02:54.519 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:02:54.519 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:02:54.519 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ==00:00:02:54.519 510301== ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117 ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0 ==00:00:02:54.519 510301== at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165) ==00:00:02:54.519 510301== by 0x6B5842: shared_ptr (shared_ptr.h:129) ==00:00:02:54.519 510301== by 0x6B5842: get_osdmap (OSD.h:1700) ==00:00:02:54.519 510301== by 0x6B5842: OSD::create_context() (OSD.cc:9053) ==00:00:02:54.519 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665) ==00:00:02:54.519 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701) ==00:00:02:54.519 510301== by 0x70E62E: run (OpSchedulerItem.h:148) ==00:00:02:54.519 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677) ==00:00:02:54.519 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311) ==00:00:02:54.519 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706) ==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd ==00:00:02:54.519 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433) ==00:00:02:54.519 510301== by 0x66F766: main (ceph_osd.cc:688) ==00:00:02:54.519 510301== Block was alloc'd by thread #1 ``` Actually there is plenty of similar issues reported like: ``` ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119 ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0 ==00:00:05:04.903 510301== at 0x753165: clear (hashtable.h:2051) ==00:00:05:04.903 510301== by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369) ==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102) ==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350) ==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100) ==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr _base.h:471) ==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155) ==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148) ==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728) ==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167) ==00:00:05:04.903 510301== by 0x6B58A9: ~shared_ptr (shared_ptr.h:103) ==00:00:05:04.903 510301== by 0x6B58A9: OSD::create_context() (OSD.cc:9053) ==00:00:05:04.903 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665) ==00:00:05:04.903 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701) ==00:00:05:04.903 510301== by 0x70E62E: run (OpSchedulerItem.h:148) ==00:00:05:04.903 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677) ==00:00:05:04.903 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311) ==00:00:05:04.903 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706) ==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ==00:00:05:04.903 510301== ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90 ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8 ==00:00:05:04.903 510301== at 0x7531E1: clear (hashtable.h:2054) ==00:00:05:04.903 510301== by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369) ==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102) ==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350) ==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100) ==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471) ==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155) ==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:747) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:1078) ==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr.h:103) ==00:00:05:04.903 510301== by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116) ==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd ==00:00:05:04.903 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433) ==00:00:05:04.903 510301== by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606) ==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:699) ==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:1732) ==00:00:05:04.903 510301== by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076) ==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678) ==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77) ==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66) ==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389) ==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so) ==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so) ``` Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com> (cherry picked from commit 80da5f9) Conflicts: src/osd/OSD.cc in bool OSD::asok_command int OSD::shutdown void OSD::maybe_update_heartbeat_peers void OSD::_preboot void OSD::queue_want_up_thru void OSD::send_alive void OSD::send_failures void OSD::send_beacon MPGStats* OSD::collect_pg_stats void OSD::note_down_osd void OSD::consume_map void OSD::activate_map src/osd/OSD.h in private: dispatch_session_waiting - also use the new const OSDMapRef in places that no longer exist in master src/osd/OSD.cc in void OSDService::share_map void OSDService::send_incremental_map int OSD::_do_command void OSD::note_up_osd int OSD::init_op_flags src/osd/OSD.h in void send_incremental_map void share_map
smithfarm
pushed a commit
that referenced
this pull request
Mar 23, 2020
* no need to discard_result(). as `output_stream::close()` returns an
empty future<> already
* free the connected socket after the background task finishes, because:
we should not free the connected socket before the promise referencing it is fulfilled.
otherwise we have error messages from ASan, like
==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
#0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
#1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
#2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
#3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
#4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
#5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
#6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
#7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
#8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
#9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
ceph#10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
ceph#11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
ceph#12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
ceph#13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)
0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
#0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
#1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
#2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
#3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
#4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
#5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
#6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
#7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
ceph#10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr()
/usr/include/c++/10/bits/unique_ptr.h:357 ceph#11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
ceph#12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
ceph#13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
ceph#14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
ceph#15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
ceph#16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
ceph#17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
ceph#18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36
Signed-off-by: Kefu Chai <kchai@redhat.com>
smithfarm
pushed a commit
that referenced
this pull request
Mar 2, 2021
Lack of this feature was the root cause of an issue in teuthology testing in which a socket failure injection happened exactly during `mon_subscribe`; after the OSD reconnected, the message hasn't been resent and entire boot process has frozen. ``` DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15) DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=80 buffer.length()=80 DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=96 final_len=0 DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=48 buffer.length()=48 DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_update plaintext.length()=16 buffer.length()=64 DEBUG 2021-02-25 11:42:53,757 [shard 0] ms - authenticated_encrypt_final buffer.length()=80 final_len=0 INFO 2021-02-25 11:42:53,758 [shard 0] ms - [osd.2(client) v2:172.21.15.204:6804/33459@57376 >> mon.0 v2:172.21.15.204:3300/0] execute_ready(): fault at READY on lossy channel, going to CLOSING -- std::system_error (error crimson::net:4, read eof) ``` Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
smithfarm
pushed a commit
that referenced
this pull request
Jul 7, 2021
Otherwise, if we assert, we'll hang here: Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)): #0 0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61 #1 futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135 #2 __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54 #3 0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78 #4 0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2605 #5 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body") at ./src/googletest/googletest/src/gtest.cc:2641 #6 0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680 #7 0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858 #8 0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012 #9 0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723 ceph#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605 ceph#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>, location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641 ceph#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306 Signed-off-by: Sage Weil <sage@newdream.net> (cherry picked from commit ee5a0c9)
smithfarm
pushed a commit
that referenced
this pull request
Oct 20, 2021
exit() will call pthread_cond_destroy attempting to destroy dpdk::eal::cond
upon which other threads are currently blocked results in undefine
behavior. Link different libc version test, libc-2.17 can exit,
libc-2.27 will deadlock, the call stack is as follows:
Thread 3 (Thread 0xffff7e5749f0 (LWP 62213)):
#0 0x0000ffff7f3c422c in futex_wait_cancelable (private=<optimized out>, expected=0,
futex_word=0xaaaadc0e30f4 <dpdk::eal::cond+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>, cond=0xaaaadc0e30c8 <dpdk::eal::cond>)
at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0xaaaadc0e30c8 <dpdk::eal::cond>, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>)
at pthread_cond_wait.c:655
#3 0x0000ffff7f1f1f80 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4 0x0000aaaad37f5078 in dpdk::eal::<lambda()>::operator()(void) const (__closure=<optimized out>, __closure=<optimized out>)
at ./src/msg/async/dpdk/dpdk_rte.cc:136
#5 0x0000ffff7f1f7ed4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6 0x0000ffff7f3be088 in start_thread (arg=0xffffe73e197f) at pthread_create.c:463
#7 0x0000ffff7efc74ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Thread 1 (Thread 0xffff7ee3b010 (LWP 62200)):
#0 0x0000ffff7f3c3c38 in futex_wait (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_cond_destroy (cond=0xaaaadc0e30c8 <dpdk::eal::cond>) at pthread_cond_destroy.c:54
#3 0x0000ffff7ef2be34 in __run_exit_handlers (status=-6, listp=0xffff7f04a5a0 <__exit_funcs>, run_list_atexit=255,
run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#4 0x0000ffff7ef2bf6c in __GI_exit (status=<optimized out>) at exit.c:139
#5 0x0000ffff7ef176e4 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:344
#6 0x0000aaaad2939db0 in _start () at ./src/include/buffer.h:642
Fixes: https://tracker.ceph.com/issues/42890
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.