os/osd: disable extra iterator validation #5
Merged
liewegas merged 2 commits into liewegas:wip-kv from Nov 6, 2015
Conversation
added 2 commits on November 5, 2015 15:06
In most loops using ObjectMapIteratorImpl, there is a call to valid() followed by a call to next(), which calls valid() again. Calling valid() alone is particularly expensive, so don't do it twice if possible.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
In a number of loops using ObjectMapIterator, the iterator is validated twice: first as a loop break condition, then again during the iter->next() call. Suppress the validation in the next() method in those cases for better performance.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
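A minimal sketch of the pattern in question, using simplified stand-in types (the real iterator lives in src/os/DBObjectMap.cc, and the exact shape of the suppression knob may differ from this hypothetical `next(bool validate)` overload):

```cpp
#include <cstdio>
#include <vector>

// Toy stand-in for ObjectMapIteratorImpl; illustrates the
// double-validation pattern only, not the real key/value iterator.
class ToyIterator {
  std::vector<int> data{1, 2, 3};
  size_t pos = 0;
public:
  bool valid() const {            // imagine this is expensive
    std::puts("valid() called");
    return pos < data.size();
  }
  // Hypothetical overload in the spirit of this PR: callers whose loop
  // condition already validated the position can pass validate=false.
  void next(bool validate = true) {
    ++pos;
    if (validate) (void)valid();  // redundant when the loop re-checks
  }
  int value() const { return data[pos]; }
};

int main() {
  ToyIterator it;
  // valid() now runs once per element instead of twice:
  for (; it.valid(); it.next(false))
    std::printf("value=%d\n", it.value());
}
```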
liewegas added a commit that referenced this pull request on Nov 6, 2015
os/osd: disable extra iterator validation
liewegas pushed a commit that referenced this pull request on Dec 8, 2015
…_algorithm to master

* commit 'c086c086207e88892abe03b97bc505b9060e6d01':
  Add tag and algorithm in KineticRecord API used in Kinetic Adaptation Layer
  Change algorithm from SHA1 to INVALID_ALGORITHM
liewegas pushed a commit that referenced this pull request on Nov 1, 2016
…er instance

The caller needs to check the parameter for null before calling PK11_FreeSymKey or PK11_FreeSlot; otherwise, if CryptoAESKeyHandler::init failed, we will hit a segfault as follows:

```
#0  0x00007f76844f5a95 in PK11_FreeSymKey () from /lib64/libnss3.so
#1  0x00007f76586b6e49 in CryptoAESKeyHandler::~CryptoAESKeyHandler() () from /lib64/librados.so.2
#2  0x00007f76586b5eea in CryptoAES::get_key_handler(ceph::buffer::ptr const&, std::string&) () from /lib64/librados.so.2
#3  0x00007f76586b4b9c in CryptoKey::_set_secret(int, ceph::buffer::ptr const&) () from /lib64/librados.so.2
#4  0x00007f76586b4e95 in CryptoKey::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
#5  0x00007f76586b7ee6 in KeyRing::set_modifier(char const*, char const*, EntityName&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&) () from /lib64/librados.so.2
#6  0x00007f76586b8882 in KeyRing::decode_plaintext(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
#7  0x00007f76586b9803 in KeyRing::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
#8  0x00007f76586b9a1f in KeyRing::load(CephContext*, std::string const&) () from /lib64/librados.so.2
#9  0x00007f76586ba04b in KeyRing::from_ceph_context(CephContext*) () from /lib64/librados.so.2
#10 0x00007f765852d0cd in MonClient::init() () from /lib64/librados.so.2
#11 0x00007f76583c15f5 in librados::RadosClient::connect() () from /lib64/librados.so.2
#12 0x00007f765838cb1c in rados_connect () from /lib64/librados.so.2
...
```

Signed-off-by: runsisi <runsisi@zte.com.cn>
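A hedged sketch of the guarded cleanup the message calls for, with the class reduced to its two NSS handles (field names here are illustrative, not the real CryptoAESKeyHandler layout):

```cpp
#include <pk11pub.h>  // NSS: PK11SymKey, PK11SlotInfo, PK11_FreeSymKey, PK11_FreeSlot

struct KeyHandlerSketch {
  PK11SlotInfo *slot = nullptr;
  PK11SymKey *key = nullptr;
  ~KeyHandlerSketch() {
    // If init() failed part-way, these handles may still be null; the
    // NSS free functions dereference their argument, so guard each call.
    if (key)  PK11_FreeSymKey(key);
    if (slot) PK11_FreeSlot(slot);
  }
};
```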
liewegas pushed a commit that referenced this pull request on Aug 1, 2017
I'm seeing sporadic single-thread deadlocks on fio's stat_mutex during krbd thrash runs:

```
(gdb) info threads
  Id   Target Id                          Frame
* 1    Thread 0x7f89ee730740 (LWP 15604)  0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f89ed9f17b2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2  0x00000000004429b9 in fio_mutex_down (mutex=0x7f89ee72d000) at mutex.c:170
#3  0x0000000000459704 in thread_main (data=<optimized out>) at backend.c:1639
#4  0x000000000045b013 in fork_main (offset=0, shmid=<optimized out>, sk_out=0x0) at backend.c:1778
#5  run_threads (sk_out=sk_out@entry=0x0) at backend.c:2195
#6  0x000000000045b47f in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
#7  0x000000000040cb0c in main (argc=2, argv=0x7fffad3e3888, envp=<optimized out>) at fio.c:63
(gdb) up 2
170             pthread_cond_wait(&mutex->cond, &mutex->lock);
(gdb) p mutex.lock.__data.__owner
$1 = 15604
```

Upgrading to fio 2.21 seems to make these go away.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
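For illustration only (this is not fio's code): the signature above, where `__owner` equals the waiting thread's own LWP, is what a non-recursive mutex re-locked by its current holder looks like:

```cpp
#include <pthread.h>

int main() {
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_lock(&m);
  // A default (non-recursive) pthread mutex re-locked by its own owner
  // blocks forever; in gdb, m.__data.__owner equals this thread's LWP,
  // matching the single-thread deadlock seen above.
  pthread_mutex_lock(&m);
}
```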
liewegas pushed a commit that referenced this pull request on Nov 8, 2017
…letion

We have a race condition:

1. RGW client #1: requests an object be deleted.
2. RGW client #1: sends a prepare op to bucket index OSD #1.
3. OSD #1: prepares the op, adding pending ops to the bucket dir entry.
4. RGW client #2: sends a list bucket to OSD #1.
5. RGW client #2: sees that there are pending operations on the bucket dir entry, and calls check_disk_state.
6. RGW client #2: check_disk_state sees that the object still exists, so it sends CEPH_RGW_UPDATE to the bucket index OSD (#1).
7. RGW client #1: sends a delete object to the object OSD (#2).
8. OSD #2: deletes the object.
9. RGW client #2: sends a complete op to bucket index OSD (#1).
10. OSD #1: completes the op.
11. OSD #1: receives the CEPH_RGW_UPDATE and updates the bucket index entry, thereby **RECREATING** it.

Solution implemented: at step #5 the object's dir entry exists. If we get to the beginning of step #11 and the object's dir entry no longer exists, we know that the dir entry was just actively being modified, and ignore the CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
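A hedged sketch of the guard described above, using hypothetical stand-ins rather than the actual cls_rgw types:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for the bucket index state (not cls_rgw's types).
struct DirEntry { bool pending_ops = false; };
std::map<std::string, DirEntry> bucket_index;

// Step 11's handler: a suggested update is ignored when the dir entry
// has vanished since the lister observed it (step 5), so a completed
// delete is never "resurrected" by a stale CEPH_RGW_UPDATE.
void handle_suggested_update(const std::string& oid) {
  auto it = bucket_index.find(oid);
  if (it == bucket_index.end()) {
    return;  // entry was just actively modified (deleted); do nothing
  }
  // ... apply CEPH_RGW_UPDATE to it->second ...
}
```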
liewegas pushed a commit that referenced this pull request on Nov 10, 2017
…letion

We have a race condition:

1. RGW client #1: requests an object be deleted.
2. RGW client #1: sends a prepare op to bucket index OSD #1.
3. OSD #1: prepares the op, adding pending ops to the bucket dir entry.
4. RGW client #2: sends a list bucket to OSD #1.
5. RGW client #2: sees that there are pending operations on the bucket dir entry, and calls check_disk_state.
6. RGW client #2: check_disk_state sees that the object still exists, so it sends CEPH_RGW_UPDATE to the bucket index OSD (#1).
7. RGW client #1: sends a delete object to the object OSD (#2).
8. OSD #2: deletes the object.
9. RGW client #2: sends a complete op to bucket index OSD (#1).
10. OSD #1: completes the op.
11. OSD #1: receives the CEPH_RGW_UPDATE and updates the bucket index entry, thereby **RECREATING** it.

Solution implemented: at step #5 the object's dir entry exists. If we get to the beginning of step #11 and the object's dir entry no longer exists, we know that the dir entry was just actively being modified, and ignore the CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit b33f529)
liewegas pushed a commit that referenced this pull request on Jul 6, 2018
…letion

We have a race condition:

1. RGW client #1: requests an object be deleted.
2. RGW client #1: sends a prepare op to bucket index OSD #1.
3. OSD #1: prepares the op, adding pending ops to the bucket dir entry.
4. RGW client #2: sends a list bucket to OSD #1.
5. RGW client #2: sees that there are pending operations on the bucket dir entry, and calls check_disk_state.
6. RGW client #2: check_disk_state sees that the object still exists, so it sends CEPH_RGW_UPDATE to the bucket index OSD (#1).
7. RGW client #1: sends a delete object to the object OSD (#2).
8. OSD #2: deletes the object.
9. RGW client #2: sends a complete op to bucket index OSD (#1).
10. OSD #1: completes the op.
11. OSD #1: receives the CEPH_RGW_UPDATE and updates the bucket index entry, thereby **RECREATING** it.

Solution implemented: at step #5 the object's dir entry exists. If we get to the beginning of step #11 and the object's dir entry no longer exists, we know that the dir entry was just actively being modified, and ignore the CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit b33f529)

Conflicts: (backported substantial changes only; omitted cleanups)
	src/cls/rgw/cls_rgw.cc
	src/rgw/rgw_rados.cc
liewegas added a commit that referenced this pull request on Jun 19, 2019
We just took the curmap ref above; do not call get_osdmap() again.

I think it may explain a weird segv I saw here in ~shared_ptr, although I'm not quite certain. Regardless, this change is correct and better.

```
(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00005596e5a98261 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:326
#2  handle_fatal_signal(int) () at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x00005596f4fe80e0 in ?? ()
#5  0x00005596e5464068 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#6  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#7  0x00005596e543377f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f2b25044e28, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#8  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#9  std::shared_ptr<OSDMap const>::~shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#10 OSD::handle_osd_ping(MOSDPing*) () at ./src/osd/OSD.cc:4662
```

Signed-off-by: Sage Weil <sage@redhat.com>
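A minimal sketch of the pattern the commit enforces, with a hypothetical accessor standing in for the real one:

```cpp
#include <memory>

struct OSDMap { unsigned epoch = 0; };
using OSDMapRef = std::shared_ptr<const OSDMap>;

// Hypothetical stand-in for the OSD's map accessor; in the real code,
// repeated calls can race with a concurrent map publish.
OSDMapRef get_osdmap() { return std::make_shared<const OSDMap>(); }

void handle_ping_sketch() {
  OSDMapRef curmap = get_osdmap();  // take the ref exactly once...
  if (!curmap) return;
  // ...and keep using `curmap` below; a second get_osdmap() call here
  // is what the commit removes.
  unsigned e = curmap->epoch;
  (void)e;
}
```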
liewegas pushed a commit that referenced this pull request on Feb 18, 2020
According to cppreference.com [1]: "If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy-looking content but a damaged control block:

```
[Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
(gdb) bt
#0  0x0000559cb81c3ea0 in ?? ()
#1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
#7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
#8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) frame 7
#7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
    at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
9665	in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
(gdb) print osdmap
$24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
(gdb) print *osdmap
# pretty sane OSDMap
(gdb) print sizeof(osdmap)
$26 = 16
(gdb) x/2a &osdmap
0x559ca22acef0:	0x559cba028000	0x559cba0ec900
(gdb) frame 2
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
148	/usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
(gdb) disassemble
Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
...
   0x0000559c97675b1e <+62>:	mov    (%rdi),%rax
   0x0000559c97675b21 <+65>:	mov    %rdi,%rbx
   0x0000559c97675b24 <+68>:	callq  *0x10(%rax)
=> 0x0000559c97675b27 <+71>:	test   %rbp,%rbp
...
End of assembler dump.
(gdb) info registers rdi rbx rax
rdi            0x559cba0ec900	94131624790272
rbx            0x559cba0ec900	94131624790272
rax            0x559cba0ec8a0	94131624790176
(gdb) x/a 0x559cba0ec8a0 + 0x10
0x559cba0ec8b0:	0x559cb81c3ea0
(gdb) bt
#0  0x0000559cb81c3ea0 in ?? ()
...
(gdb) p $_siginfo._sifields._sigfault.si_addr
$27 = (void *) 0x559cb81c3ea0
```

Helgrind seems to agree:

```
==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread #90
==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:02:54.519 510301==
==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread #117
==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
==00:00:02:54.519 510301== Block was alloc'd by thread #1
```

Actually there are plenty of similar issues reported, like:

```
==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread #119
==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:05:04.903 510301==
==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
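One way to make such a handoff race-free is to guard the shared_ptr object itself; a minimal sketch of that idea, not necessarily the fix as merged:

```cpp
#include <memory>
#include <mutex>

struct OSDMap {};
using OSDMapRef = std::shared_ptr<const OSDMap>;

// Concurrent operator= and copy-construction on one shared_ptr object
// are a data race; serializing them removes it.
class MapHolder {
  mutable std::mutex lock;
  OSDMapRef osdmap;          // written on map commit, read everywhere else
public:
  OSDMapRef get() const {
    std::lock_guard l{lock};
    return osdmap;           // copy taken under the lock
  }
  void set(OSDMapRef m) {
    std::lock_guard l{lock};
    osdmap = std::move(m);   // assignment under the same lock
  }
};
```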
liewegas pushed a commit that referenced this pull request on Mar 24, 2020
* no need to discard_result(), as `output_stream::close()` already returns an empty future<>
* free the connected socket after the background task finishes, because we should not free the connected socket before the promise referencing it is fulfilled; otherwise we get error messages from ASan like the trace below (a minimal lifetime sketch follows the trace)

```
==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
    #0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
    #1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
    #2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
    #3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
    #4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
    #5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
    #7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    #9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
    #10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
    #11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
    #12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
    #13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)

0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
    #0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
    #1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
    #2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
    #3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
    #4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
    #5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
    #6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
    #7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
    #10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr() /usr/include/c++/10/bits/unique_ptr.h:357
    #11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
    #12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
    #13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
    #14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
    #15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
    #16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
    #17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
    #18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36
```

Signed-off-by: Kefu Chai <kchai@redhat.com>
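A minimal Seastar-style sketch of the lifetime rule the second bullet enforces, assuming `seastar::do_with` is used to tie the socket's lifetime to the task that references it:

```cpp
#include <seastar/core/do_with.hh>
#include <seastar/core/future.hh>
#include <seastar/net/api.hh>

// Sketch only: do_with() owns the socket (and the stream built from it)
// until the inner future resolves, so the reactor never polls a freed
// pollable_fd_state.
seastar::future<> serve(seastar::connected_socket sock) {
  return seastar::do_with(std::move(sock), [](seastar::connected_socket& s) {
    return seastar::do_with(s.output(), [](seastar::output_stream<char>& out) {
      // close() already returns an empty future<>; no discard_result()
      return out.close();
    });
  });
}
```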
liewegas pushed a commit that referenced this pull request on Feb 19, 2021
According to cppreference.com [1]: "If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy-looking content but a damaged control block:

```
[Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
(gdb) bt
#0  0x0000559cb81c3ea0 in ?? ()
#1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
#7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
#8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) frame 7
#7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
    at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
9665	in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
(gdb) print osdmap
$24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
(gdb) print *osdmap
# pretty sane OSDMap
(gdb) print sizeof(osdmap)
$26 = 16
(gdb) x/2a &osdmap
0x559ca22acef0:	0x559cba028000	0x559cba0ec900
(gdb) frame 2
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
148	/usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
(gdb) disassemble
Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
...
   0x0000559c97675b1e <+62>:	mov    (%rdi),%rax
   0x0000559c97675b21 <+65>:	mov    %rdi,%rbx
   0x0000559c97675b24 <+68>:	callq  *0x10(%rax)
=> 0x0000559c97675b27 <+71>:	test   %rbp,%rbp
...
End of assembler dump.
(gdb) info registers rdi rbx rax
rdi            0x559cba0ec900	94131624790272
rbx            0x559cba0ec900	94131624790272
rax            0x559cba0ec8a0	94131624790176
(gdb) x/a 0x559cba0ec8a0 + 0x10
0x559cba0ec8b0:	0x559cb81c3ea0
(gdb) bt
#0  0x0000559cb81c3ea0 in ?? ()
...
(gdb) p $_siginfo._sifields._sigfault.si_addr
$27 = (void *) 0x559cb81c3ea0
```

Helgrind seems to agree:

```
==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread #90
==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:02:54.519 510301==
==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread #117
==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
==00:00:02:54.519 510301== Block was alloc'd by thread #1
```

Actually there are plenty of similar issues reported, like:

```
==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread #119
==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:05:04.903 510301==
==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 80da5f9)

Conflicts:
	src/osd/OSD.cc
	  in bool OSD::asok_command
	     int OSD::shutdown
	     void OSD::maybe_update_heartbeat_peers
	     void OSD::_preboot
	     void OSD::queue_want_up_thru
	     void OSD::send_alive
	     void OSD::send_failures
	     void OSD::send_beacon
	     MPGStats* OSD::collect_pg_stats
	     void OSD::note_down_osd
	     void OSD::consume_map
	     void OSD::activate_map
	src/osd/OSD.h
	  in private: dispatch_session_waiting
	- also use the new const OSDMapRef in places that no longer exist in master
	src/osd/OSD.cc
	  in void OSDService::share_map
	     void OSDService::send_incremental_map
	     int OSD::_do_command
	     void OSD::note_up_osd
	     int OSD::init_op_flags
	src/osd/OSD.h
	  in void send_incremental_map
	     void share_map
liewegas added a commit that referenced this pull request on May 4, 2021
Otherwise, if we assert, we'll hang here:

```
Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54
#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body") at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>, location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306
```

Signed-off-by: Sage Weil <sage@newdream.net>
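A generic illustration of why this hangs (not the test's code): glibc's pthread_cond_destroy waits until all waiters have left the condition variable, so the waiter must be woken and joined before destruction:

```cpp
#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
bool done = false;

// Safe teardown order: signal, join, then destroy. Destroying `cond`
// while a helper thread still waits on it blocks forever inside
// __pthread_cond_destroy, exactly as in the backtrace above.
void finish_and_destroy(pthread_t waiter) {
  pthread_mutex_lock(&mtx);
  done = true;
  pthread_cond_broadcast(&cond);   // wake the waiter first...
  pthread_mutex_unlock(&mtx);
  pthread_join(waiter, nullptr);   // ...make sure it has left the wait...
  pthread_cond_destroy(&cond);     // ...and only then destroy
}
```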
liewegas added a commit that referenced this pull request on May 5, 2021
Otherwise, if we assert, we'll hang here:

```
Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54
#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body") at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>, location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306
```

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit ee5a0c9)
liewegas pushed a commit that referenced this pull request on May 26, 2021
f7181ab has optimized the client parallelism. To achieve that, `PG::do_osd_ops()` was converted to return, in essence, a future resolving to a pair of futures. Unfortunately, the lifetime management of `OpsExecuter` was kept intact. As a result, the object was valid only until the outer future was fulfilled while, due to the `rollbacker` instances, it should stay available until `all_completed` becomes available. This issue can explain the following problem observed in a Teuthology job [1]:

```
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - do_op_call: method returned ret=-17, outdata.length()=0 while num_read=1, num_write=0
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - rollback_obc_if_modified: object 19:e17d4708:test-rados-api-smithi095-38404-2::foo:head got error generic:17, need_rollback=false
=================================================================
==33626==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000b9320 at pc 0x560f486b8222 bp 0x7fffc467a1e0 sp 0x7fffc467a1d0
READ of size 4 at 0x60d0000b9320 thread T0
    #0 0x560f486b8221 (/usr/bin/ceph-osd+0x2c610221)
    #1 0x560f4880c6b1 in seastar::continuation<seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > >(seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >&&, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() (/usr/bin/ceph-osd+0x2c7646b1)
    #2 0x560f5352c3ae (/usr/bin/ceph-osd+0x374843ae)
    #3 0x560f535318ef (/usr/bin/ceph-osd+0x374898ef)
    #4 0x560f536e395a (/usr/bin/ceph-osd+0x3763b95a)
    #5 0x560f532413d9 (/usr/bin/ceph-osd+0x371993d9)
    #6 0x560f476af95a in main (/usr/bin/ceph-osd+0x2b60795a)
    #7 0x7f7aa0af97b2 in __libc_start_main (/lib64/libc.so.6+0x237b2)
    #8 0x560f477d2e8d in _start (/usr/bin/ceph-osd+0x2b72ae8d)
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-20_07:28:16-rados-master-distro-basic-smithi/6124735/

The commit deals with the problem by repacking the outer future. An alternative could be switching from `std::unique_ptr` to `seastar::shared_ptr` for managing `OpsExecuter`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
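A hedged sketch of the lifetime pattern at issue, with a stand-in for `OpsExecuter`: ownership is tied to the inner future rather than to the outer one:

```cpp
#include <seastar/core/future.hh>
#include <memory>

// Stand-in only, not the crimson class.
struct OpsExecuterSketch {
  seastar::future<> all_completed() { return seastar::make_ready_future<>(); }
};

seastar::future<> execute() {
  auto ox = std::make_unique<OpsExecuterSketch>();
  auto fut = ox->all_completed();
  // finally() keeps `ox` alive until `fut` resolves; without this, the
  // executer dies with the outer future while rollbacker still uses it.
  return fut.finally([ox = std::move(ox)] {});
}
```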
liewegas pushed a commit that referenced this pull request on Jun 2, 2021
The `FuturizedStore` interface has `get_attr()` take the `name`
parameter as a `std::string_view`, and thus burdens implementations
with extending the lifetime of the data the instance refers to.
Unfortunately, `AlienStore` is unaware that prolonging the life of a
`std::string_view` instance doesn't prolong the memory of the data it
points to. This problem has manifested in the following use-after-free
detected at Sepia (a minimal sketch of the remedy follows the trace):
```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136929$ less ./remote/smithi194/log/ceph-osd.7.log.gz
...
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - do_osd_ops_execute: object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head - handling op call
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - handling op call on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - calling method lock.lock, num_read=0, num_write=0
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - handling op getxattr on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - getxattr on obj=14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head for attr=_lock.TestLockPP1
DEBUG 2021-05-26 20:24:54,078 [shard 0] bluestore - get_attr
=================================================================
==34068==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030001851d0 at pc 0x7f824d6a5b27 bp 0x7f822b4201c0 sp 0x7f822b41f968
READ of size 17 at 0x6030001851d0 thread T28 (alien-store-tp)
...
#0 0x7f824d6a5b26 (/lib64/libasan.so.5+0x40b26)
#1 0x55e2cbb2e00b (/usr/bin/ceph-osd+0x2b6dc00b)
#2 0x55e2d31f086e (/usr/bin/ceph-osd+0x32d9e86e)
#3 0x55e2d3467607 in crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) (/usr/bin/ceph-osd+0x33015607)
#4 0x55e2d346b14a (/usr/bin/ceph-osd+0x3301914a)
#5 0x7f8249d32ba2 (/lib64/libstdc++.so.6+0xc2ba2)
#6 0x7f824a00d149 in start_thread (/lib64/libpthread.so.0+0x8149)
#7 0x7f82486edf22 in clone (/lib64/libc.so.6+0xfcf22)
0x6030001851d0 is located 0 bytes inside of 31-byte region [0x6030001851d0,0x6030001851ef)
freed by thread T0 here:
#0 0x7f824d757688 in operator delete(void*) (/lib64/libasan.so.5+0xf2688)
previously allocated by thread T0 here:
#0 0x7f824d7567b0 in operator new(unsigned long) (/lib64/libasan.so.5+0xf17b0)
Thread T28 (alien-store-tp) created by T0 here:
#0 0x7f824d6b7ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3)
SUMMARY: AddressSanitizer: heap-use-after-free (/lib64/libasan.so.5+0x40b26)
Shadow bytes around the buggy address:
0x0c06800289e0: fd fd fd fa fa fa fd fd fd fa fa fa 00 00 00 fa
0x0c06800289f0: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
0x0c0680028a00: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
0x0c0680028a10: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
0x0c0680028a20: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c0680028a30: fd fd fa fa fd fd fd fd fa fa[fd]fd fd fd fa fa
0x0c0680028a40: fd fd fd fd fa fa fd fd fd fd fa fa 00 00 00 07
0x0c0680028a50: fa fa 00 00 00 fa fa fa 00 00 00 fa fa fa fd fd
0x0c0680028a60: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
0x0c0680028a70: 00 00 00 00 fa fa fd fd fd fd fa fa fd fd fd fd
0x0c0680028a80: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==34068==ABORTING
```
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
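A minimal sketch of the hazard and the straightforward remedy, assuming a simplified signature:

```cpp
#include <string>
#include <string_view>

// Copying a std::string_view extends the view object's life, not the
// bytes it points at. Before handing the name to the alien-store
// worker thread, materialize it as an owning std::string.
void get_attr_sketch(std::string_view name) {
  // BAD: capturing `name` by value in a deferred task still dangles
  // if the caller's buffer dies before the background thread runs.
  // GOOD: take an owning copy up front:
  std::string owned_name{name};
  // ... enqueue work that uses owned_name on the alien thread ...
}
```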
liewegas pushed a commit that referenced this pull request on Jun 2, 2021
There is a scenario in which the `active_con` is properly
chosen but isn't marked as `ready_to_send`.
If `renew_subs()` is called during `on_session_opened()`,
the flag will be turned on only after the subscriptions are
renewed, which cannot happen because renewal requires the flag
to be already set. In other words: there is a circular data dependency.
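A hedged sketch of the ordering this implies, using a stand-in class rather than the actual crimson monitor-client code:

```cpp
// Stand-in only; illustrates the circular dependency and its removal.
struct MonConnectionSketch {
  bool ready_to_send = false;
  void renew_subs() {
    if (!ready_to_send) return;  // renewal requires the flag...
    // ... send MMonSubscribe ...
  }
  void on_session_opened() {
    ready_to_send = true;        // ...so set the flag first,
    renew_subs();                // then renewal can proceed
  }
};
```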
The net result is stalling the subscription machinery,
particularly the `OSDMap` subs. This caused a nasty peering
issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo`
state.
```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd - pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0]
r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd - pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.8158188, 1622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messaging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== #772 === osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> #772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70)
```
On osd.4, the peering request got stuck waiting for a newer `OSDMap`.
```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```
```
INFO 2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO 2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_map(20..21 src has 1..21) v4 (41)
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_map(21..22 src has 1..22) v4 (41)
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO 2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```
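The gate visible in the log above ("evt epoch is 18, i have 17, will wait" followed by `osdmap_subscribe(17)`) can be modeled with a tiny sketch. This is an illustration only, not the crimson code; the `MapGate` type, its `admit()` method, and the hard-coded epochs are made up:
```
#include <cstdio>

// Toy model of the epoch gate seen in the log: an event stamped with a
// newer epoch than the locally known map is parked, and a subscription
// starting from the current epoch is (supposed to be) sent.
struct MapGate {
  unsigned long have = 17;  // newest osdmap epoch this OSD has committed

  bool admit(unsigned long evt_epoch) {
    if (evt_epoch > have) {
      std::printf("evt epoch is %lu, i have %lu, will wait\n", evt_epoch, have);
      std::printf("osdmap_subscribe(%lu)\n", have);  // ask for maps > have
      return false;  // park the event until the maps arrive
    }
    return true;     // epoch already known, safe to process
  }
};

int main() {
  MapGate gate;
  gate.admit(18);  // mirrors the stuck loop in the log
}
```
If the subscribe request never actually leaves the OSD, the parked event waits forever, which is exactly the loop the log shows.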
```
2021-05-26T20:19:42.048+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mon_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700 5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```
```
seastar::future<> Client::renew_subs()
{
  // Nothing marked "new" since the last renewal: warn and return without
  // sending anything. This is the "renew_subs - empty" seen in the logs,
  // which silently drops the OSD's re-subscription for the osdmap.
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);
  // Build and send the MMonSubscribe carrying everything we want, then
  // mark the subscriptions as renewed.
  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```
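One plausible way for `have_new()` to come back false, consistent with the log below, is a tracker that only counts a subscription as "new" when the requested epoch moves forward. A minimal, self-contained sketch, not the crimson MonSub implementation; the `SubTracker` type is made up for illustration:
```
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Toy subscription tracker: a re-request for an epoch that is not newer
// than the one already asked for leaves have_new() false, so the renewal
// is dropped on the floor.
struct SubTracker {
  std::map<std::string, uint64_t> wanted;  // what we asked for last time
  bool dirty = false;                      // set only when something changes

  void want(const std::string& what, uint64_t start) {
    auto [it, inserted] = wanted.try_emplace(what, start);
    if (inserted || start > it->second) {  // only a newer epoch counts
      it->second = start;
      dirty = true;
    }                                      // equal epoch: silently ignored
  }
  bool have_new() const { return dirty; }
  void renewed() { dirty = false; }
};

int main() {
  SubTracker sub;
  sub.want("osdmap", 14);
  std::cout << "first request, have_new=" << sub.have_new() << '\n';  // 1
  sub.renewed();            // MMonSubscribe sent, flag cleared
  sub.want("osdmap", 14);   // OSD re-requests the same epoch
  std::cout << "re-request,  have_new=" << sub.have_new() << '\n';    // 0
}
```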
```
INFO 2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
...
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubscribe>
```
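One possible fix direction, sketched as a continuation of the toy `SubTracker` above (an assumption for illustration, not the actual patch): treat an explicit re-request as always worth sending.
```
// Hypothetical fix on the SubTracker sketch above, not the actual
// crimson change: an explicit re-request marks the subscription dirty
// even when the requested epoch is unchanged, so the following
// renew_subs() always has something to send.
void want_force(SubTracker& sub, const std::string& what, uint64_t start) {
  auto [it, inserted] = sub.wanted.try_emplace(what, start);
  if (!inserted && start > it->second)
    it->second = start;  // keep the newest requested epoch
  sub.dirty = true;      // force a send even if nothing changed
}
```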
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
alfonsomthd pushed a commit that referenced this pull request Oct 18, 2021
exit() will call pthread_cond_destroy() on dpdk::eal::cond; destroying a condition variable upon which other threads are currently blocked results in undefined behavior. Tests linked against different libc versions show the difference: with libc-2.17 the process can exit, while with libc-2.27 it deadlocks. The call stack is as follows:
Thread 3 (Thread 0xffff7e5749f0 (LWP 62213)):
#0 0x0000ffff7f3c422c in futex_wait_cancelable (private=<optimized out>, expected=0,
futex_word=0xaaaadc0e30f4 <dpdk::eal::cond+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>, cond=0xaaaadc0e30c8 <dpdk::eal::cond>)
at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0xaaaadc0e30c8 <dpdk::eal::cond>, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>)
at pthread_cond_wait.c:655
#3 0x0000ffff7f1f1f80 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#4 0x0000aaaad37f5078 in dpdk::eal::<lambda()>::operator()(void) const (__closure=<optimized out>, __closure=<optimized out>)
at ./src/msg/async/dpdk/dpdk_rte.cc:136
#5 0x0000ffff7f1f7ed4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#6 0x0000ffff7f3be088 in start_thread (arg=0xffffe73e197f) at pthread_create.c:463
#7 0x0000ffff7efc74ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Thread 1 (Thread 0xffff7ee3b010 (LWP 62200)):
#0 0x0000ffff7f3c3c38 in futex_wait (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_cond_destroy (cond=0xaaaadc0e30c8 <dpdk::eal::cond>) at pthread_cond_destroy.c:54
#3 0x0000ffff7ef2be34 in __run_exit_handlers (status=-6, listp=0xffff7f04a5a0 <__exit_funcs>, run_list_atexit=255,
run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#4 0x0000ffff7ef2bf6c in __GI_exit (status=<optimized out>) at exit.c:139
#5 0x0000ffff7ef176e4 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:344
#6 0x0000aaaad2939db0 in _start () at ./src/include/buffer.h:642
Fixes: https://tracker.ceph.com/issues/42890
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
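The hazardous pattern is easy to reproduce outside DPDK. A minimal sketch of the same shape, not the Ceph code; the global condition variable and detached waiter stand in for dpdk::eal::cond and the EAL thread:
```
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <thread>

// A global condition variable: its destructor runs from the exit
// handlers, mirroring dpdk::eal::cond. Destroying a condition variable
// while another thread is blocked in wait() is undefined behavior; per
// the commit message, libc-2.27's pthread_cond_destroy blocks on the
// futex and exit() never returns.
std::condition_variable cond;
std::mutex lock;

int main() {
  std::thread waiter([] {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l);   // blocks forever; nobody ever notifies
  });
  waiter.detach();  // the waiter outlives main(), like the EAL thread
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  std::exit(0);     // runs cond's destructor while the waiter still waits
}
```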
liewegas pushed a commit that referenced this pull request Nov 5, 2021
deploy: Tweak nginx config to allow external images
In a number of loops using ObjectMapIterator, the iterator is validated
twice, first as a loop break condition, then during the iter->next() call.
Suppress the validation in the next() method in those cases for better
performance by adding the possibility to not call valid() from within
next().
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
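A toy version of the loop pattern this optimizes; the `Iter` type and its `next(bool)` signature are illustrative, not the actual ObjectMapIteratorImpl interface:
```
#include <cstdio>
#include <vector>

// Toy iterator: valid() stands in for the expensive check the commit
// message describes.
struct Iter {
  std::vector<int> data{1, 2, 3};
  size_t pos = 0;

  bool valid() const { return pos < data.size(); }  // imagine this is costly
  void next(bool validate = true) {
    if (validate && !valid())  // the redundant re-check being suppressed
      return;
    ++pos;
  }
};

int main() {
  Iter it;
  // The loop condition has already established validity, so next(false)
  // skips the second valid() call per iteration.
  while (it.valid()) {
    std::printf("%d\n", it.data[it.pos]);
    it.next(/*validate=*/false);
  }
}
```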