fix error: 'snprintf' was not declared in this scope by vzctl · Pull Request #4 · ceph/ceph

vzctl · 2011-10-28T12:36:53Z


Signed-off-by: Alexey Lapitsky <lex@realisticgroup.com>

fix error: 'snprintf' was not declared in this scope

These were static in auth/Crypto.cc, which was mostly fine, except when we got a signal shutting everything down for the gcov stuff, like so:

    Thread 21 (Thread 2164):
    #0 0x00007f31a800b3cd in open64 () from /lib/libpthread.so.0
    #1 0x000000000081dee0 in __gcov_open ()
    #2 0x000000000081e3fd in gcov_exit ()
    #3 0x00007f31a67e64f2 in exit () from /lib/libc.so.6
    #4 0x000000000054e1ca in handle_signal (signal=<value optimized out>) at osd/OSD.cc:600
    #5 <signal handler called>
    #6 0x00007f31a8007a9a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #7 0x0000000000636d7b in Wait (this=0x2241000) at ./common/Cond.h:48
    #8 SimpleMessenger::wait (this=0x2241000) at msg/SimpleMessenger.cc:2637
    #9 0x00000000004a4e35 in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:343

and a racing thread would, say, accept a connection and then crash, like so:

    #0 0x00007f31a800ba0b in raise () from /lib/libpthread.so.0
    #1 0x0000000000696eeb in reraise_fatal (signum=2164) at global/signal_handler.cc:59
    #2 0x00000000006976cc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
    #3 <signal handler called>
    #4 0x00007f31a67e0ba5 in raise () from /lib/libc.so.6
    #5 0x00007f31a67e46b0 in abort () from /lib/libc.so.6
    #6 0x00007f31a70846bd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
    #7 0x00007f31a7082906 in ?? () from /usr/lib/libstdc++.so.6
    #8 0x00007f31a7082933 in std::terminate() () from /usr/lib/libstdc++.so.6
    #9 0x00007f31a708328f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
    #10 0x0000000000690e5b in CryptoKey::decrypt (this=0x7f3195a67510, in=..., out=..., error=...) at auth/Crypto.cc:404
    #11 0x000000000079ccee in void decode_decrypt_enc_bl<CephXServiceTicketInfo>(CephXServiceTicketInfo&, CryptoKey, ceph::buffer::list&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
    #12 0x0000000000795ca3 in cephx_verify_authorizer (cct=0x2232000, keys=<value optimized out>, indata=..., ticket_info=<value optimized out>, reply_bl=<value optimized out>) at auth/cephx/CephxProtocol.cc:438
    #13 0x00000000007a17cf in CephxAuthorizeHandler::verify_authorizer (this=<value optimized out>, cct=0x2232000, keys=0x2256000, authorizer_data=<value optimized out>, authorizer_reply=..., entity_name=..., global_id=@0x7f3195a67848, caps_info=..., auid=0x7f3195a67840) at auth/cephx/CephxAuthorizeHandler.cc:21
    #14 0x00000000005577ff in OSD::ms_verify_authorizer (this=0x2267000, con=0x230da00, peer_type=<value optimized out>, protocol=<value optimized out>, authorizer_data=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at osd/OSD.cc:2723
    #15 0x0000000000611ce1 in ms_deliver_verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2, authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at msg/Messenger.h:145
    #16 SimpleMessenger::verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2, authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at msg/SimpleMessenger.cc:2419
    #17 0x00000000006309ab in SimpleMessenger::Pipe::accept (this=0x22ce280) at msg/SimpleMessenger.cc:756
    #18 0x0000000000634711 in SimpleMessenger::Pipe::reader (this=0x22ce280) at msg/SimpleMessenger.cc:1546
    #19 0x00000000004a7085 in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:208
    #20 0x000000000060f252 in Thread::_entry_func (arg=0x874) at common/Thread.cc:42
    #21 0x00007f31a8003971 in start_thread () from /lib/libpthread.so.0
    #22 0x00007f31a689392d in clone () from /lib/libc.so.6
    #23 0x0000000000000000 in ?? ()

Instead, put these on the heap. Set them up in the ceph::crypto::init() method, and tear them down in ceph::crypto::shutdown().

Fixes: #1633
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>

Shut down the MonClient before the messenger, to avoid a race between MonClient::tick() and MonClient::shutdown(). Fixes:

    #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
    #1 0x00007f44475e2849 in _L_lock_953 () from /lib/libpthread.so.0
    #2 0x00007f44475e266b in __pthread_mutex_lock (mutex=0x14d8dc8) at pthread_mutex_lock.c:61
    #3 0x00000000005ae090 in Mutex::Lock (this=0x14d8db8, no_lockdep=false) at ./common/Mutex.h:108
    #4 0x000000000068440e in MonClient::shutdown (this=0x14d8c30) at mon/MonClient.cc:386
    #5 0x00000000005b2653 in ceph_tool_common_shutdown (ctx=0x14d84c0) at tools/common.cc:661
    #6 0x00000000005ada29 in main (argc=7, argv=0x7fff8a2394c8) at tools/ceph.cc:304

vs

    #0 0x00007f44475e8a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
    #1 0x00000000005eff6b in reraise_fatal (signum=11) at global/signal_handler.cc:59
    #2 0x00000000005f0165 in handle_fatal_signal (signum=11) at global/signal_handler.cc:106
    #3 <signal handler called>
    #4 0x0000000000000000 in ?? ()
    #5 0x000000000068661a in MonClient::tick (this=0x14d8c30) at mon/MonClient.cc:621
    #6 0x0000000000689e3b in MonClient::C_Tick::finish(int) ()
    #7 0x000000000061b3c5 in SafeTimer::timer_thread (this=0x14d8df8) at common/Timer.cc:102
    #8 0x000000000061c6f0 in SafeTimerThread::entry() ()
    #9 0x00000000005f1219 in Thread::_entry_func (arg=0x14e1a00) at common/Thread.cc:41
    #10 0x00007f44475e0971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
    #11 0x00007f4445ead92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    #12 0x0000000000000000 in ?? ()

Signed-off-by: Sage Weil <sage@newdream.net>

Before the mon, and lockdep, in particular.

    #0 __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50
    #1 0x0000000000816092 in ceph::log::Log::submit_entry (this=0x0, e=0x2f4a270) at log/Log.cc:138
    #2 0x00000000007ee0f8 in handle_fatal_signal (signum=11) at global/signal_handler.cc:100
    #3 <signal handler called>
    #4 0x00000000008e1300 in lockdep_will_lock (name=0x959aa7 "SignalHandler::lock", id=17) at common/lockdep.cc:163
    #5 0x00000000008867fc in Mutex::_will_lock (this=0x2f20428) at ./common/Mutex.h:56
    #6 0x0000000000886605 in Mutex::Lock (this=0x2f20428, no_lockdep=false) at common/Mutex.cc:81
    #7 0x00000000007eeb95 in SignalHandler::entry (this=0x2f20300) at global/signal_handler.cc:198
    #8 0x00000000008b0bd1 in Thread::_entry_func (arg=0x2f20300) at common/Thread.cc:43
    #9 0x00007f36fefd6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
    #10 0x00007f36fd80b6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
    #11 0x0000000000000000 in ?? ()

    #0 0x00007f36fefd7e75 in pthread_join (threadid=139874129766144, thread_return=0x0) at pthread_join.c:89
    #1 0x00000000008b11ec in Thread::join (this=0x2f20300, prval=0x0) at common/Thread.cc:130
    #2 0x00000000007eeae7 in SignalHandler::shutdown (this=0x2f20300) at global/signal_handler.cc:186
    #3 0x00000000007ee9cf in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:175
    #4 0x00000000007eea58 in SignalHandler::~SignalHandler (this=0x2f20300, __in_chrg=<optimized out>) at global/signal_handler.cc:176
    #5 0x00000000007ee643 in shutdown_async_signal_handler () at global/signal_handler.cc:324
    #6 0x00000000006de9d2 in main (argc=7, argv=0x7fffbfb8a1e8) at ceph_mon.cc:439

Signed-off-by: Sage Weil <sage@inktank.com>

The C_CancelOp path assumes op->session != NULL. Cancel that op before we clear op->session. This fixes a crash like:

    #0 pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:39
    #1 0x00007fc82690a4b1 in RWLock::get_write (this=0x18, lockdep=<optimized out>) at ./common/RWLock.h:88
    #2 0x00007fc8268f4d79 in Objecter::op_cancel (this=0x1f61830, s=0x0, tid=0, r=-110) at osdc/Objecter.cc:1850
    #3 0x00007fc8268ba449 in Context::complete (this=0x1f68c20, r=<optimized out>) at ./include/Context.h:64
    #4 0x00007fc8269769aa in RWTimer::timer_thread (this=0x1f61950) at common/Timer.cc:268
    #5 0x00007fc82697a85d in RWTimerThread::entry (this=<optimized out>) at common/Timer.cc:200
    #6 0x00007fc82651ce9a in start_thread (arg=0x7fc7e3fff700) at pthread_create.c:308

Signed-off-by: Sage Weil <sage@redhat.com>

kv/LevelDBStore, FileStore, MonDBStore: simpler code for single-key fetches

…ocks.

Summary: SizeBeingCompacted was called without any lock protection. This causes crashes, especially when running db_bench with value_size=128K. The fix is to compute SizeUnderCompaction while holding the mutex and to pass these values into the call to Finalize.

    (gdb) where
    #4 leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827
    #5 0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420
    #6 0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016
    #7 0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473
    #8 0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757
    #9 0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268
    #10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170
    #11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941
    #12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874
    #13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301
    #14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Test Plan: make check. I am running db_bench with a value size of 128K to see if the segfault is fixed.

Reviewers: MarkCallaghan, sheki, emayanke
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9279

Summary: Now this gives us the real deal stack trace:

    Assertion failed: (false), function GetProperty, file db/db_impl.cc, line 4072.
    Received signal 6 (Abort trap: 6)
    #0 0x7fff57ce39b9
    #1 abort (in libsystem_c.dylib) + 125
    #2 basename (in libsystem_c.dylib) + 0
    #3 rocksdb::DBImpl::GetProperty(rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) (in db_test) (db_impl.cc:4072)
    #4 rocksdb::_Test_Empty::_Run() (in db_test) (testharness.h:68)
    #5 rocksdb::_Test_Empty::_RunIt() (in db_test) (db_test.cc:1005)
    #6 rocksdb::test::RunAllTests() (in db_test) (testharness.cc:60)
    #7 main (in db_test) (db_test.cc:6697)
    #8 start (in libdyld.dylib) + 1

Test Plan: added artificial assert, saw great stack trace

Reviewers: haobo, dhruba, ljin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18309

fix compile error on 32-bit

Reviewed-by: Sage Weil <sage@redhat.com>

Hammer iam non mock1

Attempts to install the jewel ceph-common, ceph-mon, ceph-osd, and ceph-base packages over the infernalis ceph package fail because some files exist in both. See comment #4 in the tracker issue for a deeper analysis.

http://tracker.ceph.com/issues/15047

Fixes: #15047
Signed-off-by: Nathan Cutler <ncutler@suse.com>

Signed-off-by: Ali Maredia <amaredia@redhat.com>


This implements option #4 for external boost, based on upstream discussion. In option #4:

1. boost is added as a submodule
2. builds default to using the attached boost module
3. building against a system-provided boost is supported, but must be configured explicitly

Because all of the boost components are attached as nested submodules in the upstream boost repository, neither the nested submodules nor the root boost submodule has been cloned into modules in github.com/ceph (acked by Sage).

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

…er instance the caller needs to check that the parameter is non-null before calling PK11_FreeSymKey or PK11_FreeSlot; otherwise, if CryptoAESKeyHandler::init fails, we hit a segfault like this:

    #0 0x00007f76844f5a95 in PK11_FreeSymKey () from /lib64/libnss3.so
    #1 0x00007f76586b6e49 in CryptoAESKeyHandler::~CryptoAESKeyHandler() () from /lib64/librados.so.2
    #2 0x00007f76586b5eea in CryptoAES::get_key_handler(ceph::buffer::ptr const&, std::string&) () from /lib64/librados.so.2
    #3 0x00007f76586b4b9c in CryptoKey::_set_secret(int, ceph::buffer::ptr const&) () from /lib64/librados.so.2
    #4 0x00007f76586b4e95 in CryptoKey::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
    #5 0x00007f76586b7ee6 in KeyRing::set_modifier(char const*, char const*, EntityName&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&) () from /lib64/librados.so.2
    #6 0x00007f76586b8882 in KeyRing::decode_plaintext(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
    #7 0x00007f76586b9803 in KeyRing::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
    #8 0x00007f76586b9a1f in KeyRing::load(CephContext*, std::string const&) () from /lib64/librados.so.2
    #9 0x00007f76586ba04b in KeyRing::from_ceph_context(CephContext*) () from /lib64/librados.so.2
    #10 0x00007f765852d0cd in MonClient::init() () from /lib64/librados.so.2
    #11 0x00007f76583c15f5 in librados::RadosClient::connect() () from /lib64/librados.so.2
    #12 0x00007f765838cb1c in rados_connect () from /lib64/librados.so.2
    ...

Signed-off-by: runsisi <runsisi@zte.com.cn>

We have to shut down the hunting timer and reset cur_con in the same place; otherwise the hunting timer may set cur_con before it gets shut down, which results in a segfault like this:

    #0 0x00007fb09ffca989 in raise () from /lib64/libc.so.6
    #1 0x00007fb09ffcc098 in abort () from /lib64/libc.so.6
    #2 0x00007fb08ea52677 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /lib64/librados.so.2
    #3 0x00007fb08e93144c in ceph::log::SubsystemMap::should_gather(unsigned int, int) [clone .part.120] () from /lib64/librados.so.2
    #4 0x00007fb08e97eb15 in RefCountedObject::put() () from /lib64/librados.so.2
    #5 0x00007fb08eae3f9e in MonClient::~MonClient() () from /lib64/librados.so.2
    #6 0x00007fb08e97c2d5 in librados::RadosClient::~RadosClient() () from /lib64/librados.so.2
    #7 0x00007fb08e97c319 in librados::RadosClient::~RadosClient() () from /lib64/librados.so.2
    #8 0x00007fb08e94684a in rados_shutdown () from /lib64/librados.so.2
    #9 0x00007fb098074210 in __pyx_pw_5rados_5Rados_7shutdown () from /usr/lib64/python2.7/site-packages/rados.so
    ...

Signed-off-by: runsisi <runsisi@zte.com.cn>

According to cppreference.com [1]:

"If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy-looking content but a damaged control block:

```
[Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
#1 0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#3 0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#4 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#5 std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#6 OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
#8 0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#9 0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) frame 7
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
9665 in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
(gdb) print osdmap
$24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
(gdb) print *osdmap
# pretty sane OSDMap
(gdb) print sizeof(osdmap)
$26 = 16
(gdb) x/2a &osdmap
0x559ca22acef0: 0x559cba028000 0x559cba0ec900
(gdb) frame 2
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
148 /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
(gdb) disassemble
Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
...
0x0000559c97675b1e <+62>: mov (%rdi),%rax
0x0000559c97675b21 <+65>: mov %rdi,%rbx
0x0000559c97675b24 <+68>: callq *0x10(%rax)
=> 0x0000559c97675b27 <+71>: test %rbp,%rbp
...
End of assembler dump.
(gdb) info registers rdi rbx rax
rdi 0x559cba0ec900 94131624790272
rbx 0x559cba0ec900 94131624790272
rax 0x559cba0ec8a0 94131624790176
(gdb) x/a 0x559cba0ec8a0 + 0x10
0x559cba0ec8b0: 0x559cb81c3ea0
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
...
(gdb) p $_siginfo._sifields._sigfault.si_addr
$27 = (void *) 0x559cb81c3ea0
```

Helgrind seems to agree:

```
==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread #90
==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:02:54.519 510301== at 0x7218DD: operator= (shared_ptr_base.h:1078)
==00:00:02:54.519 510301== by 0x7218DD: operator= (shared_ptr.h:103)
==00:00:02:54.519 510301== by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:02:54.519 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:02:54.519 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:02:54.519 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:02:54.519 510301==
==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread #117
==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
==00:00:02:54.519 510301== at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
==00:00:02:54.519 510301== by 0x6B5842: shared_ptr (shared_ptr.h:129)
==00:00:02:54.519 510301== by 0x6B5842: get_osdmap (OSD.h:1700)
==00:00:02:54.519 510301== by 0x6B5842: OSD::create_context() (OSD.cc:9053)
==00:00:02:54.519 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:02:54.519 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:02:54.519 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:02:54.519 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:02:54.519 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:02:54.519 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
==00:00:02:54.519 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:02:54.519 510301== by 0x66F766: main (ceph_osd.cc:688)
==00:00:02:54.519 510301== Block was alloc'd by thread #1
```

Actually, there are plenty of similar issues reported, like:

```
==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread #119
==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
==00:00:05:04.903 510301== at 0x753165: clear (hashtable.h:2051)
==00:00:05:04.903 510301== by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
==00:00:05:04.903 510301== by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
==00:00:05:04.903 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:05:04.903 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:05:04.903 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:05:04.903 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:05:04.903 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:05:04.903 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:05:04.903 510301==
==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:05:04.903 510301== at 0x7531E1: clear (hashtable.h:2054)
==00:00:05:04.903 510301== by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:747)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:1078)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
==00:00:05:04.903 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:05:04.903 510301== by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:699)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:1732)
==00:00:05:04.903 510301== by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 80da5f9)

Conflicts:
    src/osd/OSD.cc in
        bool OSD::asok_command
        int OSD::shutdown
        void OSD::maybe_update_heartbeat_peers
        void OSD::_preboot
        void OSD::queue_want_up_thru
        void OSD::send_alive
        void OSD::send_failures
        void OSD::send_beacon
        MPGStats* OSD::collect_pg_stats
        void OSD::note_down_osd
        void OSD::consume_map
        void OSD::activate_map
    src/osd/OSD.h in
        private: dispatch_session_waiting
    - also use the new const OSDMapRef in places that no longer exist in master
    src/osd/OSD.cc in
        void OSDService::share_map
        void OSDService::send_incremental_map
        int OSD::_do_command
        void OSD::note_up_osd
        int OSD::init_op_flags
    src/osd/OSD.h in
        void send_incremental_map
        void share_map

* no need to call discard_result(), as `output_stream::close()` already returns an empty future<>
* free the connected socket after the background task finishes, because we should not free the connected socket before the promise referencing it is fulfilled. Otherwise ASan reports errors like:

==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
#0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
#1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
#2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
#3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
#4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
#5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
#6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
#7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
#8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
#9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
#10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
#11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
#12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
#13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)

0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
#0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
#1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
#2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
#3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
#4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
#5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
#6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
#7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
#10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr() /usr/include/c++/10/bits/unique_ptr.h:357
#11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
#12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
#13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
#14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
#15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
#16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
#17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
#18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36

Signed-off-by: Kefu Chai <kchai@redhat.com>
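The second bullet is about ordering: the socket must outlive any background work that still refers to it. Below is a minimal standard-C++ sketch of that rule, not the crimson patch itself. `Socket`, `close_in_background` and `shutdown_connection` are hypothetical names, and a `std::launch::deferred` task stands in loosely for a Seastar continuation that runs later on the same thread. Because the task captures its own `std::shared_ptr`, the owner may drop its reference before the task runs.

```cpp
#include <future>
#include <memory>
#include <string>

// Hypothetical stand-in for a connected socket (not a Seastar type).
struct Socket {
  std::string peer;
  bool closed = false;
  void close() { closed = true; }
};

// Hypothetical background task. The lambda holds its own shared_ptr, so
// the Socket cannot be freed before the task has finished with it.
std::future<bool> close_in_background(std::shared_ptr<Socket> sock) {
  // std::launch::deferred runs the task later, on the thread that calls
  // get(), loosely like a continuation scheduled on a reactor.
  return std::async(std::launch::deferred, [sock] {
    sock->close();
    return sock->closed;
  });
}

// The owner starts the task and drops its reference right away; the
// capture above keeps the object alive until the task has run.
bool shutdown_connection() {
  auto sock = std::make_shared<Socket>();
  sock->peer = "client-1";
  std::future<bool> done = close_in_background(sock);
  sock.reset();       // would be a use-after-free with a raw-pointer capture
  return done.get();  // the deferred task runs here
}
```

Capturing a raw `Socket*` instead would reproduce the heap-use-after-free pattern reported above, since `sock.reset()` would destroy the object before the deferred task dereferences it. In Seastar code the same lifetime rule is usually expressed with helpers such as `seastar::do_with()` or by keeping the owner alive until the future resolves.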

According to cppreference.com [1]:

"If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap` with healthy-looking content but a damaged control block:

```
[Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
#1 0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#3 0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#4 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#5 std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#6 OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
#8 0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#9 0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) frame 7
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
9665 in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
(gdb) print osdmap
$24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
(gdb) print *osdmap
# pretty sane OSDMap
(gdb) print sizeof(osdmap)
$26 = 16
(gdb) x/2a &osdmap
0x559ca22acef0: 0x559cba028000 0x559cba0ec900
(gdb) frame 2
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
148 /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
(gdb) disassemble
Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
...
0x0000559c97675b1e <+62>: mov (%rdi),%rax
0x0000559c97675b21 <+65>: mov %rdi,%rbx
0x0000559c97675b24 <+68>: callq *0x10(%rax)
=> 0x0000559c97675b27 <+71>: test %rbp,%rbp
...
End of assembler dump.
(gdb) info registers rdi rbx rax
rdi 0x559cba0ec900 94131624790272
rbx 0x559cba0ec900 94131624790272
rax 0x559cba0ec8a0 94131624790176
(gdb) x/a 0x559cba0ec8a0 + 0x10
0x559cba0ec8b0: 0x559cb81c3ea0
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
...
(gdb) p $_siginfo._sifields._sigfault.si_addr
$27 = (void *) 0x559cb81c3ea0
```

Helgrind seems to agree:

```
==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread #90
==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:02:54.519 510301== at 0x7218DD: operator= (shared_ptr_base.h:1078)
==00:00:02:54.519 510301== by 0x7218DD: operator= (shared_ptr.h:103)
==00:00:02:54.519 510301== by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:02:54.519 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:02:54.519 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:02:54.519 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:02:54.519 510301==
==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread #117
==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
==00:00:02:54.519 510301== at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
==00:00:02:54.519 510301== by 0x6B5842: shared_ptr (shared_ptr.h:129)
==00:00:02:54.519 510301== by 0x6B5842: get_osdmap (OSD.h:1700)
==00:00:02:54.519 510301== by 0x6B5842: OSD::create_context() (OSD.cc:9053)
==00:00:02:54.519 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:02:54.519 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:02:54.519 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:02:54.519 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:02:54.519 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:02:54.519 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
==00:00:02:54.519 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:02:54.519 510301== by 0x66F766: main (ceph_osd.cc:688)
==00:00:02:54.519 510301== Block was alloc'd by thread #1
```

In fact, plenty of similar issues are reported, for example:

```
==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread #119
==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
==00:00:05:04.903 510301== at 0x753165: clear (hashtable.h:2051)
==00:00:05:04.903 510301== by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
==00:00:05:04.903 510301== by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
==00:00:05:04.903 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:05:04.903 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:05:04.903 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:05:04.903 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:05:04.903 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:05:04.903 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:05:04.903 510301==
==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:05:04.903 510301== at 0x7531E1: clear (hashtable.h:2054)
==00:00:05:04.903 510301== by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:747)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:1078)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
==00:00:05:04.903 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:05:04.903 510301== by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:699)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:1732)
==00:00:05:04.903 510301== by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 80da5f9)

Conflicts:
src/osd/OSD.cc in:
- bool OSD::asok_command
- int OSD::shutdown
- void OSD::maybe_update_heartbeat_peers
- void OSD::_preboot
- void OSD::queue_want_up_thru
- void OSD::send_alive
- void OSD::send_failures
- void OSD::send_beacon
- MPGStats* OSD::collect_pg_stats
- void OSD::note_down_osd
- void OSD::consume_map
- void OSD::activate_map

src/osd/OSD.h in:
- private: dispatch_session_waiting

Also use the new const OSDMapRef in places that no longer exist in master:

src/osd/OSD.cc in:
- void OSDService::share_map
- void OSDService::send_incremental_map
- int OSD::_do_command
- void OSD::note_up_osd
- int OSD::init_op_flags

src/osd/OSD.h in:
- void send_incremental_map
- void share_map
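The quoted rule concerns the `shared_ptr` object itself: copying `OSD::osdmap` in one thread while another thread assigns to it is a data race, even though the reference count in the control block is updated atomically. Below is a minimal sketch of one way to close that race, under stated assumptions: `MapSnapshot` and `MapHolder` are hypothetical stand-ins (not Ceph types), and a mutex serializes every access to the shared `shared_ptr` object. This is not necessarily how the cherry-picked commit fixed `OSD::osdmap`; the atomic free functions described on the cppreference page linked above (`std::atomic_load`/`std::atomic_store` for `shared_ptr`, deprecated in C++20 in favor of `std::atomic<std::shared_ptr<T>>`) are an alternative.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical immutable map snapshot; stands in for OSDMap.
struct MapSnapshot {
  unsigned epoch;
};

using MapRef = std::shared_ptr<const MapSnapshot>;

// Holder for the "current" map. Readers and writers touch the same
// shared_ptr object, so every access to that object takes the mutex.
// Only the pointer copy happens under the lock; readers then use their
// private copy without holding it.
class MapHolder {
 public:
  MapRef get() const {
    std::lock_guard<std::mutex> l(lock_);
    return current_;  // copied while no writer can be mid-assignment
  }

  void set(MapRef m) {
    MapRef old;
    {
      std::lock_guard<std::mutex> l(lock_);
      old = std::move(current_);
      current_ = std::move(m);
    }
    // If this was the last reference, `old` destroys the previous map
    // here, outside the lock, so an expensive destructor does not block
    // readers.
  }

 private:
  mutable std::mutex lock_;
  MapRef current_;
};
```

Readers call `get()` once and keep the returned copy for the rest of the operation, so a concurrent `set()` cannot free the map out from under them; each snapshot is destroyed only when its last holder drops it.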

# This is the 1st commit message: DO-NOT-MERGE; first commit for integration of the s3-select engine into RGW; the request can only be sent by an AWS client; can execute on CSV files
# This is the commit message #2: remove debug info
# This is the commit message #3: bug fix (aggregation); error handling
# This is the commit message #4: fix comments (to be continued)
# This is the commit message #5: placement-new allocator; cosmetics
# This is the commit message #6: add namespace; memory management: the response buffer is now a class member
# This is the commit message #7: std::list --> std::vector
# This is the commit message #8: replace boost::split with a simple C CSV parser; there is a big difference, mainly because of too many allocations and copies
# This is the commit message #9: performance improvement; upon star-operation, use a reusable buffer to reduce copies and allocations
# This is the commit message #10: performance improvement; reduce allocations and copies; also use a reusable buffer (std::string) for message metadata
# This is the commit message #11: replace the CRC implementation with the Boost implementation; it also improves performance
# This is the commit message #12: performance improvement; reduce the number of object value constructions on the intensive flow (eval())
# This is the commit message #13: move from char* to std::string_view; change csv_object interfaces, mainly for performance improvements
# This is the commit message #14: initial commit for column-alias support; next steps are error handling (semantic, cyclic reference) and related performance improvements
# This is the commit message #15: adding a cache to column aliases; when an alias is referenced more than once, the cached result is returned instead of executing the referenced sub-tree; this can improve performance significantly (alias vs. non-alias)
# This is the commit message #16: cosmetics; aggregation semantic validation is done just after the syntax phase; error messages for failed queries
# This is the commit message #17: adding validation for cyclic alias references (endless evaluate loop); it is done by validating that the call-stack depth does not cross a threshold
# This is the commit message #18: 1) separate headers for the s3-select-functions framework; 2) bug fix for copy-constructor
# This is the commit message #19: adding new basic type timestamp (boost::posix_time); adding to_timestamp, add_date, diff_date, extract_date functions
# This is the commit message #20: adding yuvalif's utcnow (returns current time) implementation
# This is the commit message #21: adding a CSV parser integrated with AWS-cli; the upgraded parser is able to handle null columns and dynamic column/row/escape/quote char definitions. The CSV parser is implemented with a Boost state machine.
# This is the commit message #22: fix comments
# This is the commit message #23: add escape rules; default row delimiter
# This is the commit message #24: *) bug fix: in case of a syntax error, send the error description back to the client. *) when the number of runtime errors crosses 100, abort query execution with an error message. *) the compression-type value is checked for "NONE"
# This is the commit message #25: adding initial s3-select documentation
# This is the commit message #26: *) indentation *) add table for CSV behavior *) add alias feature description
# This is the commit message #27: add csv-header-info handling; use: get the CSV schema from the first line; ignore: skip the first line.
# This is the commit message #28: add csv-header-info feature description
# This is the commit message #29: *) handling of broken CSV rows is done in the s3select engine (CSV s3select reader) *) RGW executes s3-select on io-vec instead of calling c_str (it might realloc)
# This is the commit message #30: adding s3-select documentation (to be continued...); s3-select is part of the radosgw top-level link
# This is the commit message #31: add s3select submodule (remove s3select header files from src/rgw)
# This is the commit message #32: reshape the document; mainly user-oriented; design & architecture are out (different document); TBD: detailed example.
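Commit messages #15 and #17 describe two mechanisms: caching an alias's value so that a repeated reference does not re-run its sub-tree, and treating an evaluation depth beyond a threshold as a cyclic alias reference. A rough sketch of both ideas follows; `Expr`, `AliasResolver` and the default depth limit of 64 are hypothetical and not the s3select API, and expressions are reduced to a constant plus a sum of referenced aliases.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression: a constant plus the values of referenced aliases.
struct Expr {
  long constant = 0;
  std::vector<std::string> refs;
};

// Hypothetical resolver (not the s3select API) showing two ideas:
// a per-alias result cache and a depth limit that flags cyclic aliases.
class AliasResolver {
 public:
  explicit AliasResolver(std::map<std::string, Expr> defs, int max_depth = 64)
      : defs_(std::move(defs)), max_depth_(max_depth) {}

  long eval(const std::string& alias) { return eval_at(alias, 0); }

  // Number of aliases actually evaluated (cache hits are not counted).
  int evaluations() const { return evaluations_; }

 private:
  long eval_at(const std::string& alias, int depth) {
    // A cycle such as a -> b -> a keeps nesting until this check fires.
    if (depth > max_depth_)
      throw std::runtime_error("cyclic alias reference: " + alias);
    auto hit = cache_.find(alias);
    if (hit != cache_.end()) return hit->second;  // reuse cached value
    const Expr& e = defs_.at(alias);  // std::out_of_range if undefined
    ++evaluations_;
    long value = e.constant;
    for (const std::string& ref : e.refs) value += eval_at(ref, depth + 1);
    cache_[alias] = value;  // later references skip the sub-tree
    return value;
  }

  std::map<std::string, Expr> defs_;
  std::map<std::string, long> cache_;
  int max_depth_;
  int evaluations_ = 0;
};
```

With definitions like `a = b` and `b = a`, evaluation would never terminate without the depth check; with it, `eval("a")` throws once the nesting passes `max_depth`. Checking depth rather than building an explicit dependency graph keeps the evaluation path cheap, at the cost of also rejecting legitimately deep, non-cyclic alias chains beyond the limit.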

Create spec class for HA_RGW

add utcnow function and unit tests

include stdio in order to fix snprintf compilation error

6353d7b

Signed-off-by: Alexey Lapitsky <lex@realisticgroup.com>
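The commit above adds a stdio include so that `snprintf` is declared. This compiler error usually means a source file was relying on some other header to pull in `<stdio.h>` or `<cstdio>` transitively. A minimal sketch of the corrected pattern; `format_object_name` is a hypothetical helper, not code from this PR.

```cpp
#include <cstddef>  // std::size_t
#include <cstdio>   // declares std::snprintf; without a stdio header GCC
                    // reports "'snprintf' was not declared in this scope"
#include <string>

// Hypothetical helper: build "<prefix>.<index as 8 hex digits>".
std::string format_object_name(const char* prefix, unsigned long index) {
  char buf[64];
  int n = std::snprintf(buf, sizeof(buf), "%s.%08lx", prefix, index);
  if (n < 0) return std::string();  // encoding error
  if (static_cast<std::size_t>(n) >= sizeof(buf))
    return std::string(buf, sizeof(buf) - 1);  // output was truncated
  return std::string(buf, static_cast<std::size_t>(n));
}
```

`<cstdio>` guarantees `std::snprintf`; including `<stdio.h>` instead makes the unqualified `::snprintf` available. Comparing the return value with the buffer size detects truncation, which a bare `snprintf` call silently allows.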

liewegas added a commit that referenced this pull request Oct 28, 2011

Merge pull request #4 from vzctl/master

5fe8e00

fix error: 'snprintf' was not declared in this scope

liewegas merged commit 5fe8e00 into ceph:master Oct 28, 2011

olorin mentioned this pull request Mar 11, 2014

rados_connect not thread-safe when using nss (documentation) #1424

Merged

yuyuyu101 mentioned this pull request Mar 9, 2015

Wip async fix throttle deadlock #3913

Merged

branch-predictor pushed a commit to branch-predictor/ceph that referenced this pull request Nov 5, 2015

Merge pull request ceph#4 from branch-predictor/wip-kv

574f3fc

kv/LevelDBStore, FileStore, MonDBStore: simpler code for single-key fetches

chamdoo pushed a commit to chamdoo/ceph that referenced this pull request Nov 13, 2015

Merge pull request ceph#4 from xinxinsh/wip-rocksdb

05da593

fix compile error on 32-bit

Reviewed-by: Sage Weil <sage@redhat.com>

dmick mentioned this pull request Feb 10, 2016

memstore: fix alignment of Page for test_pageset #7587

Merged

abhidixit pushed a commit to abhidixit/ceph that referenced this pull request Feb 23, 2016

Merge pull request ceph#4 from shivanshu21/hammer_IAM_nonMock1

b489d6d

Hammer iam non mock1

smithfarm mentioned this pull request Mar 10, 2016

packaging: make infernalis -> jewel upgrade work #8034

Merged

alimaredia added a commit to alimaredia/ceph that referenced this pull request May 24, 2016

blkin: squash this commit ceph#4

ab36ef5

Signed-off-by: Ali Maredia <amaredia@redhat.com>

dillaman mentioned this pull request Jun 15, 2016

rbd-nbd does not properly handle resize notifications #9291

Merged

tchaikov mentioned this pull request Mar 21, 2020

crimson/admin: do not reset connected_sock before closing #34104

Merged

3 tasks

majianpeng mentioned this pull request Apr 24, 2020

os/bluestore: Optimizing the lock of bluestore writing process #34109

Merged

3 tasks

ivancich mentioned this pull request Oct 14, 2020

rgw: rgw-orphan-list should use "plain" formatted rados ls output #37622

Merged

adk3798 referenced this pull request in adk3798/ceph Nov 3, 2020

Merge pull request #4 from adk3798/haproxy-spec

eab8b17

Create spec class for HA_RGW

fengchunsong mentioned this pull request Nov 10, 2020

rgw: prefetch GET range request #36144

Closed

galsalomon66 added a commit that referenced this pull request Jan 14, 2021

Merge pull request #4 from yuvalif/add_utcnow_function

23603c2

add utcnow function and unit tests

This was referenced Mar 22, 2021

bug fix: ceph-conf core-dump when startup #40302

Closed

crush/CrushLocation: do not print logging message in constructor #40457

Merged

tchaikov mentioned this pull request Jun 1, 2021

auth/KeyRing: always decode keying as plaintext #41631

Merged

alfonsomthd mentioned this pull request Oct 26, 2021

mgr/dashboard: Cluster expansion review page minor bug fixes #43661

Merged

3 tasks

pritha-srivastava mentioned this pull request Nov 24, 2021

rgw multisite: replicate metadata for iam roles #43597

Merged

3 tasks

This was referenced Mar 23, 2022

cls/rgw : Add missing classes in < #include "cls/rgw/cls_rgw_types.h"> #45394

Merged

[DNM] Wip iqbal cls rgw classes #45538

Closed

mattbenjamin mentioned this pull request Apr 22, 2022

RGW - Allow starting RGW/dbstore without connecting to Mons #45987

Merged

14 tasks

liujiangang01 mentioned this pull request Oct 11, 2022

client: _readdir_cache_cb() may use the readdir_cache already clear #29526

Merged

3 tasks

pritha-srivastava mentioned this pull request Mar 2, 2023

rgw/d3n: filter driver for d3n. #50198

Closed

14 tasks

dang mentioned this pull request Jun 2, 2023

Wip posix ordered #51519

Merged

14 tasks

chrisphoffman mentioned this pull request Nov 8, 2023

mgr/vol: don't always print success message for "volume rename" cmd #54340

Closed

14 tasks

joscollin mentioned this pull request Mar 19, 2024

cephfs_mirror: fix crash in update_fs_mirrors() #56148

Merged

14 tasks

xxhdx1985126 mentioned this pull request Apr 23, 2024

crimson/os/seastore: support extent checksum verification #55449

Merged

14 tasks

arm7star mentioned this pull request Sep 12, 2024

test/crimson/test_messenger_thrash: fix a local variable scope issue #59746

Merged

14 tasks

batrick mentioned this pull request Oct 28, 2024

doc/governance: updates based on 2024q4 election #60518

Merged

14 tasks

mohit84 mentioned this pull request Apr 3, 2025

crimson/osd/pg_recovery: use OperationThrottler to throttle object pushes/pulls #62080

Merged

14 tasks

anthonyeleven mentioned this pull request Apr 16, 2025

doc/radosgw: Promptify CLI, cosmetic fixes #62763

Merged

14 tasks


fix error: 'snprintf' was not declared in this scope #4

liewegas merged 1 commit into ceph:master from vzctl:master

vzctl commented Oct 28, 2011


2 participants
