Skip to content

os/LevelDBStore: faster LevelDBTransactionImpl::set#3

Merged
liewegas merged 1 commit intoliewegas:wip-kvfrom
branch-predictor:bp-kvdb-set-optim
Oct 20, 2015
Merged

os/LevelDBStore: faster LevelDBTransactionImpl::set#3
liewegas merged 1 commit intoliewegas:wip-kvfrom
branch-predictor:bp-kvdb-set-optim

Conversation

@branch-predictor
Copy link

This patch builds on Sage's idea to reduce bufferlist copying on
::set() calls. Initial patch reduced LevelDB's Slice generation
time from ~57ns to ~5ns in best-case scenario (bufferlist with
single bufferptr or contiguous bufferptrs), this patch reduces
slice generation time from ~57ns to ~11ns in worst case scenario
(under assumption that entire value length is at most 128KB),
leaving old code path for extremely-bad cases.

Signed-off-by: Piotr Dałek piotr.dalek@ts.fujitsu.com

This patch builds on Sage's idea to reduce bufferlist copying on
::set() calls. Initial patch reduced LevelDB's Slice generation
time from ~57ns to ~5ns in best-case scenario (bufferlist with
single bufferptr or contiguous bufferptrs), this patch reduces
slice generation time from ~57ns to ~11ns in worst case scenario
(under assumption that entire value length is at most 128KB),
leaving old code path for extremely-bad cases.

Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
liewegas added a commit that referenced this pull request Oct 20, 2015
os/LevelDBStore: faster LevelDBTransactionImpl::set
@liewegas liewegas merged commit daa4041 into liewegas:wip-kv Oct 20, 2015
@branch-predictor branch-predictor deleted the bp-kvdb-set-optim branch May 15, 2016 17:56
liewegas pushed a commit that referenced this pull request Nov 1, 2016
…er instance

the caller needs to check the nullity of the parameter before calling
PK11_FreeSymKey or PK11_FreeSlot, otherwise if CryptoAESKeyHandler::init
failed, we will hit a segfault as follows:
  #0  0x00007f76844f5a95 in PK11_FreeSymKey () from /lib64/libnss3.so
  #1  0x00007f76586b6e49 in CryptoAESKeyHandler::~CryptoAESKeyHandler() () from /lib64/librados.so.2
  #2  0x00007f76586b5eea in CryptoAES::get_key_handler(ceph::buffer::ptr const&, std::string&) () from /lib64/librados.so.2
  #3  0x00007f76586b4b9c in CryptoKey::_set_secret(int, ceph::buffer::ptr const&) () from /lib64/librados.so.2
  #4  0x00007f76586b4e95 in CryptoKey::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
  #5  0x00007f76586b7ee6 in KeyRing::set_modifier(char const*, char const*, EntityName&, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&) () from /lib64/librados.so.2
  #6  0x00007f76586b8882 in KeyRing::decode_plaintext(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
  #7  0x00007f76586b9803 in KeyRing::decode(ceph::buffer::list::iterator&) () from /lib64/librados.so.2
  #8  0x00007f76586b9a1f in KeyRing::load(CephContext*, std::string const&) () from /lib64/librados.so.2
  #9  0x00007f76586ba04b in KeyRing::from_ceph_context(CephContext*) () from /lib64/librados.so.2
  #10 0x00007f765852d0cd in MonClient::init() () from /lib64/librados.so.2
  #11 0x00007f76583c15f5 in librados::RadosClient::connect() () from /lib64/librados.so.2
  #12 0x00007f765838cb1c in rados_connect () from /lib64/librados.so.2
  ...

Signed-off-by: runsisi <runsisi@zte.com.cn>
liewegas pushed a commit that referenced this pull request Aug 1, 2017
I'm seeing sporadic single thread deadlocks on fio stat_mutex during krbd
thrash runs:

  (gdb) info threads
    Id   Target Id         Frame
  * 1    Thread 0x7f89ee730740 (LWP 15604) 0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
  (gdb) bt
  #0  0x00007f89ed9f41bd in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f89ed9f17b2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  #2  0x00000000004429b9 in fio_mutex_down (mutex=0x7f89ee72d000) at mutex.c:170
  #3  0x0000000000459704 in thread_main (data=<optimized out>) at backend.c:1639
  #4  0x000000000045b013 in fork_main (offset=0, shmid=<optimized out>, sk_out=0x0) at backend.c:1778
  #5  run_threads (sk_out=sk_out@entry=0x0) at backend.c:2195
  #6  0x000000000045b47f in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2400
  #7  0x000000000040cb0c in main (argc=2, argv=0x7fffad3e3888, envp=<optimized out>) at fio.c:63
  (gdb) up 2
  170                     pthread_cond_wait(&mutex->cond, &mutex->lock);
  (gdb) p mutex.lock.__data.__owner
  $1 = 15604

Upgrading to 2.21 seems to make these go away.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
liewegas added a commit that referenced this pull request Jun 19, 2019
I am seeing this trace, which matches except for the
'fun:_ZN15AsyncConnection7processEv' frame.

<error>
  <unique>0x2399</unique>
  <tid>11</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x5366B18</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v14_2_0::list&amp;&amp;, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>274</line>
    </frame>
    <frame>
      <ip>0x5355E60</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1311</line>
    </frame>
    <frame>
      <ip>0x533E2A3</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x534FB1C</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)::{lambda()#2}::operator()()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2739</line>
    </frame>
    <frame>
      <ip>0x534FF57</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2745</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke_impl&lt;void, ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke&lt;ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>95</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__call&lt;void, 0&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>400</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()&lt;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>484</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::C_submit_event&lt;std::_Bind&lt;ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3} (ConnectedSocket)&gt; &gt;::do_request(unsigned long)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.h</file>
      <line>227</line>
    </frame>
    <frame>
      <ip>0x535FCD6</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.cc</file>
      <line>441</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0x55F519E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>execute_native_thread_routine</fn>
    </frame>
    <frame>
      <ip>0x1076BDD4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x118E0EAC</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas added a commit that referenced this pull request Jun 19, 2019
We just took the curmap ref above; do not call get_osdmap() again.

I think it may explain a weird segv I saw here in ~shared_ptr, although
I'm not quite certain.  Regardless, this change is correct and better.

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00005596e5a98261 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:326
#2  handle_fatal_signal(int) () at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x00005596f4fe80e0 in ?? ()
#5  0x00005596e5464068 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#6  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#7  0x00005596e543377f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f2b25044e28, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#8  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#9  std::shared_ptr<OSDMap const>::~shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#10 OSD::handle_osd_ping(MOSDPing*) () at ./src/osd/OSD.cc:4662

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas added a commit that referenced this pull request Jul 19, 2019
I am seeing this trace, which matches except for the
'fun:_ZN15AsyncConnection7processEv' frame.

<error>
  <unique>0x2399</unique>
  <tid>11</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x5366B18</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v14_2_0::list&amp;&amp;, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>274</line>
    </frame>
    <frame>
      <ip>0x5355E60</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1311</line>
    </frame>
    <frame>
      <ip>0x533E2A3</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x534FB1C</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)::{lambda()#2}::operator()()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2739</line>
    </frame>
    <frame>
      <ip>0x534FF57</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2745</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke_impl&lt;void, ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke&lt;ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>95</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__call&lt;void, 0&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>400</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()&lt;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>484</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::C_submit_event&lt;std::_Bind&lt;ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3} (ConnectedSocket)&gt; &gt;::do_request(unsigned long)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.h</file>
      <line>227</line>
    </frame>
    <frame>
      <ip>0x535FCD6</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.cc</file>
      <line>441</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0x55F519E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>execute_native_thread_routine</fn>
    </frame>
    <frame>
      <ip>0x1076BDD4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x118E0EAC</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f019fc0)
liewegas pushed a commit that referenced this pull request Feb 18, 2020
Accordingly to cppreference.com [1]:

  "If multiple threads of execution access the same std::shared_ptr
  object without synchronization and any of those accesses uses
  a non-const member function of shared_ptr then a data race will
  occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:

  ```
  [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  #1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
  #6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  #8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
  #9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
  #10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
  #11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
  #12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
  #13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
  #14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
  #15 0x0000000000000000 in ?? ()
  (gdb) frame 7
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  9665      in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
  (gdb) print osdmap
  $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
  (gdb) print *osdmap
     # pretty sane OSDMap
  (gdb) print sizeof(osdmap)
  $26 = 16
  (gdb) x/2a &osdmap
  0x559ca22acef0:   0x559cba028000  0x559cba0ec900

  (gdb) frame 2
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  148       /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
  (gdb) disassemble
  Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
  ...
     0x0000559c97675b1e <+62>:      mov    (%rdi),%rax
     0x0000559c97675b21 <+65>:      mov    %rdi,%rbx
     0x0000559c97675b24 <+68>:      callq  *0x10(%rax)
  => 0x0000559c97675b27 <+71>:      test   %rbp,%rbp
  ...
  End of assembler dump.
  (gdb) info registers rdi rbx rax
  rdi            0x559cba0ec900      94131624790272
  rbx            0x559cba0ec900      94131624790272
  rax            0x559cba0ec8a0      94131624790176
  (gdb) x/a 0x559cba0ec8a0 + 0x10
  0x559cba0ec8b0:   0x559cb81c3ea0
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  ...
  (gdb) p $_siginfo._sifields._sigfault.si_addr
  $27 = (void *) 0x559cb81c3ea0
  ```

Helgrind seems to agree:
  ```
  ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90
  ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
  ==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
  ==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:02:54.519 510301==
  ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117
  ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
  ==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
  ==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
  ==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
  ==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
  ==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==  Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
  ==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
  ==00:00:02:54.519 510301==  Block was alloc'd by thread #1
  ```

Actually there is plenty of similar issues reported like:
  ```
  ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119
  ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
  ==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
  ==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t>
  >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta
  il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr
  _base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
  ==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:05:04.903 510301==
  ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90
  ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
  ==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==  Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
  ==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
  ==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
liewegas pushed a commit that referenced this pull request Mar 24, 2020
* no need to discard_result(). as `output_stream::close()` returns an
  empty future<> already
* free the connected socket after the background task finishes, because:

we should not free the connected socket before the promise referencing it is fulfilled.

otherwise we have error messages from ASan, like

==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
    #0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
    #1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
    #2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
    #3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
    #4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
    #5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
    #7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    #9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
    #10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
    #11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
    #12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
    #13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)

0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
    #0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
    #1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
    #2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
    #3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
    #4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
    #5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
    #6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
    #7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
    #10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr()
	/usr/include/c++/10/bits/unique_ptr.h:357    #11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
    #12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
    #13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
    #14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
    #15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
    #16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
    #17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
    #18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36

Signed-off-by: Kefu Chai <kchai@redhat.com>
liewegas pushed a commit that referenced this pull request Feb 19, 2021
Accordingly to cppreference.com [1]:

  "If multiple threads of execution access the same std::shared_ptr
  object without synchronization and any of those accesses uses
  a non-const member function of shared_ptr then a data race will
  occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:

  ```
  [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  #1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
  #6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  #8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
  #9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
  #10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
  #11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
  #12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
  #13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
  #14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
  #15 0x0000000000000000 in ?? ()
  (gdb) frame 7
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  9665      in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
  (gdb) print osdmap
  $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
  (gdb) print *osdmap
     # pretty sane OSDMap
  (gdb) print sizeof(osdmap)
  $26 = 16
  (gdb) x/2a &osdmap
  0x559ca22acef0:   0x559cba028000  0x559cba0ec900

  (gdb) frame 2
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  148       /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
  (gdb) disassemble
  Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
  ...
     0x0000559c97675b1e <+62>:      mov    (%rdi),%rax
     0x0000559c97675b21 <+65>:      mov    %rdi,%rbx
     0x0000559c97675b24 <+68>:      callq  *0x10(%rax)
  => 0x0000559c97675b27 <+71>:      test   %rbp,%rbp
  ...
  End of assembler dump.
  (gdb) info registers rdi rbx rax
  rdi            0x559cba0ec900      94131624790272
  rbx            0x559cba0ec900      94131624790272
  rax            0x559cba0ec8a0      94131624790176
  (gdb) x/a 0x559cba0ec8a0 + 0x10
  0x559cba0ec8b0:   0x559cb81c3ea0
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  ...
  (gdb) p $_siginfo._sifields._sigfault.si_addr
  $27 = (void *) 0x559cb81c3ea0
  ```

Helgrind seems to agree:
  ```
  ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90
  ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
  ==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
  ==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:02:54.519 510301==
  ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117
  ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
  ==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
  ==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
  ==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
  ==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
  ==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==  Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
  ==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
  ==00:00:02:54.519 510301==  Block was alloc'd by thread #1
  ```

Actually there is plenty of similar issues reported like:
  ```
  ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119
  ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
  ==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
  ==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t>
  >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta
  il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr
  _base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
  ==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:05:04.903 510301==
  ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90
  ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
  ==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==  Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
  ==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
  ==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 80da5f9)

Conflicts:
	src/osd/OSD.cc in
		bool OSD::asok_command
		int OSD::shutdown
		void OSD::maybe_update_heartbeat_peers
		void OSD::_preboot
		void OSD::queue_want_up_thru
		void OSD::send_alive
		void OSD::send_failures
		void OSD::send_beacon
		MPGStats* OSD::collect_pg_stats
		void OSD::note_down_osd
		void OSD::consume_map
		void OSD::activate_map
	src/osd/OSD.h in
		private: dispatch_session_waiting

- also use the new const OSDMapRef in places that no longer exist in master
	src/osd/OSD.cc in
		void OSDService::share_map
		void OSDService::send_incremental_map
		int OSD::_do_command
		void OSD::note_up_osd
		int OSD::init_op_flags
	src/osd/OSD.h in
		void send_incremental_map
		void share_map
liewegas pushed a commit that referenced this pull request Apr 20, 2021
During a teuthology run [1] following crash happended:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696$ less remote/smithi052/log/ceph-osd.3.log.gz
...
DEBUG 2021-04-08 10:32:58,548 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62168 >> mon.0 v2:172.21.15.52:3300/0] <== #3 === mgrmap(e 4) v1 (1796)
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closing: reset no, replace no
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] TRIGGER CLOSING, was READY
INFO  2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] execute_ready(): protocol aborted at CLOSING -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-04-08 10:32:58,549 [shard 0] ms - [osd.3(client) v2:172.21.15.52:6813/30889@62056 >> mgr.4100 v2:172.21.15.52:6800/30259] closed!
Segmentation fault on shard 0.
Backtrace:
  0x000000000151765c
  0x00000000014d9600
  0x00000000014d9902
  0x00000000014d9972
  /lib64/libpthread.so.0+0x0000000000012b1f
  0x0000000000e59cba
  0x00000000014dc8a6
  0x00000000014cdd1c
  0x0000000001503053
  0x000000000149fab7
  0x00000000006e0ef5
  /lib64/libc.so.6+0x00000000000237b2
  0x000000000072a23d
daemon-helper: command crashed with signal 11
```
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-04-08_10:14:11-rados-master-distro-basic-smithi/6028696/

GDB testifies the `conn` during the execution of `ceph::mgr:report()` was null:

```
(gdb) frame 7
154	in /usr/src/debug/ceph-17.0.0-2935.g4153f8c2.el8.x86_64/src/crimson/mgr/client.cc
(gdb) print conn
$1 = {_b = 0x0, _p = 0x0}
```

Taken altogether with the `mgr.4100 v2:172.21.15.52:6800/30259] closed!`
debug this suggests that a call to `report()` occurred (likely from the
timer) but we were in the middle of the unatomic reconnect sequence:

```cpp
seastar::future<> Client::reconnect()
{
  if (conn) {
    conn->mark_down();
    conn = {};
  }
  // ...
  return seastar::sleep(a_while).then([this] {
    // ...
    conn = msgr.connect(peer, CEPH_ENTITY_TYPE_MGR);
  });
}
```

This commit alters the `mgr::report()` to skip reporting is the `conn`
is unavailable.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
liewegas added a commit that referenced this pull request May 4, 2021
Otherwise, if we assert, we'll hang here:

Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54

#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0)
    at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body")
    at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)",
    method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>,
    location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306

Signed-off-by: Sage Weil <sage@newdream.net>
liewegas added a commit that referenced this pull request May 5, 2021
Otherwise, if we assert, we'll hang here:

Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54

#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0)
    at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body")
    at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)",
    method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>,
    location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit ee5a0c9)
liewegas pushed a commit that referenced this pull request May 26, 2021
f7181ab has optimized the client
parallelism. To achieve that `PG::do_osd_ops()` were converted to
return basically future pair of futures. Unfortunately, the life-
time management of `OpsExecuter` was kept intact. In the result,
the object was valid only till fullfying the outer future while,
due to the `rollbacker` instances, it should be available till
`all_completed` becomes available.

This issue can explain the following problem has been observed
in a Teuthology job [1].

```
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - do_op_call: method returned ret=-17, outdata.length()=0 while num_read=1, num_write=0
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - rollback_obc_if_modified: object 19:e17d4708:test-rados-api-smithi095-38404-2::foo:head got erro
r generic:17, need_rollback=false
=================================================================
==33626==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000b9320 at pc 0x560f486b8222 bp 0x7fffc467a1e0 sp 0x7fffc467a1d0
READ of size 4 at 0x60d0000b9320 thread T0
    #0 0x560f486b8221  (/usr/bin/ceph-osd+0x2c610221)
    #1 0x560f4880c6b1 in seastar::continuation<seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >, seastar::noncopy
able_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_
wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > >
 > ()>, seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::
IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson:
:errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInte
rruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::error
ated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > >(seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail
<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_
future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&&)::{lambda(seastar::internal::promise_base_with_type<boos
t::intrusive_ptr<MOSDOpReply> >&&, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() (/usr/bin/ceph-osd+0x2c7646b1)
    #2 0x560f5352c3ae  (/usr/bin/ceph-osd+0x374843ae)
    #3 0x560f535318ef  (/usr/bin/ceph-osd+0x374898ef)
    #4 0x560f536e395a  (/usr/bin/ceph-osd+0x3763b95a)
    #5 0x560f532413d9  (/usr/bin/ceph-osd+0x371993d9)
    #6 0x560f476af95a in main (/usr/bin/ceph-osd+0x2b60795a)
    #7 0x7f7aa0af97b2 in __libc_start_main (/lib64/libc.so.6+0x237b2)
    #8 0x560f477d2e8d in _start (/usr/bin/ceph-osd+0x2b72ae8d)

```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-20_07:28:16-rados-master-distro-basic-smithi/6124735/

The commit deals with the problem by repacking the outer future.
An alternative could be in switching from `std::unique_ptr` to
`seastar::shared_ptr` for managing `OpsExecuter`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
liewegas pushed a commit that referenced this pull request Jun 2, 2021
The `FuturizedStore` interface imposes the `get_attr()`
takes the `name` parameter as `std::string_view`, and
thus burdens implementations with extending the life-
time of the data the instance refers to.

Unfortunately, `AlienStore` is unaware that prolonging
the life of a `std::string_view` instance doesn't prolong
the data memory it points to. This problem has manifested
in the following use-after-free detected at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136929$ less ./remote/smithi194/log/ceph-osd.7.log.gz
...
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - do_osd_ops_execute: object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head - handling op
call
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - handling op call on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - calling method lock.lock, num_read=0, num_write=0
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - handling op getxattr on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - getxattr on obj=14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head for attr=_lock.TestLockPP1
DEBUG 2021-05-26 20:24:54,078 [shard 0] bluestore - get_attr
=================================================================
==34068==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030001851d0 at pc 0x7f824d6a5b27 bp 0x7f822b4201c0 sp 0x7f822b41f968
READ of size 17 at 0x6030001851d0 thread T28 (alien-store-tp)
...
    #0 0x7f824d6a5b26  (/lib64/libasan.so.5+0x40b26)
    #1 0x55e2cbb2e00b  (/usr/bin/ceph-osd+0x2b6dc00b)
    #2 0x55e2d31f086e  (/usr/bin/ceph-osd+0x32d9e86e)
    #3 0x55e2d3467607 in crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) (/usr/bin/ceph-osd+0x33015607)
    #4 0x55e2d346b14a  (/usr/bin/ceph-osd+0x3301914a)
    #5 0x7f8249d32ba2  (/lib64/libstdc++.so.6+0xc2ba2)
    #6 0x7f824a00d149 in start_thread (/lib64/libpthread.so.0+0x8149)
    #7 0x7f82486edf22 in clone (/lib64/libc.so.6+0xfcf22)

0x6030001851d0 is located 0 bytes inside of 31-byte region [0x6030001851d0,0x6030001851ef)
freed by thread T0 here:
    #0 0x7f824d757688 in operator delete(void*) (/lib64/libasan.so.5+0xf2688)

previously allocated by thread T0 here:
    #0 0x7f824d7567b0 in operator new(unsigned long) (/lib64/libasan.so.5+0xf17b0)

Thread T28 (alien-store-tp) created by T0 here:
    #0 0x7f824d6b7ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3)

SUMMARY: AddressSanitizer: heap-use-after-free (/lib64/libasan.so.5+0x40b26)
Shadow bytes around the buggy address:
  0x0c06800289e0: fd fd fd fa fa fa fd fd fd fa fa fa 00 00 00 fa
  0x0c06800289f0: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x0c0680028a00: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
  0x0c0680028a10: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x0c0680028a20: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c0680028a30: fd fd fa fa fd fd fd fd fa fa[fd]fd fd fd fa fa
  0x0c0680028a40: fd fd fd fd fa fa fd fd fd fd fa fa 00 00 00 07
  0x0c0680028a50: fa fa 00 00 00 fa fa fa 00 00 00 fa fa fa fd fd
  0x0c0680028a60: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x0c0680028a70: 00 00 00 00 fa fa fd fd fd fd fa fa fd fd fd fd
  0x0c0680028a80: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==34068==ABORTING
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
alfonsomthd pushed a commit that referenced this pull request Oct 18, 2021
exit() will call pthread_cond_destroy attempting to destroy dpdk::eal::cond
upon which other threads are currently blocked results in undefine
behavior. Link different libc version test, libc-2.17 can exit,
libc-2.27 will deadlock, the call stack is as follows:

Thread 3 (Thread 0xffff7e5749f0 (LWP 62213)):
 #0  0x0000ffff7f3c422c in futex_wait_cancelable (private=<optimized out>, expected=0,
    futex_word=0xaaaadc0e30f4 <dpdk::eal::cond+44>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
 #1  __pthread_cond_wait_common (abstime=0x0, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>, cond=0xaaaadc0e30c8 <dpdk::eal::cond>)
    at pthread_cond_wait.c:502
 #2  __pthread_cond_wait (cond=0xaaaadc0e30c8 <dpdk::eal::cond>, mutex=0xaaaadc0e30f8 <dpdk::eal::lock>)
    at pthread_cond_wait.c:655
 #3  0x0000ffff7f1f1f80 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
 #4  0x0000aaaad37f5078 in dpdk::eal::<lambda()>::operator()(void) const (__closure=<optimized out>, __closure=<optimized out>)
    at ./src/msg/async/dpdk/dpdk_rte.cc:136
 #5  0x0000ffff7f1f7ed4 in ?? () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
 #6  0x0000ffff7f3be088 in start_thread (arg=0xffffe73e197f) at pthread_create.c:463
 #7  0x0000ffff7efc74ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0xffff7ee3b010 (LWP 62200)):
 #0  0x0000ffff7f3c3c38 in futex_wait (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:61
 #1  futex_wait_simple (private=<optimized out>, expected=12, futex_word=0xaaaadc0e30ec <dpdk::eal::cond+36>)
    at ../sysdeps/nptl/futex-internal.h:135
 #2  __pthread_cond_destroy (cond=0xaaaadc0e30c8 <dpdk::eal::cond>) at pthread_cond_destroy.c:54
 #3  0x0000ffff7ef2be34 in __run_exit_handlers (status=-6, listp=0xffff7f04a5a0 <__exit_funcs>, run_list_atexit=255,
    run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
 #4  0x0000ffff7ef2bf6c in __GI_exit (status=<optimized out>) at exit.c:139
 #5  0x0000ffff7ef176e4 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:344
 #6  0x0000aaaad2939db0 in _start () at ./src/include/buffer.h:642

Fixes: https://tracker.ceph.com/issues/42890
Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Signed-off-by: luo rixin <luorixin@huawei.com>
liewegas pushed a commit that referenced this pull request Nov 5, 2021
The ceph.io website as of now is not friendly towards those who would like to fix or at least to report mistakes on the website. This repo does not make it clear that it is for the new website under development and this creates confusion. See issue #3 for an example. It is safe to assume that the majority of people who would have otherwise reported or even fixed mistakes give up early in the process. This update provides direction for reports related to the current website and makes a statement about this repository's purpose as a home for the new website.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants