Skip to content

script/ptl-tool.py: Fix the Comment example#1

Merged
batrick merged 1 commit intobatrick:ptl-tool-2from
joscollin:ptl-tool-2
Oct 4, 2017
Merged

script/ptl-tool.py: Fix the Comment example#1
batrick merged 1 commit intobatrick:ptl-tool-2from
joscollin:ptl-tool-2

Conversation

@joscollin
Copy link

@joscollin joscollin commented Oct 3, 2017

The example shown in the comment needs export to work as expected.

@joscollin joscollin changed the title test script/ptl-tool.py: Fix the Comment example Oct 3, 2017
batrick pushed a commit that referenced this pull request Oct 4, 2017
Fixes the coverity issue:

** 1221538 Uninitialized pointer field
2. uninit_member: Non-static class member cluster is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member ioctx is not initialized
in this constructor nor in any functions that it calls.
CID 1221538 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
6. uninit_member: Non-static class member striper is not initialized
in this constructor nor in any functions that it calls

Signed-off-by: Amit Kumar amitkuma@redhat.com
The example shown in the comment needs export to work as expected.

Signed-off-by: Jos Collin <jcollin@redhat.com>
@batrick batrick merged commit 32b2653 into batrick:ptl-tool-2 Oct 4, 2017
@joscollin joscollin deleted the ptl-tool-2 branch October 5, 2017 01:01
batrick pushed a commit that referenced this pull request Oct 6, 2017
Fixes the coverity issues:

** 1402627 Uninitialized scalar field
CID 1402627 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member promotion_state is not
initialized in this constructor nor in any functions that it calls.

** 1402630 Uninitialized scalar field
CID 1402630 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member m_promotion_state is not
initialized in this constructor nor in any functions that it calls.

** 1402631 Uninitialized scalar field
CID 1402631 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member m_promotion_state is not
initialized in this constructor nor in any functions that it calls.

** 1402632 Uninitialized scalar field
CID 1402632 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member tag_tid is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Oct 6, 2017
Fixes the coverity issues:

** 1396177 Uninitialized scalar field
CID 1396177 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member m_state is not initialized
in this constructor nor in any functions that it calls.

** 1396178 Uninitialized scalar field
CID 1396178 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member tag_tid is not initialized
in this constructor nor in any functions that it calls.

** 1396203 Uninitialized scalar field
CID 1396203 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member m_state is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Oct 6, 2017
** 1396191 Unused value
CID 1396191 (#1 of 1): Unused value (UNUSED_VALUE)
returned_value: Assigning value from check_obj_tail_locator_underscore
(bucket_info, obj, key, fix, f) to ret here, but that stored value is
overwritten before it can be used.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Oct 14, 2017
Fixes the coverity issues:

** 1401444 Uninitialized scalar field
2. uninit_member: Non-static class member gc_start_offset is not
initialized in this constructor nor in any functions that it calls.
CID 1401444 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member gc_end_offset is not
initialized in this constructor nor in any functions that it calls.

** 1405091 Uninitialized scalar field
CID 1405091 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member bits is not initialized
in this constructor nor in any functions that it calls.

** 1409842 Uninitialized scalar field
CID 1409842 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member m_object_exist is not
initialized in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Oct 14, 2017
Fixes the coverity issues:

** 1396105 Uninitialized scalar field
CID 1396105 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member m_snap_list_ret is not
initialized in this constructor nor in any functions that it calls.

** 1396141 Uninitialized scalar field
CID 1396141 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member value is not initialized
in this constructor nor in any functions that it calls.

** 1396162 Uninitialized scalar field
2. uninit_member: Non-static class member m_num is not initialized
in this constructor nor in any functions that it calls.
CID 1396162 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member m_len is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Oct 24, 2017
Fixes the coverity issues:

** 1402628 Uninitialized scalar field
CID 1402628 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member epoch is not initialized
in this constructor nor in any functions that it calls.

** 1409841 Uninitialized scalar field
CID 1409841 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member len is not initialized
in this constructor nor in any functions that it calls.

** 1416594 Uninitialized scalar field
CID 1416594 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member field fh.fh_hk is not
initialized in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 8, 2017
…letion

We have a race condition:

 1. RGW client #1: requests an object be deleted.
 2. RGW client #1: sends a prepare op to bucket index OSD #1.
 3. OSD #1:        prepares the op, adding pending ops to the bucket dir entry
 4. RGW client #2: sends a list bucket to OSD #1
 5. RGW client #2: sees that there are pending operations on bucket
                   dir entry, and calls check_disk_state
 6. RGW client #2: check_disk_state sees that the object still exists, so it
                   sends CEPH_RGW_UPDATE to bucket index OSD (#1)
 7. RGW client #1: sends a delete object to object OSD (#2)
 8. OSD #2:        deletes the object
 9. RGW client #2: sends a complete op to bucket index OSD (#1)
10. OSD #1:        completes the op
11. OSD #1:        receives the CEPH_RGW_UPDATE and updates the bucket index
                   entry, thereby **RECREATING** it

Solution implemented:

At step #5 the object's dir entry exists. If we get to beginning of
step ceph#11 and the object's dir entry no longer exists, we know that the
dir entry was just actively being modified, and ignore the
CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
batrick pushed a commit that referenced this pull request Nov 18, 2017
Fixes the coverity issues:

** 1414512 Uninitialized scalar field
2. uninit_member: Non-static class member lkey is not initialized
in this constructor nor in any functions that it calls.
CID 1414512 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member bound is not initialized
in this constructor nor in any functions that it calls.

** 1414523 Uninitialized scalar field
CID 1414523 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
5. uninit_member: Non-static class member port_cnt is not initialized
in this constructor nor in any functions that it calls.

** 1414524 Uninitialized scalar field
CID 1414524 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member num_chunk is not initialized
in this constructor nor in any functions that it calls.

** 1414525 Uninitialized scalar field
CID 1414525 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member gid_idx is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issue:
CID 1394853 (#1 of 1): Division or modulo by zero (DIVIDE_BY_ZERO)
29. divide_by_zero: In expression (this->data.object_size * max_objects
+ this->data.op_size - 1UL) / this->data.op_size, division by expression
this->data.op_size which may be zero has undefined behavior.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issue:

CID 1417062 (#1 of 1): Result is not floating-point
(UNINTENDED_INTEGER_DIVISION)
integer_division: Dividing integer expressions 100 and temp
, and then converting the integer quotient to type double.
Any remainder, or fractional part of the quotient, is ignored.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issue:

CID 1394846 (#1 of 1): Dereference after null check (FORWARD_NULL)
15. var_deref_model: Passing null pointer pool_name to regex_match,
which dereferences it.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
…letion

We have a race condition:

 1. RGW client #1: requests an object be deleted.
 2. RGW client #1: sends a prepare op to bucket index OSD #1.
 3. OSD #1:        prepares the op, adding pending ops to the bucket dir entry
 4. RGW client #2: sends a list bucket to OSD #1
 5. RGW client #2: sees that there are pending operations on bucket
                   dir entry, and calls check_disk_state
 6. RGW client #2: check_disk_state sees that the object still exists, so it
                   sends CEPH_RGW_UPDATE to bucket index OSD (#1)
 7. RGW client #1: sends a delete object to object OSD (#2)
 8. OSD #2:        deletes the object
 9. RGW client #2: sends a complete op to bucket index OSD (#1)
10. OSD #1:        completes the op
11. OSD #1:        receives the CEPH_RGW_UPDATE and updates the bucket index
                   entry, thereby **RECREATING** it

Solution implemented:

At step #5 the object's dir entry exists. If we get to beginning of
step ceph#11 and the object's dir entry no longer exists, we know that the
dir entry was just actively being modified, and ignore the
CEPH_RGW_UPDATE operation, thereby NOT recreating it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit b33f529)
batrick pushed a commit that referenced this pull request Nov 22, 2017
** 1396166 Uninitialized pointer field
2. uninit_member: Non-static class member field commit_data.link_left
is not initialized in this constructor nor in any functions that it calls.
CID 1396166 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
4. uninit_member: Non-static class member field commit_data.node is not
initialized in this constructor nor in any functions that it calls.

** 1405353 Uninitialized pointer field
CID 1405353 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
3. uninit_member: Non-static class member cmount is not initialized
in this constructor nor in any functions that it calls.

** 1405848 Uninitialized pointer field
2. uninit_member: Non-static class member field iocb.data is not
initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member field iocb.key is not initialized
in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member field iocb.__pad2 is not
initialized in this constructor nor in any functions that it calls.
8. uninit_member: Non-static class member field iocb.aio_lio_opcode
is not initialized in this constructor nor in any functions that it calls.
10. uninit_member: Non-static class member field iocb.aio_reqprio is not
initialized in this constructor nor in any functions that it calls.
CID 1405848 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
12. uninit_member: Non-static class member field iocb.aio_fildes is not
initialized in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issues:

** 717326 Uninitialized scalar field
2. uninit_member: Non-static class member field h.fsid is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member field h.version is not initialized
in this constructor nor in any functions that it calls.
CID 717326 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member field h.st is not initialized in
this constructor nor in any functions that it calls.

** 1244199 Uninitialized scalar field
CID 1244199 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member reply_type is not initialized
in this constructor nor in any functions that it calls.

** 1249637 Uninitialized scalar field
CID 1249637 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member owner is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issues:

** 1351735 Uninitialized scalar field
CID 1351735 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member fuse_req_key is not
initialized in this constructor nor in any functions that it calls.

** 1396108 Uninitialized scalar field
CID 1396108 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member snap is not initialized
in this constructor nor in any functions that it calls.

** 1396115 Uninitialized scalar field
2. uninit_member: Non-static class member who is not initialized
in this constructor nor in any functions that it calls.
CID 1396115 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member seq is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issues:

** 1396212 Uninitialized scalar field
CID 1396212 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member stats_period is not
initialized in this constructor nor in any functions that it calls.

** 1396226 Uninitialized scalar field
CID 1396226 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member m_active_set is not
initialized in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issues:

** 1415776 Uninitialized scalar field
CID 1415776 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member field read_length.v is
not initialized in this constructor nor in any functions that it calls.

** 1415811 Uninitialized scalar field
CID 1415811 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member options is not initialized
in this constructor nor in any functions that it calls.

** 1415850 Uninitialized scalar field
CID 1415850 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member sent_reply is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 22, 2017
Fixes the coverity issue:
CID 1405270 (#1 of 1): Division or modulo by zero (DIVIDE_BY_ZERO)
2. divide_by_zero: In function call ll_get_stripe_osd,
division by expression l.stripe_count which may be zero has
undefined behavior.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 27, 2017
Fixes the coverity issues:

** 717263 Uninitialized scalar field
2. uninit_member: Non-static class member field head.op is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member field head.result is not initialized
in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member field head.mdsmap_epoch is not
initialized in this constructor nor in any functions that it calls.
8. uninit_member: Non-static class member field head.safe is not initialized
 in this constructor nor in any functions that it calls.
10. uninit_member: Non-static class member field head.is_dentry is not
initialized in this constructor nor in any functions that it calls.
CID 717263 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
12. uninit_member: Non-static class member field head.is_target is not
initialized in this constructor nor in any functions that it calls.

** 717264 Uninitialized scalar field
2. uninit_member: Non-static class member rdev is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member version is not initialized
in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member xattr_version is not
initialized in this constructor nor in any functions that it calls.
8. uninit_member: Non-static class member field cap.caps is not
initialized in this constructor nor in any functions that it calls.
10. uninit_member: Non-static class member field cap.wanted is not
initialized in this constructor nor in any functions that it calls.
12. uninit_member: Non-static class member field cap.cap_id is not
initialized in this constructor nor in any functions that it calls.
14. uninit_member: Non-static class member field cap.seq is not
initialized in this constructor nor in any functions that it calls.
16. uninit_member: Non-static class member field cap.mseq is not
initialized in this constructor nor in any functions that it calls.
18. uninit_member: Non-static class member field cap.realm is not
initialized in this constructor nor in any functions that it calls.
20. uninit_member: Non-static class member field cap.flags is not
initialized in this constructor nor in any functions that it calls.
22. uninit_member: Non-static class member time_warp_seq is not
initialized in this constructor nor in any functions that it calls.
24. uninit_member: Non-static class member size is not initialized
in this constructor nor in any functions that it calls.
26. uninit_member: Non-static class member max_size is not initialized
in this constructor nor in any functions that it calls.
28. uninit_member: Non-static class member change_attr is not initialized
in this constructor nor in any functions that it calls.
30. uninit_member: Non-static class member truncate_size is not
initialized in this constructor nor in any functions that it calls.
32. uninit_member: Non-static class member truncate_seq is not initialized
in this constructor nor in any functions that it calls.
34. uninit_member: Non-static class member mode is not initialized in
this constructor nor in any functions that it calls.
36. uninit_member: Non-static class member uid is not initialized in this
constructor nor in any functions that it calls.
38. uninit_member: Non-static class member gid is not initialized in this
constructor nor in any functions that it calls.
40. uninit_member: Non-static class member nlink is not initialized in
this constructor nor in any functions that it calls.
42. uninit_member: Non-static class member field dir_layout.dl_dir_hash
is not initialized in this constructor nor in any functions that it calls.
44. uninit_member: Non-static class member field dir_layout.dl_unused1
is not initialized in this constructor nor in any functions that it calls.
46. uninit_member: Non-static class member field dir_layout.dl_unused2 is
not initialized in this constructor nor in any functions that it calls.
48. uninit_member: Non-static class member field dir_layout.dl_unused3 is
not initialized in this constructor nor in any functions that it calls.
CID 717264 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
50. uninit_member: Non-static class member inline_version is not
initialized in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Nov 27, 2017
Fixes the coverity issue:

> CID 1395794 (#1 of 1): Wrong size argument (SIZEOF_MISMATCH)
> suspicious_sizeof: Passing argument &ch of type int * and
> argument 1UL to function read is suspicious because sizeof
> (int) /*4*/ is expected.

Signed-off-by: Kefu Chai <kchai@redhat.com>
batrick pushed a commit that referenced this pull request Dec 12, 2017
Fixes the coverity issues:

2. uninit_member: Non-static class member start_offset
is not initialized in this constructor nor in any functions
that it calls.

CID 1424396 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member last_offset is not
initialized in this constructor nor in any functions that it calls.

CID 1424658 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member offset is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar <amitkuma@redhat.com>
batrick pushed a commit that referenced this pull request Dec 12, 2017
Fixes the coverity issue:
2. uninit_member: Non-static class member offset is not
initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member length is not
initialized in this constructor nor in any functions that it calls.

CID 1424433 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member tgt_offset is not initialized
in this constructor nor in any functions that it calls.

Signed-off-by: Amit Kumar amitkuma@redhat.com
batrick pushed a commit that referenced this pull request Apr 5, 2019
New features:
* --jobs: start multiple client messengers from core #1 ~ #jobs;
* --core: can assign server core to get away from busy client cores;
* --rounds: a client will send <rounds>/<jobs> messages;

Improved:
* Better configuration report;
* Report individual client results plus a summary;
* Validate if CPU number is sufficient before running;
* Sleep 1 second while connecting, so it won't hurt performance;
* Simplify client logic and bug fixes;

Removed unecessary features:
* finish_decode() for MOSDOp;
* keepalive ratio;

Signed-off-by: Yingxin Cheng <yingxincheng@gmail.com>
batrick pushed a commit that referenced this pull request May 16, 2019
Fixes the coverity issues:

** 1355581 Uninitialized scalar field
CID 1355581 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member cur_shard is not initialized in
this constructor nor in any functions that it calls.

** 1356907 Uninitialized scalar field
2. uninit_member: Non-static class member field mtime.tv_sec is not initialized
in this constructor nor in any functions that it calls.
CID 1356907 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member field mtime.tv_nsec is not initialized
in this constructor nor in any functions that it calls

Signed-off-by: Amit Kumar amitkuma@redhat.com
batrick pushed a commit that referenced this pull request Jun 19, 2019
I am seeing this trace, which matches except for the
'fun:_ZN15AsyncConnection7processEv' frame.

<error>
  <unique>0x2399</unique>
  <tid>11</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x5366B18</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v14_2_0::list&amp;&amp;, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>274</line>
    </frame>
    <frame>
      <ip>0x5355E60</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1311</line>
    </frame>
    <frame>
      <ip>0x533E2A3</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x534FB1C</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)::{lambda()#2}::operator()()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2739</line>
    </frame>
    <frame>
      <ip>0x534FF57</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2745</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke_impl&lt;void, ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke&lt;ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>95</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__call&lt;void, 0&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>400</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()&lt;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>484</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::C_submit_event&lt;std::_Bind&lt;ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3} (ConnectedSocket)&gt; &gt;::do_request(unsigned long)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.h</file>
      <line>227</line>
    </frame>
    <frame>
      <ip>0x535FCD6</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.cc</file>
      <line>441</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0x55F519E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>execute_native_thread_routine</fn>
    </frame>
    <frame>
      <ip>0x1076BDD4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x118E0EAC</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

Signed-off-by: Sage Weil <sage@redhat.com>
batrick pushed a commit that referenced this pull request Jul 1, 2019
We just took the curmap ref above; do not call get_osdmap() again.

I think it may explain a weird segv I saw here in ~shared_ptr, although
I'm not quite certain.  Regardless, this change is correct and better.

(gdb) bt
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00005596e5a98261 in reraise_fatal (signum=11) at ./src/global/signal_handler.cc:326
#2  handle_fatal_signal(int) () at ./src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x00005596f4fe80e0 in ?? ()
#5  0x00005596e5464068 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#6  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5596f4b7cf60) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#7  0x00005596e543377f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f2b25044e28, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#8  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
ceph#9  std::shared_ptr<OSDMap const>::~shared_ptr (this=0x7f2b25044e20, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
ceph#10 OSD::handle_osd_ping(MOSDPing*) () at ./src/osd/OSD.cc:4662

Signed-off-by: Sage Weil <sage@redhat.com>
batrick pushed a commit that referenced this pull request Jul 29, 2019
CID 180671 (#1 of 1): Dereference null return value (NULL_RETURNS)

Signed-off-by: songweibin <song.weibin@zte.com.cn>
batrick pushed a commit that referenced this pull request Nov 8, 2019
I am seeing this trace, which matches except for the
'fun:_ZN15AsyncConnection7processEv' frame.

<error>
  <unique>0x2399</unique>
  <tid>11</tid>
  <threadname>msgr-worker-1</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x5366B18</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v14_2_0::list&amp;&amp;, unsigned int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>crypto_onwire.cc</file>
      <line>274</line>
    </frame>
    <frame>
      <ip>0x5355E60</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&lt;ceph::buffer::v14_2_0::ptr_node, ceph::buffer::v14_2_0::ptr_node::disposer&gt;&amp;&amp;, int)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>1311</line>
    </frame>
    <frame>
      <ip>0x533E2A3</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::run_continuation(Ct&lt;ProtocolV2&gt;&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>45</line>
    </frame>
    <frame>
      <ip>0x534FB1C</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)::{lambda()#2}::operator()()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2739</line>
    </frame>
    <frame>
      <ip>0x534FF57</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3}::operator()(ConnectedSocket&amp;)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>ProtocolV2.cc</file>
      <line>2745</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke_impl&lt;void, ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>60</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__invoke&lt;ProtocolV2::reuse_connection(const AsyncConnectionRef&amp;, ProtocolV2*)::&lt;lambda(ConnectedSocket&amp;)&gt;&amp;, ConnectedSocket&amp;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>invoke.h</file>
      <line>95</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>__call&lt;void, 0&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>400</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()&lt;&gt;</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8</dir>
      <file>functional</file>
      <line>484</line>
    </frame>
    <frame>
      <ip>0x535001E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::C_submit_event&lt;std::_Bind&lt;ProtocolV2::reuse_connection(boost::intrusive_ptr&lt;AsyncConnection&gt; const&amp;, ProtocolV2*)::{lambda(ConnectedSocket&amp;)#3} (ConnectedSocket)&gt; &gt;::do_request(unsigned long)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.h</file>
      <line>227</line>
    </frame>
    <frame>
      <ip>0x535FCD6</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>EventCenter::process_events(unsigned int, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*)</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Event.cc</file>
      <line>441</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>operator()</fn>
      <dir>/usr/src/debug/ceph-15.0.0-1717-g8d72af7/src/msg/async</dir>
      <file>Stack.cc</file>
      <line>53</line>
    </frame>
    <frame>
      <ip>0x5365086</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>std::_Function_handler&lt;void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}&gt;::_M_invoke(std::_Any_data const&amp;)</fn>
      <dir>/opt/rh/devtoolset-8/root/usr/include/c++/8/bits</dir>
      <file>std_function.h</file>
      <line>297</line>
    </frame>
    <frame>
      <ip>0x55F519E</ip>
      <obj>/usr/lib64/ceph/libceph-common.so.0</obj>
      <fn>execute_native_thread_routine</fn>
    </frame>
    <frame>
      <ip>0x1076BDD4</ip>
      <obj>/usr/lib64/libpthread-2.17.so</obj>
      <fn>start_thread</fn>
    </frame>
    <frame>
      <ip>0x118E0EAC</ip>
      <obj>/usr/lib64/libc-2.17.so</obj>
      <fn>clone</fn>
    </frame>
  </stack>
</error>

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit f019fc0)
batrick pushed a commit that referenced this pull request Jan 14, 2020
otherwise ODR is violated:

==449025==ERROR: AddressSanitizer: odr-violation (0x000000f03700):
  [1] size=8 'g_ceph_context' ../src/global/global_context.cc:24:14
  [2] size=8 'g_ceph_context' ../src/global/global_context.cc:24:14
These globals were registered at these points:
  [1]:
    #0 0x4779bd in __asan_register_globals (/var/ssd/ceph/clang-build/bin/ceph-conf+0x4779bd)
    #1 0x56e9cb in asan.module_ctor (/var/ssd/ceph/clang-build/bin/ceph-conf+0x56e9cb)

  [2]:
    #0 0x4779bd in __asan_register_globals (/var/ssd/ceph/clang-build/bin/ceph-conf+0x4779bd)
    #1 0x7fe5fed12aeb in asan.module_ctor (/var/ssd/ceph/clang-build/lib/libceph-common.so.2+0x2f34aeb)

==449025==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0

Signed-off-by: Kefu Chai <kchai@redhat.com>
batrick pushed a commit that referenced this pull request Feb 21, 2020
Accordingly to cppreference.com [1]:

  "If multiple threads of execution access the same std::shared_ptr
  object without synchronization and any of those accesses uses
  a non-const member function of shared_ptr then a data race will
  occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:

  ```
  [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  #1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
  #6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  #8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
  ceph#9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
  ceph#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
  ceph#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
  ceph#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
  ceph#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
  ceph#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
  ceph#15 0x0000000000000000 in ?? ()
  (gdb) frame 7
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  9665      in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
  (gdb) print osdmap
  $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
  (gdb) print *osdmap
     # pretty sane OSDMap
  (gdb) print sizeof(osdmap)
  $26 = 16
  (gdb) x/2a &osdmap
  0x559ca22acef0:   0x559cba028000  0x559cba0ec900

  (gdb) frame 2
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  148       /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
  (gdb) disassemble
  Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
  ...
     0x0000559c97675b1e <+62>:      mov    (%rdi),%rax
     0x0000559c97675b21 <+65>:      mov    %rdi,%rbx
     0x0000559c97675b24 <+68>:      callq  *0x10(%rax)
  => 0x0000559c97675b27 <+71>:      test   %rbp,%rbp
  ...
  End of assembler dump.
  (gdb) info registers rdi rbx rax
  rdi            0x559cba0ec900      94131624790272
  rbx            0x559cba0ec900      94131624790272
  rax            0x559cba0ec8a0      94131624790176
  (gdb) x/a 0x559cba0ec8a0 + 0x10
  0x559cba0ec8b0:   0x559cb81c3ea0
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  ...
  (gdb) p $_siginfo._sifields._sigfault.si_addr
  $27 = (void *) 0x559cb81c3ea0
  ```

Helgrind seems to agree:
  ```
  ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90
  ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
  ==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
  ==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:02:54.519 510301==
  ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117
  ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
  ==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
  ==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
  ==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
  ==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
  ==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==  Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
  ==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
  ==00:00:02:54.519 510301==  Block was alloc'd by thread #1
  ```

Actually there is plenty of similar issues reported like:
  ```
  ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119
  ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
  ==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
  ==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t>
  >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta
  il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr
  _base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
  ==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:05:04.903 510301==
  ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90
  ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
  ==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==  Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
  ==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
  ==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request May 5, 2020
* no need to discard_result(). as `output_stream::close()` returns an
  empty future<> already
* free the connected socket after the background task finishes, because:

we should not free the connected socket before the promise referencing it is fulfilled.

otherwise we have error messages from ASan, like

==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
    #0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
    #1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
    #2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
    #3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
    #4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
    #5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
    #7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    ceph#9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
    ceph#10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
    ceph#11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
    ceph#12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
    ceph#13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)

0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
    #0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
    #1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
    #2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
    #3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
    #4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
    #5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
    #6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
    #7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    #8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
    ceph#9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
    ceph#10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr()
	/usr/include/c++/10/bits/unique_ptr.h:357    ceph#11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
    ceph#12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
    ceph#13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
    ceph#14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
    ceph#15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
    ceph#16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
    ceph#17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
    ceph#18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36

Signed-off-by: Kefu Chai <kchai@redhat.com>
batrick pushed a commit that referenced this pull request May 5, 2020
This fixes the the following selinux error when using ceph-iscsi's
rbd-target-api daemon (rbd-target-gw has the same issue). They are
a result of the a python library, rtslib, which the daemons use.

Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:configfs_t:s0
Target Objects
/sys/kernel/config/target/iscsi/iqn.2003-01.com.re
                              dhat:ceph-iscsi/tpgt_1/attrib/authentication
[
                              file ]
Source                        rbd-target-api
Source Path                   /usr/libexec/platform-python3.6
Port                          <Unknown>
Host                          ans8
Source RPM Packages           platform-python-3.6.8-15.1.el8.x86_64
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:47 EST
Last Seen                     2020-01-08 18:39:47 EST
Local ID                      6f8c3415-7a50-4dc8-b3d2-2621e1d00ca3

Raw Audit Messages
type=AVC msg=audit(1578526787.577:68): avc:  denied  { ioctl } for
pid=995 comm="rbd-target-api"
path="/sys/kernel/config/target/iscsi/iqn.2003-01.com.redhat:ceph-iscsi/tpgt_1/attrib/authentication"
dev="configfs" ino=25703 ioctlcmd=0x5401
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:configfs_t:s0 tclass=file permissive=1

type=SYSCALL msg=audit(1578526787.577:68): arch=x86_64 syscall=ioctl
success=no exit=ENOTTY a0=34 a1=5401 a2=7ffd4f8f1f60 a3=3052cd2d95839b96
items=0 ppid=1 pid=995 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm=rbd-target-api
exe=/usr/libexec/platform-python3.6 subj=system_u:system_r:ceph_t:s0
key=(null)

Hash: rbd-target-api,ceph_t,configfs_t,file,ioctl

Signed-off-by: Mike Christie <mchristi@redhat.com>
batrick pushed a commit that referenced this pull request May 5, 2020
This fixes the selinux errors like this for /etc/target

-----------------------------------
Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:targetd_etc_rw_t:s0
Target Objects                target [ dir ]
Source                        rbd-target-api
Source Path                   rbd-target-api
Port                          <Unknown>
Host                          ans8
Source RPM Packages
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:48 EST
Last Seen                     2020-01-08 18:39:48 EST
Local ID                      9a13ee18-eaf2-4f2a-872f-2809ee4928f6

Raw Audit Messages
type=AVC msg=audit(1578526788.148:69): avc:  denied  { search } for
pid=995 comm="rbd-target-api" name="target" dev="sda1" ino=52198
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:targetd_etc_rw_t:s0 tclass=dir permissive=1

Hash: rbd-target-api,ceph_t,targetd_etc_rw_t,dir,search

which are a result of the rtslib library the ceph-iscsi daemons use
accessing /etc/target to read/write a file which stores meta data the
target uses.

Signed-off-by: Mike Christie <mchristi@redhat.com>
batrick pushed a commit that referenced this pull request Jul 30, 2020
Accordingly to cppreference.com [1]:

  "If multiple threads of execution access the same std::shared_ptr
  object without synchronization and any of those accesses uses
  a non-const member function of shared_ptr then a data race will
  occur (...)"

[1]: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic

One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:

  ```
  [Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  #1  0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  #3  0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #4  std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
  #5  std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
  #6  OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  #8  0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
  ceph#9  0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
  ceph#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
  ceph#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
  ceph#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
  ceph#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
  ceph#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
  ceph#15 0x0000000000000000 in ?? ()
  (gdb) frame 7
  #7  0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
      at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
  9665      in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
  (gdb) print osdmap
  $24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
  (gdb) print *osdmap
     # pretty sane OSDMap
  (gdb) print sizeof(osdmap)
  $26 = 16
  (gdb) x/2a &osdmap
  0x559ca22acef0:   0x559cba028000  0x559cba0ec900

  (gdb) frame 2
  #2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
  148       /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
  (gdb) disassemble
  Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
  ...
     0x0000559c97675b1e <+62>:      mov    (%rdi),%rax
     0x0000559c97675b21 <+65>:      mov    %rdi,%rbx
     0x0000559c97675b24 <+68>:      callq  *0x10(%rax)
  => 0x0000559c97675b27 <+71>:      test   %rbp,%rbp
  ...
  End of assembler dump.
  (gdb) info registers rdi rbx rax
  rdi            0x559cba0ec900      94131624790272
  rbx            0x559cba0ec900      94131624790272
  rax            0x559cba0ec8a0      94131624790176
  (gdb) x/a 0x559cba0ec8a0 + 0x10
  0x559cba0ec8b0:   0x559cb81c3ea0
  (gdb) bt
  #0  0x0000559cb81c3ea0 in ?? ()
  ...
  (gdb) p $_siginfo._sifields._sigfault.si_addr
  $27 = (void *) 0x559cb81c3ea0
  ```

Helgrind seems to agree:
  ```
  ==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread ceph#90
  ==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:02:54.519 510301==    at 0x7218DD: operator= (shared_ptr_base.h:1078)
  ==00:00:02:54.519 510301==    by 0x7218DD: operator= (shared_ptr.h:103)
  ==00:00:02:54.519 510301==    by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:02:54.519 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:02:54.519 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:02:54.519 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:02:54.519 510301==
  ==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread ceph#117
  ==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
  ==00:00:02:54.519 510301==    at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
  ==00:00:02:54.519 510301==    by 0x6B5842: shared_ptr (shared_ptr.h:129)
  ==00:00:02:54.519 510301==    by 0x6B5842: get_osdmap (OSD.h:1700)
  ==00:00:02:54.519 510301==    by 0x6B5842: OSD::create_context() (OSD.cc:9053)
  ==00:00:02:54.519 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:02:54.519 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:02:54.519 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:02:54.519 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:02:54.519 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:02:54.519 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:02:54.519 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:02:54.519 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:02:54.519 510301==  Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
  ==00:00:02:54.519 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:02:54.519 510301==    by 0x66F766: main (ceph_osd.cc:688)
  ==00:00:02:54.519 510301==  Block was alloc'd by thread #1
  ```

Actually there is plenty of similar issues reported like:
  ```
  ==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread ceph#119
  ==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
  ==00:00:05:04.903 510301==    at 0x753165: clear (hashtable.h:2051)
  ==00:00:05:04.903 510301==    by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t>
  >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__deta
  il::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr
  _base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
  ==00:00:05:04.903 510301==    by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
  ==00:00:05:04.903 510301==    by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
  ==00:00:05:04.903 510301==    by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
  ==00:00:05:04.903 510301==    by 0x70E62E: run (OpSchedulerItem.h:148)
  ==00:00:05:04.903 510301==    by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
  ==00:00:05:04.903 510301==    by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
  ==00:00:05:04.903 510301==    by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ==00:00:05:04.903 510301==
  ==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread ceph#90
  ==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
  ==00:00:05:04.903 510301==    at 0x7531E1: clear (hashtable.h:2054)
  ==00:00:05:04.903 510301==    by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
  ==00:00:05:04.903 510301==    by 0x75331C: ~unordered_map (unordered_map.h:102)
  ==00:00:05:04.903 510301==    by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
  ==00:00:05:04.903 510301==    by 0x753606: operator() (shared_cache.hpp:100)
  ==00:00:05:04.903 510301==    by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
  ==00:00:05:04.903 510301==    by 0x73BB26: _M_release (shared_ptr_base.h:155)
  ==00:00:05:04.903 510301==    by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:747)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr_base.h:1078)
  ==00:00:05:04.903 510301==    by 0x72191E: operator= (shared_ptr.h:103)
  ==00:00:05:04.903 510301==    by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==  Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
  ==00:00:05:04.903 510301==    at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
  ==00:00:05:04.903 510301==    by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:699)
  ==00:00:05:04.903 510301==    by 0x7213BD: get_map (OSD.h:1732)
  ==00:00:05:04.903 510301==    by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
  ==00:00:05:04.903 510301==    by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
  ==00:00:05:04.903 510301==    by 0x72A06C: Context::complete(int) (Context.h:77)
  ==00:00:05:04.903 510301==    by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
  ==00:00:05:04.903 510301==    by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
  ==00:00:05:04.903 510301==    by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
  ==00:00:05:04.903 510301==    by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
  ```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 80da5f9)

Conflicts:
	src/osd/OSD.cc in
		bool OSD::asok_command
		int OSD::shutdown
		void OSD::maybe_update_heartbeat_peers
		void OSD::_preboot
		void OSD::queue_want_up_thru
		void OSD::send_alive
		void OSD::send_failures
		void OSD::send_beacon
		MPGStats* OSD::collect_pg_stats
		void OSD::note_down_osd
		void OSD::consume_map
		void OSD::activate_map
	src/osd/OSD.h in
		private: dispatch_session_waiting

- also use the new const OSDMapRef in places that no longer exist in master
	src/osd/OSD.cc in
		void OSDService::share_map
		void OSDService::send_incremental_map
		int OSD::_do_command
		void OSD::note_up_osd
		int OSD::init_op_flags
	src/osd/OSD.h in
		void send_incremental_map
		void share_map
batrick pushed a commit that referenced this pull request Aug 26, 2020
Changes addressing comments in PR - commit to be
squashed prior to merge

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
batrick pushed a commit that referenced this pull request Aug 27, 2020
This fixes the the following selinux error when using ceph-iscsi's
rbd-target-api daemon (rbd-target-gw has the same issue). They are
a result of the a python library, rtslib, which the daemons use.

Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:configfs_t:s0
Target Objects
/sys/kernel/config/target/iscsi/iqn.2003-01.com.re
                              dhat:ceph-iscsi/tpgt_1/attrib/authentication
[
                              file ]
Source                        rbd-target-api
Source Path                   /usr/libexec/platform-python3.6
Port                          <Unknown>
Host                          ans8
Source RPM Packages           platform-python-3.6.8-15.1.el8.x86_64
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:47 EST
Last Seen                     2020-01-08 18:39:47 EST
Local ID                      6f8c3415-7a50-4dc8-b3d2-2621e1d00ca3

Raw Audit Messages
type=AVC msg=audit(1578526787.577:68): avc:  denied  { ioctl } for
pid=995 comm="rbd-target-api"
path="/sys/kernel/config/target/iscsi/iqn.2003-01.com.redhat:ceph-iscsi/tpgt_1/attrib/authentication"
dev="configfs" ino=25703 ioctlcmd=0x5401
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:configfs_t:s0 tclass=file permissive=1

type=SYSCALL msg=audit(1578526787.577:68): arch=x86_64 syscall=ioctl
success=no exit=ENOTTY a0=34 a1=5401 a2=7ffd4f8f1f60 a3=3052cd2d95839b96
items=0 ppid=1 pid=995 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm=rbd-target-api
exe=/usr/libexec/platform-python3.6 subj=system_u:system_r:ceph_t:s0
key=(null)

Hash: rbd-target-api,ceph_t,configfs_t,file,ioctl

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 8187235)
batrick pushed a commit that referenced this pull request Aug 27, 2020
This fixes the selinux errors like this for /etc/target

-----------------------------------
Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:targetd_etc_rw_t:s0
Target Objects                target [ dir ]
Source                        rbd-target-api
Source Path                   rbd-target-api
Port                          <Unknown>
Host                          ans8
Source RPM Packages
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:48 EST
Last Seen                     2020-01-08 18:39:48 EST
Local ID                      9a13ee18-eaf2-4f2a-872f-2809ee4928f6

Raw Audit Messages
type=AVC msg=audit(1578526788.148:69): avc:  denied  { search } for
pid=995 comm="rbd-target-api" name="target" dev="sda1" ino=52198
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:targetd_etc_rw_t:s0 tclass=dir permissive=1

Hash: rbd-target-api,ceph_t,targetd_etc_rw_t,dir,search

which are a result of the rtslib library the ceph-iscsi daemons use
accessing /etc/target to read/write a file which stores meta data the
target uses.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 53be181)

Conflicts:
	selinux/ceph.te: trivial resolution
batrick pushed a commit that referenced this pull request Sep 18, 2020
This fixes the the following selinux error when using ceph-iscsi's
rbd-target-api daemon (rbd-target-gw has the same issue). They are
a result of the a python library, rtslib, which the daemons use.

Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:configfs_t:s0
Target Objects
/sys/kernel/config/target/iscsi/iqn.2003-01.com.re
                              dhat:ceph-iscsi/tpgt_1/attrib/authentication
[
                              file ]
Source                        rbd-target-api
Source Path                   /usr/libexec/platform-python3.6
Port                          <Unknown>
Host                          ans8
Source RPM Packages           platform-python-3.6.8-15.1.el8.x86_64
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:47 EST
Last Seen                     2020-01-08 18:39:47 EST
Local ID                      6f8c3415-7a50-4dc8-b3d2-2621e1d00ca3

Raw Audit Messages
type=AVC msg=audit(1578526787.577:68): avc:  denied  { ioctl } for
pid=995 comm="rbd-target-api"
path="/sys/kernel/config/target/iscsi/iqn.2003-01.com.redhat:ceph-iscsi/tpgt_1/attrib/authentication"
dev="configfs" ino=25703 ioctlcmd=0x5401
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:configfs_t:s0 tclass=file permissive=1

type=SYSCALL msg=audit(1578526787.577:68): arch=x86_64 syscall=ioctl
success=no exit=ENOTTY a0=34 a1=5401 a2=7ffd4f8f1f60 a3=3052cd2d95839b96
items=0 ppid=1 pid=995 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0
egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm=rbd-target-api
exe=/usr/libexec/platform-python3.6 subj=system_u:system_r:ceph_t:s0
key=(null)

Hash: rbd-target-api,ceph_t,configfs_t,file,ioctl

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 8187235)
batrick pushed a commit that referenced this pull request Sep 18, 2020
This fixes the selinux errors like this for /etc/target

-----------------------------------
Additional Information:
Source Context                system_u:system_r:ceph_t:s0
Target Context                system_u:object_r:targetd_etc_rw_t:s0
Target Objects                target [ dir ]
Source                        rbd-target-api
Source Path                   rbd-target-api
Port                          <Unknown>
Host                          ans8
Source RPM Packages
Target RPM Packages
Policy RPM                    selinux-policy-3.14.3-20.el8.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     ans8
Platform                      Linux ans8 4.18.0-147.el8.x86_64 #1 SMP
Thu Sep 26
                              15:52:44 UTC 2019 x86_64 x86_64
Alert Count                   1
First Seen                    2020-01-08 18:39:48 EST
Last Seen                     2020-01-08 18:39:48 EST
Local ID                      9a13ee18-eaf2-4f2a-872f-2809ee4928f6

Raw Audit Messages
type=AVC msg=audit(1578526788.148:69): avc:  denied  { search } for
pid=995 comm="rbd-target-api" name="target" dev="sda1" ino=52198
scontext=system_u:system_r:ceph_t:s0
tcontext=system_u:object_r:targetd_etc_rw_t:s0 tclass=dir permissive=1

Hash: rbd-target-api,ceph_t,targetd_etc_rw_t,dir,search

which are a result of the rtslib library the ceph-iscsi daemons use
accessing /etc/target to read/write a file which stores meta data the
target uses.

Signed-off-by: Mike Christie <mchristi@redhat.com>
(cherry picked from commit 53be181)
batrick pushed a commit that referenced this pull request Sep 18, 2020
Changes addressing comments in PR - commit to be
squashed prior to merge

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 60e110f)
batrick pushed a commit that referenced this pull request May 11, 2021
Otherwise, if we assert, we'll hang here:

Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54

#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0)
    at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body")
    at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
ceph#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
ceph#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)",
    method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
ceph#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>,
    location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
ceph#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306

Signed-off-by: Sage Weil <sage@newdream.net>
batrick pushed a commit that referenced this pull request May 11, 2021
…dling.

In `crimson/osd/main.cc` we instruct Seastar to handle `SIGHUP`.

```
        // just ignore SIGHUP, we don't reread settings
        seastar::engine().handle_signal(SIGHUP, [] {})
```

This happens using the Seastar's signal handling infrastructure
which is incompliant with the alien world.

```
void
reactor::signals::handle_signal(int signo, noncopyable_function<void ()>&& handler) {
    // ...
    struct sigaction sa;
    sa.sa_sigaction = [](int sig, siginfo_t *info, void *p) {
        engine()._backend->signal_received(sig, info, p);
    };
    // ...
}
```

```
 extern __thread reactor* local_engine;
extern __thread size_t task_quota;

inline reactor& engine() {
    return *local_engine;
}
```

The low-level signal handler above assumes `local_engine._backend`
is not null which stays true only for threads from the S*'s world.
Unfortunately, as we don't block the `SIGHUP` for alien threads,
kernel is perfectly authorized to pick up one them to run the handler
leading to weirdly-looking segfaults like this one:

```
INFO  2021-04-23 07:06:57,807 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:58,753 [shard 0] ms - [osd.1(client) v2:172.21.15.100:6802/30478@51064 >> mgr.4105 v2:172.21.15.109:6800/29891] --> #7 === pg_stats(0 pgs seq 55834574872 v 0) v2 (87)
...
INFO  2021-04-23 07:06:58,813 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,753 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO  2021-04-23 07:06:59,753 [shard 0] osd - asok response length: 2947
INFO  2021-04-23 07:06:59,817 [shard 0] bluestore - stat
DEBUG 2021-04-23 07:06:59,865 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO  2021-04-23 07:06:59,866 [shard 0] osd - asok response length: 2947
DEBUG 2021-04-23 07:07:00,020 [shard 0] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "get_command_descriptions"}
INFO  2021-04-23 07:07:00,020 [shard 0] osd - asok response length: 2947
INFO  2021-04-23 07:07:00,820 [shard 0] bluestore - stat
...
Backtrace:
 0# 0x00005600CD0D6AAF in ceph-osd
 1# FatalSignal::signaled(int) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
 4# 0x00005600CD830B81 in ceph-osd
 5# 0x00007F5877C7EB20 in /lib64/libpthread.so.0
 6# pthread_cond_timedwait in /lib64/libpthread.so.0
 7# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
 8# 0x00007F5877999BA3 in /lib64/libstdc++.so.6
 9# 0x00007F5877C7414A in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
daemon-helper: command crashed with signal 11
```

Ultimately, it turned out the thread came out from a syscall (`futex`)
and started crunching the `SIGHUP` handler's code in which a nullptr
dereference happened.

This patch blocks `SIGHUP` for all threads spawned by `AlienStore`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request May 11, 2021
An attempt to `Connection::do_auth()` may finish in one of three states:
_success_, _failure_ and _cancellation_. Unfortunately, its callers were
missing the third treating cancellation like a failure. This was the root
cause of the following failure at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-06_22:08:43-rados-master-distro-basic-smithi/6102605$ less ./remote/smithi204/log/ceph-osd.3.log.gz
...
WARN  2021-05-06 22:35:40,464 [shard 0] osd - ms_handle_reset
...
INFO  2021-05-06 22:35:40,465 [shard 0] monc - do_auth_single: connection closed
INFO  2021-05-06 22:35:40,465 [shard 0] ms - [osd.3(client) v2:172.21.15.204:6808/31418@57568 >> mon.? v2:172.21.15.204:3300/0] execute_connecting(): protocol aborted at CLOSING -- std::system_error (error crimson::net:6, protocol aborted)
...
ERROR 2021-05-06 22:35:40,465 [shard 0] osd - mon.osd.3 dispatch() ms_handle_reset caught exception: std::system_error (error crimson::net:3, negotiation failure)
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3909-g81233a18/rpm/el8/BUILD/ceph-17.0.0-3909-g81233a18/src/crimson/common/gated.h:36: crimson::common::Gated::dispatch(const char*, T&, Func&&) [with Func = crimson::mon::Client::ms_handle_reset(crimson::net::ConnectionRef, bool)::<lambda()>&; T = crimson::mon::Client]::<lambda(std::__exception_ptr::exception_ptr)>: Assertion `*eptr.__cxa_exception_type() == typeid(seastar::gate_closed_exception)' failed.
Aborting on shard 0.
Backtrace:
 0# 0x00005618C973932F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F7BB592EB20 in /lib64/libpthread.so.0
 4# gsignal in /lib64/libc.so.6
 5# abort in /lib64/libc.so.6
 6# 0x00007F7BB3F29B09 in /lib64/libc.so.6
 7# 0x00007F7BB3F37DE6 in /lib64/libc.so.6
 8# 0x00005618C9FF295C in ceph-osd
 9# 0x00005618C3907313 in ceph-osd
10# 0x00005618CCA2F84F in ceph-osd
11# 0x00005618CCA34D90 in ceph-osd
12# 0x00005618CCBEC9BB in ceph-osd
13# 0x00005618CC744E9A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
daemon-helper: command crashed with signal 6
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request May 18, 2021
The `send_message()` method is a high-level facility for
communicating with a monitor. If there is an active conn
available, it sends the message immediately; otherwise
the message is queued. This method assumes the queue is
already drained if the connection is available.

`active_con` is managed by `reopen_session()` where it's
first cleared and then reset after finding new alive mon.
This is followed by draining the `pending_messages` queue
which happens in `on_session_opened()` after the `MAuth`
exchange is finished.

Unfortunately, the path from the `active_con` clearing
to draining the queue is long and divided into multiple
continuations which results in lack of atomicity. When
e.g. `run_command()` interleaves the stages, following
crash happens:

```
INFO  2021-05-07 08:13:43,914 [shard 0] monc - do_auth_single: mon v2:172.21.15.82:6805/34166 => v2:172.21.15.82:3300/0 returns auth_reply(proto 2 0 (0) Success) v1: 0
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph-17.0.0-3910-g1b18e076/src/crimson/mon/MonClient.cc:1034: seastar::future<> crimson::mon::Client::send_message(MessageRef): Assertion `pending_messages.empty()' failed.
Aborting on shard 0.
Backtrace:
 0# 0x000055CDE6DB532F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FC1BF20BB20 in /lib64/libpthread.so.0
 4# gsignal in /lib64/libc.so.6
 5# abort in /lib64/libc.so.6
 6# 0x00007FC1BD806B09 in /lib64/libc.so.6
 7# 0x00007FC1BD814DE6 in /lib64/libc.so.6
 8# crimson::mon::Client::send_message(boost::intrusive_ptr<Message>) in ceph-osd
 9# crimson::mon::Client::renew_subs() in ceph-osd
10# 0x000055CDE764FB0B in ceph-osd
11# 0x000055CDE10457F0 in ceph-osd
12# 0x000055CDEA0AB88F in ceph-osd
13# 0x000055CDEA0B0DD0 in ceph-osd
14# 0x000055CDEA2689FB in ceph-osd
15# 0x000055CDE9DC0EDA in ceph-osd
16# main in ceph-osd
17# __libc_start_main in /lib64/libc.so.6
18# _start in ceph-osd
```

The problem caused following failure at Sepia:
http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-07_07:41:02-rados-master-distro-basic-smithi/6104549

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request May 21, 2021
Otherwise, if we assert, we'll hang here:

Thread 1 (Thread 0x7f74eba79580 (LWP 1688617)):
#0  0x00007f74eb2aa529 in futex_wait (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=132, futex_word=0x7ffd642b4b54) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_cond_destroy (cond=0x7ffd642b4b30) at pthread_cond_destroy.c:54

#3  0x0000563ff2e5a891 in LibRadosService_StatusFormat_Test::TestBody (this=<optimized out>) at /usr/include/c++/7/bits/unique_ptr.h:78
#4  0x0000563ff2e9dc3a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x563ff2ea72e4 "the test body", method=<optimized out>, object=0x563ff422a6d0)
    at ./src/googletest/googletest/src/gtest.cc:2605
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x563ff422a6d0, method=<optimized out>, location=location@entry=0x563ff2ea72e4 "the test body")
    at ./src/googletest/googletest/src/gtest.cc:2641
#6  0x0000563ff2e908c3 in testing::Test::Run (this=0x563ff422a6d0) at ./src/googletest/googletest/src/gtest.cc:2680
#7  0x0000563ff2e90a25 in testing::TestInfo::Run (this=0x563ff41a3b70) at ./src/googletest/googletest/src/gtest.cc:2858
#8  0x0000563ff2e90ec1 in testing::TestSuite::Run (this=0x563ff41b6230) at ./src/googletest/googletest/src/gtest.cc:3012
ceph#9  0x0000563ff2e92bdc in testing::internal::UnitTestImpl::RunAllTests (this=<optimized out>) at ./src/googletest/googletest/src/gtest.cc:5723
ceph#10 0x0000563ff2e9e14a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x563ff2ea8728 "auxiliary test code (environments or event listeners)",
    method=<optimized out>, object=0x563ff41a2d10) at ./src/googletest/googletest/src/gtest.cc:2605
ceph#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x563ff41a2d10, method=<optimized out>,
    location=location@entry=0x563ff2ea8728 "auxiliary test code (environments or event listeners)") at ./src/googletest/googletest/src/gtest.cc:2641
ceph#12 0x0000563ff2e90ae8 in testing::UnitTest::Run (this=0x563ff30c0660 <testing::UnitTest::GetInstance()::instance>) at ./src/googletest/googletest/src/gtest.cc:5306

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit ee5a0c9)
batrick pushed a commit that referenced this pull request Jun 1, 2021
`OpSequencer` assumes that ID of a previous client request
is always lower than ID of current one. This is reflected
by the assertion in `OpSequencer::start_op()`. It triggered
the following failure [1] in Teuthology:

```
DEBUG 2021-05-07 08:01:41,227 [shard 0] osd - client_request(id=1, detail=osd_op(client.4171.0:1 2.2 2.7c339972 (undecoded) ondisk+retry+read+known_if_redirected e29) v8) same_interval_since: 31
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-3910-g1b18e076/rpm/el8/BUILD/ceph-
17.0.0-3910-g1b18e076/src/crimson/osd/osd_operation_sequencer.h:38: seastar::futurize_t<Result> crimson::osd::OpSequencer::start_op(HandleT&, uint64_t, uint64_t, FuncT&&) [with HandleT = crimson::PipelineHa
ndle; FuncT = crimson::interruptible::interruptor<InterruptCond>::wrap_function(Func&&) [with Func = crimson::osd::ClientRequest::start()::<lambda()> mutable::<lambda(Ref<crimson::osd::PG>)> mutable::<lambd
a()> mutable::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition]::<lambda()>; Result = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<>
>; seastar::futurize_t<Result> = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; uint64_t = long unsigned int]: Assertion `prev_op < this_op' fai
led.
Aborting on shard 0.
Backtrace:
Segmentation fault.
Backtrace:
 0# 0x00005592B028932F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F57B72E7B20 in /lib64/libpthread.so.0
 4# gsignal in /lib64/libc.so.6
 5# abort in /lib64/libc.so.6
 6# 0x00007F57B58E2B09 in /lib64/libc.so.6
 7# 0x00007F57B58F0DE6 in /lib64/libc.so.6
 8# 0x00005592ABB8484D in ceph-osd
 9# 0x00005592ABB8ACB3 in ceph-osd
10# seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<boost::intrusive_ptr<crimson::osd::PG> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&, seastar::future_state<boost::intrusive_ptr<crimson::osd::PG> >&&)#1}, boost::intrusive_ptr<crimson::osd::PG> >::run_and_dispose() in ceph-osd
11# 0x00005592B357F88F in ceph-osd
12# 0x00005592B3584DD0 in ceph-osd
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-07_07:41:02-rados-master-distro-basic-smithi/6104530

Crash analysis resulted in two observations:
1. during the request execution the acting set got
   changed, the request was interrupted and a try
   to re-execute it emerged;
2. the interrupted request was the very first client
   request the OSD has ever seen.

Code analysis showed a problem in how `ClientRequest`
establishes `prev_op_id`: although supposed to be performed
only once for a request, it can get executed twice but only
for the very first request `OpSequencer` saw.

```cpp
void ClientRequest::may_set_prev_op()
{
  // set prev_op_id if it's not set yet
  if (__builtin_expect(prev_op_id == 0, true)) {
    prev_op_id = sequencer.get_last_issued();
  }
}
```

Unfortunately, `0` isn't a distincted value that cannot
be returned by `get_last_issued()`:

```cpp
class OpSequencer {
  // ...

  uint64_t get_last_issued() const {
    return last_issued;
  }

  // ...

  // the id of last op which is issued
  uint64_t last_issued = 0;
```

As a result, `OpSequencer` returned on the second call
a new value (actually `this_op`) violating the assertion.
The commit fixes the problem by switching from a designated
value to `std::optional`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 1, 2021
f7181ab has optimized the client
parallelism. To achieve that `PG::do_osd_ops()` were converted to
return basically future pair of futures. Unfortunately, the life-
time management of `OpsExecuter` was kept intact. In the result,
the object was valid only till fullfying the outer future while,
due to the `rollbacker` instances, it should be available till
`all_completed` becomes available.

This issue can explain the following problem has been observed
in a Teuthology job [1].

```
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - do_op_call: method returned ret=-17, outdata.length()=0 while num_read=1, num_write=0
DEBUG 2021-05-20 08:03:22,617 [shard 0] osd - rollback_obc_if_modified: object 19:e17d4708:test-rados-api-smithi095-38404-2::foo:head got erro
r generic:17, need_rollback=false
=================================================================
==33626==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000b9320 at pc 0x560f486b8222 bp 0x7fffc467a1e0 sp 0x7fffc467a1d0
READ of size 4 at 0x60d0000b9320 thread T0
    #0 0x560f486b8221  (/usr/bin/ceph-osd+0x2c610221)
    #1 0x560f4880c6b1 in seastar::continuation<seastar::internal::promise_base_with_type<boost::intrusive_ptr<MOSDOpReply> >, seastar::noncopy
able_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_
wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > >
 > ()>, seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::
IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson:
:errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInte
rruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::error
ated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > >(seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail
<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_
future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&&)::{lambda(seastar::internal::promise_base_with_type<boos
t::intrusive_ptr<MOSDOpReply> >&&, seastar::noncopyable_function<crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)11> > >::_future<crimson::errorated_future_marker<boost::intrusive_ptr<MOSDOpReply> > > > ()>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() (/usr/bin/ceph-osd+0x2c7646b1)
    #2 0x560f5352c3ae  (/usr/bin/ceph-osd+0x374843ae)
    #3 0x560f535318ef  (/usr/bin/ceph-osd+0x374898ef)
    #4 0x560f536e395a  (/usr/bin/ceph-osd+0x3763b95a)
    #5 0x560f532413d9  (/usr/bin/ceph-osd+0x371993d9)
    #6 0x560f476af95a in main (/usr/bin/ceph-osd+0x2b60795a)
    #7 0x7f7aa0af97b2 in __libc_start_main (/lib64/libc.so.6+0x237b2)
    #8 0x560f477d2e8d in _start (/usr/bin/ceph-osd+0x2b72ae8d)

```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-20_07:28:16-rados-master-distro-basic-smithi/6124735/

The commit deals with the problem by repacking the outer future.
An alternative could be in switching from `std::unique_ptr` to
`seastar::shared_ptr` for managing `OpsExecuter`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 1, 2021
Following crash occured at Sepia [1]:

```
INFO  2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] ProtocolV2::start_accept(): targ
et_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] TRIGGER ACCEPTING, was NONE
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] SEND(26) banner: len_payload=16,
 supported=1, required=0, banner="ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(10) banner: "ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT banner: payload_len=16
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(16) banner features: supported=1 required=0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] WRITE HelloFrame: my_type=osd, peer_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT HelloFrame: my_type=client peer_addr=v2:172.21.15.119:6803/31733
INFO  2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false)
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x000055E84CF44C1F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F2BC88C0B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_conn() in ceph-osd
 5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
 6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
 7# 0x000055E84DF67669 in ceph-osd
 8# 0x000055E84DF68775 in ceph-osd
 9# 0x000055E846F47F60 in ceph-osd
10# 0x000055E85296770F in ceph-osd
11# 0x000055E85296CC50 in ceph-osd
12# 0x000055E852B1ECBB in ceph-osd
13# 0x000055E85267C73A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
Fault at location: 0x98
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136907

When the `handle_auth_request()` happens, there is no guarantee
`active_con` is being available. This is reflected in the classical
implementation:

```cpp
int MonClient::handle_auth_request(
  Connection *con,
  // ...
  ceph::buffer::list *reply)
{
  // ...
  bool isvalid = ah->verify_authorizer(
    cct,
    *rotating_secrets,
    payload,
    auth_meta->get_connection_secret_length(),
    reply,
    &con->peer_name,
    &con->peer_global_id,
    &con->peer_caps_info,
    &auth_meta->session_key,
    &auth_meta->connection_secret,
    ac);
```

The patch transplate the same logic to crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 1, 2021
The `FuturizedStore` interface imposes the `get_attr()`
takes the `name` parameter as `std::string_view`, and
thus burdens implementations with extending the life-
time of the data the instance refers to.

Unfortunately, `AlienStore` is unaware that prolonging
the life of a `std::string_view` instance doesn't prolong
the data memory it points to. This problem has manifested
in the following use-after-free detected at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136929$ less ./remote/smithi194/log/ceph-osd.7.log.gz
...
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - do_osd_ops_execute: object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head - handling op
call
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - handling op call on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - calling method lock.lock, num_read=0, num_write=0
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - handling op getxattr on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - getxattr on obj=14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head for attr=_lock.TestLockPP1
DEBUG 2021-05-26 20:24:54,078 [shard 0] bluestore - get_attr
=================================================================
==34068==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030001851d0 at pc 0x7f824d6a5b27 bp 0x7f822b4201c0 sp 0x7f822b41f968
READ of size 17 at 0x6030001851d0 thread T28 (alien-store-tp)
...
    #0 0x7f824d6a5b26  (/lib64/libasan.so.5+0x40b26)
    #1 0x55e2cbb2e00b  (/usr/bin/ceph-osd+0x2b6dc00b)
    #2 0x55e2d31f086e  (/usr/bin/ceph-osd+0x32d9e86e)
    #3 0x55e2d3467607 in crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) (/usr/bin/ceph-osd+0x33015607)
    #4 0x55e2d346b14a  (/usr/bin/ceph-osd+0x3301914a)
    #5 0x7f8249d32ba2  (/lib64/libstdc++.so.6+0xc2ba2)
    #6 0x7f824a00d149 in start_thread (/lib64/libpthread.so.0+0x8149)
    #7 0x7f82486edf22 in clone (/lib64/libc.so.6+0xfcf22)

0x6030001851d0 is located 0 bytes inside of 31-byte region [0x6030001851d0,0x6030001851ef)
freed by thread T0 here:
    #0 0x7f824d757688 in operator delete(void*) (/lib64/libasan.so.5+0xf2688)

previously allocated by thread T0 here:
    #0 0x7f824d7567b0 in operator new(unsigned long) (/lib64/libasan.so.5+0xf17b0)

Thread T28 (alien-store-tp) created by T0 here:
    #0 0x7f824d6b7ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3)

SUMMARY: AddressSanitizer: heap-use-after-free (/lib64/libasan.so.5+0x40b26)
Shadow bytes around the buggy address:
  0x0c06800289e0: fd fd fd fa fa fa fd fd fd fa fa fa 00 00 00 fa
  0x0c06800289f0: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x0c0680028a00: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
  0x0c0680028a10: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x0c0680028a20: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c0680028a30: fd fd fa fa fd fd fd fd fa fa[fd]fd fd fd fa fa
  0x0c0680028a40: fd fd fd fd fa fa fd fd fd fd fa fa 00 00 00 07
  0x0c0680028a50: fa fa 00 00 00 fa fa fa 00 00 00 fa fa fa fd fd
  0x0c0680028a60: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x0c0680028a70: 00 00 00 00 fa fa fd fd fd fd fa fa fd fd fd fd
  0x0c0680028a80: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==34068==ABORTING
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 9, 2021
Commit b5efdc6 has unified
the interruption handling among `InternalClientRequest` and
`ClientRequest`. Unfortunately, a call to `maybe_reset()` of
`OpSequencer` has been overlooked and dropped leading to the
`assert(prev_op > last_unblocked)` assertion failure in
`start_op()`.

This was the root cause of the following problem at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136929$ less ./remote/smithi194/log/ceph-osd.6.log.gz
...
DEBUG 2021-05-26 20:24:53,988 [shard 0] ms - [osd.6(client) v2:172.21.15.194:6804/34047 >> client.4453 172.21.15.67:0/3814935464@37042] <== #1
 === osd_op(client.4453.0:5 12.6 12.7fc1f406 (undecoded) ondisk+write+known_if_redirected e52) v8 (42)
DEBUG 2021-05-26 20:24:53,988 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12.7fc1f406 (undecoded) ondisk+write+known_if_redirected e52) v8): start
DEBUG 2021-05-26 20:24:53,988 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12.7fc1f406 (undecoded) ondisk+write+known_if_redirected e52) v8): in repeat
DEBUG 2021-05-26 20:24:53,988 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12.7fc1f406 (undecoded) ondisk+write+known_if_redirected e52) v8) same_interval_since: 19
DEBUG 2021-05-26 20:24:53,988 [shard 0] osd - do_recover_missing check for recovery, 12:602f83fe:::foo:head
DEBUG 2021-05-26 20:24:53,988 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12:602f83fe:::foo:head {write 0~128 in=128b} snapc 0={} ondisk+write+known_if_redirected e52) v8): got obc lock
...
DEBUG 2021-05-26 20:25:21,810 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12:602f83fe:::foo:head {write 0~128 in=1
28b} snapc 0={} ondisk+write+known_if_redirected e52) v8): in repeat
...
DEBUG 2021-05-26 20:25:21,809 [shard 0] osd - should_abort_request operation restart, acting set changed
...
DEBUG 2021-05-26 20:25:21,813 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12:602f83fe:::foo:head {write 0~128 in=128b} snapc 0={} ondisk+write+known_if_redirected e52) v8) same_interval_since: 55
...
DEBUG 2021-05-26 20:25:21,813 [shard 0] osd - client_request(id=4, detail=osd_op(client.4453.0:5 12.6 12:602f83fe:::foo:head {write 0~128 in=128b} snapc 0={} ondisk+write+known_if_redirected e52) v8) same_interval_since: 55
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/osd/osd_operation_sequencer.h:52: seastar::futurize_t<Result> crimson::osd::OpSequencer::start_op(HandleT&, uint64_t, uint64_t, FuncT&&) [with HandleT = crimson::PipelineHandle; FuncT = crimson::interruptible::interruptor<InterruptCond>::wrap_function(Func&&) [with Func = crimson::osd::ClientRequest::start()::<lambda()> mutable::<lambda(Ref<crimson::osd::PG>)> mutable::<lambda()> mutable::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition]::<lambda()>; Result = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; seastar::futurize_t<Result> = crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<> >; uint64_t = long unsigned int]: Assertion `prev_op > last_unblocked' failed.
Aborting on shard 0.
Backtrace:
 0# 0x000055F440239C1F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F4788336B20 in /lib64/libpthread.so.0
 4# gsignal in /lib64/libc.so.6
 5# abort in /lib64/libc.so.6
 6# 0x00007F4786931B09 in /lib64/libc.so.6
 7# 0x00007F478693FDE6 in /lib64/libc.so.6
 8# 0x000055F43BA3DD17 in ceph-osd
 9# 0x000055F43BA419A8 in ceph-osd
10# seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<boost::intrusive_ptr<crimson::osd::PG> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::noncopyable_function<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > (boost::intrusive_ptr<crimson::osd::PG>&&)>&, seastar::future_state<boost::intrusive_ptr<crimson::osd::PG> >&&)#1}, boost::intrusive_ptr<crimson::osd::PG> >::run_and_dispose() in ceph-osd
11# 0x000055F445C5C70F in ceph-osd
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 9, 2021
This is an AlienStore's counterpart of 9fbd601
which becomes necessary as, since 6b66c30,
we call `open_collection()` to check the existence of OSD's superblock.

Without this fix the following crash happens:

```
[rzarzynski@o06 build]$ MDS=0 MGR=0 OSD=1 ../src/vstart.sh -l -n --without-dashboard --crimson
...
crimson-osd: boost/include/boost/smart_ptr/intrusive_ptr.hpp:199: T* boost::intrusive_ptr<T>::operator->() const [with T = ObjectStore::CollectionImpl]: Assertion `px != 0' failed.
Aborting on shard 0.
Backtrace:
 0# print_backtrace(std::basic_string_view<char, std::char_traits<char> >) at /home/rzarzynski/ceph1/build/boost/include/boost/stacktrace/stacktrace.hpp:129
 1# FatalSignal::signaled(int, siginfo_t const*) at /home/rzarzynski/ceph1/build/../src/crimson/common/fatal_signal.cc:80
 2# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /home/rzarzynski/ceph1/build/../src/crimson/common/fatal_signal.cc:41
 3# FatalSignal::install_oneshot_signal_handler<6>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at /home/rzarzynski/ceph1/build/../src/crimson/common/fatal_signal.cc:36
 4# 0x00007F8BF534DB30 in /lib64/libpthread.so.0
 5# gsignal in /lib64/libc.so.6
 6# abort in /lib64/libc.so.6
 7# 0x00007F8BF3948B19 in /lib64/libc.so.6
 8# 0x00007F8BF3956DF6 in /lib64/libc.so.6
 9# boost::intrusive_ptr<ObjectStore::CollectionImpl>::operator->() const at /home/rzarzynski/ceph1/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:200
10# crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#2}::operator()(boost::intrusive_ptr<ObjectStore::CollectionImpl>) const at /home/rzarzynski/ceph1/build/../src/crimson/os/alienstore/alien_store.cc:214
11# seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > seastar::futurize<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >::invoke<crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#2}&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#2}&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:2135
12# auto seastar::futurize_invoke<crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#2}&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(crimson::os::AlienStore::open_collection(coll_t const&)::{lambda(boost::intrusive_ptr<ObjectStore::CollectionImpl>)#2}&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:2167
13# _ZZN7seastar6futureIN5boost13intrusive_ptrIN11ObjectStore14CollectionImplEEEE4thenIZN7crimson2os10AlienStore15open_collectionERK6coll_tEUlS5_E0_NS0_INS2_INS9_19FuturizedCollectionEEEEEEET0_OT_ENUlDpOT_E_clIJS5_EEEDaSN_ at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:1526
14# _ZN7seastar20noncopyable_functionIFNS_6futureIN5boost13intrusive_ptrIN7crimson2os19FuturizedCollectionEEEEEONS3_IN11ObjectStore14CollectionImplEEEEE17direct_vtable_forIZNS1_ISB_E4thenIZNS5_10AlienStore15open_collectionERK6coll_tEUlSB_E0_S8_EET0_OT_EUlDpOT_E_E4callEPKSE_SC_ at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/util/noncopyable_function.hh:125
15# seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>::operator()(boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) const at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/util/noncopyable_function.hh:210
16# seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > std::__invoke_impl<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> >, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(std::__invoke_other, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
17# std::__invoke_result<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >::type std::__invoke<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:97
18# std::invoke_result<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >::type std::invoke<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/functional:83
19# auto seastar::internal::future_invoke<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl> >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&&) at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:1213
20# seastar::future<boost::intrusive_ptr<ObjectStore::CollectionImpl> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>, seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&)#1}::operator()(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&) const::{lambda()#1}::operator()() const at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:1575
21# void seastar::futurize<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >::satisfy_with_result_of<seastar::future<boost::intrusive_ptr<ObjectStore::CollectionImpl> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>, seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&)#1}::operator()(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&&) at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:2120
22# seastar::future<boost::intrusive_ptr<ObjectStore::CollectionImpl> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>, seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&)#1}::operator()(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&) const at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:1571
23# seastar::continuation<seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>, seastar::future<boost::intrusive_ptr<ObjectStore::CollectionImpl> >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>, seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > >(seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&&)::{lambda(seastar::internal::promise_base_with_type<boost::intrusive_ptr<crimson::os::FuturizedCollection> >&&, seastar::noncopyable_function<seastar::future<boost::intrusive_ptr<crimson::os::FuturizedCollection> > (boost::intrusive_ptr<ObjectStore::CollectionImpl>&&)>&, seastar::future_state<boost::intrusive_ptr<ObjectStore::CollectionImpl> >&&)#1}, boost::intrusive_ptr<ObjectStore::CollectionImpl> >::run_and_dispose() at /home/rzarzynski/ceph1/build/../src/seastar/include/seastar/core/future.hh:771
24# seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /home/rzarzynski/ceph1/build/../src/seastar/src/core/reactor.cc:2237
25# seastar::reactor::run_some_tasks() at /home/rzarzynski/ceph1/build/../src/seastar/src/core/reactor.cc:2646 (discriminator 1)
26# seastar::reactor::run() at /home/rzarzynski/ceph1/build/../src/seastar/src/core/reactor.cc:2805
27# seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/rzarzynski/ceph1/build/../src/seastar/src/core/app-template.cc:207 (discriminator 7)
28# seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at /home/rzarzynski/ceph1/build/../src/seastar/src/core/app-template.cc:115 (discriminator 2)
29# main at /home/rzarzynski/ceph1/build/../src/crimson/osd/main.cc:206 (discriminator 1)
30# __libc_start_main in /lib64/libc.so.6
31# _start in /home/rzarzynski/ceph1/build/bin/crimson-osd
Reactor stalled for 33157 ms on shard 0. Backtrace: 0xb14ab 0xa6cd418 0xa6a496d 0xa54fd22 0xa566e3d 0xa56383c 0xa5639fc 0xa566808 0x12b2f 0x3780e 0x21c44 0x21b18 0x2fdf5 0x7cfdf93 0x7b78622 0x7c3fcb7 0x7c1bcb2 0x7c1bdf1 0x7c4004a 0x7d913c7 0x7d875ed 0x7d777d7 0x7d5f626 0x7d47ca4 0x7d47b5a 0x7d5f770 0x7d47931 0x7d9bb74 0xa5998b5 0xa59e1ff 0xa5a3a95 0xa43546c 0xa433331 0x3f01992 0x237c2 0x3c7f9bd
../src/vstart.sh: line 28: 665993 Aborted                 (core dumped) PATH=$CEPH_BIN:$PATH "$@"
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
batrick pushed a commit that referenced this pull request Jun 9, 2021
…uence.

The `active_con` can get invalidated every single time when there is
a preemption point. This includes even the middle connection open
sequence as it's spread across multiple continuations!

Unfortunately, we don't check for `active_con` in the lambdas inside
the `on_session_opened()` method. That was the reason of the following
crash at Sepia [1]:

```
INFO  2021-06-08 09:36:23,992 [shard 0] monc - do_auth_single: connection closed
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4967-g96cdf983/rpm/el8/BUILD/ceph-17.0.0-4967-g96cdf983/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x000055C3C1CA860F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007FAAE1713B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_conn() in ceph-osd
 5# 0x000055C3C2532DA8 in ceph-osd
 6# 0x000055C3C2535CB5 in ceph-osd
 7# 0x000055C3BBC9FC70 in ceph-osd
 8# 0x000055C3C76FAE5F in ceph-osd
 9# 0x000055C3C77003A0 in ceph-osd
10# 0x000055C3C78B240B in ceph-osd
11# 0x000055C3C740FE8A in ceph-osd
12# 0x000055C3C7419FAE in ceph-osd
13# main in ceph-osd
14# __libc_start_main in /lib64/libc.so.6
15# _start in ceph-osd
Fault at location: 0x98
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-06-08_09:11:05-rados-master-distro-basic-smithi/6159602

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants