Bug #66336

closed

valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config()

Added by J. Eric Ivancich almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
% Done: 0%
Source:
Backport: squid,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-3901-g60cd2bc5f0
Released In: v20.2.0~2324
Upkeep Timestamp: 2025-11-01T01:33:28+00:00

Description

This happens on main. It can be seen in this teuthology run: https://qa-proxy.ceph.com/teuthology/ivancich-2024-05-31_15:04:52-rgw-wip-eric-testing-1-distro-default-smithi/7735471/

Here's what's generated:

- file_path: /var/log/ceph/valgrind/ceph.client.0.log
  kind: MismatchedFree
  traceback:
  - file: /
    function: operator delete[](void*, unsigned long)
    line: ''
  - file: /
    function: ''
    line: ''
  - file: /
    function: MonClient::get_monmap_and_config()
    line: ''
  - file: /
    function: global_init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char,
      std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char,
      std::char_traits<char>, std::allocator<char> > > > > const*, std::vector<char
      const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int,
      bool)
    line: ''
  - file: /
    function: rgw_global_init(std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
      std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char,
      std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char,
      std::char_traits<char>, std::allocator<char> > > > > const*, std::vector<char
      const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)
    line: ''

Files

valgrind_log_8000.xml (307 KB) Mark Kogan, 07/16/2024 09:58 AM

Related issues 3 (1 open, 2 closed)

Related to Ceph - Bug #63867: Segfault in CommonSafeTimer::cancel_all_events due to uninitialized data (New)
Copied to rgw - Backport #67307: squid: valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config() (Resolved, Casey Bodley)
Copied to rgw - Backport #69652: reef: valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config() (Resolved, J. Eric Ivancich)
Actions #1

Updated by Casey Bodley almost 2 years ago

  • Subject changed from rgw: valgrind complains of a MismatchedFree to valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config()
Actions #2

Updated by Casey Bodley almost 2 years ago

seems to only happen against ubuntu

Actions #3

Updated by Mark Kogan almost 2 years ago

update on the current repro effort:
the issue has NOT reproduced in a vstart environment inside the latest ubuntu (24.04) container, with java s3tests as the workload:

sudo podman run -it --rm --replace --name UBU --privileged -v /mnt/osd--host/src-git/:/src-git ubuntu

root@12dda126a09b:/# cat /etc/os-release 
...
VERSION="24.04 LTS (Noble Numbat)" 

...
root@12dda126a09b:/src-git/ceph--up--ubuntu/build# export CEPH_DEV=0                                                                                                                         
sudo ../src/stop.sh;../src/stop.sh ; sleep 2 ; sudo find ./out -maxdepth 1 -type f -delete ; sudo rm -rf ./dev/* ; sudo chown $(id -nu):$(id -ng) -R *                                       
env MON=1 OSD=1 MDS=0 MGR=1 RGW=1 NFS=0 ../src/vstart.sh -n -x --nolockdep --without-dashboard -o debug_ms=0 --rgw_frontend "beast" 
...
root@12dda126a09b:/src-git/ceph--up--ubuntu/build# sudo  pkill -9 radosgw
...
root@12dda126a09b:/src-git/ceph--up--ubuntu/build# sudo truncate -s0 ./out/radosgw.8000.log ; sudo valgrind --soname-synonyms=somalloc="*tcmalloc*" --vgdb=no --trace-children=no --child-silent-after-fork=yes --num-callers=20 --track-origins=yes --time-stamp=yes  --suppressions=../qa/valgrind.supp --xml=yes --xml-file=./valgrind_log_8000.xml --tool=memcheck --max-threads=2048 -- ./bin/radosgw -c ./ceph.conf --log-file=./out/radosgw.8000.log --admin-socket=./out/radosgw.8000.asok --pid-file=./out/radosgw.8000.pid -n client.rgw.8000 --rgw_frontends="beast port=8000" --debug_ms=0 --debug_rgw=1 -f
...

...
root@12dda126a09b:/src-git/ceph--up--ubuntu/build/java_s3tests# ./gradle/gradle/bin/gradle clean test --rerun-tasks --no-build-cache --no-daemon --no-parallel
...

## <ctrl>-c to stop valgrind & rgw
...
2024-06-18T10:32:08.961-0200 8ef7000 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-18T10:32:11.244-0200 8ef7000 -1 WARNING: all dangerous and experimental features are enabled.
2024-06-18T10:32:11.325-0200 8ef7000 -1 WARNING: all dangerous and experimental features are enabled.
^C2024-06-18T11:24:50.654-0200 e5736c0 -1 received  signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
2024-06-18T11:24:50.663-0200 8ef7000 -1 shutting down
...

...
root@12dda126a09b:/src-git/ceph--up--ubuntu/build# cat valgrind_log_8000.xml 
<?xml version="1.0"?>

<valgrindoutput>

<protocolversion>4</protocolversion>
<protocoltool>memcheck</protocoltool>

<preamble>
  <line>Memcheck, a memory error detector</line>
  <line>Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.</line>
  <line>Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info</line>
  <line>Command: ./bin/radosgw -c ./ceph.conf --log-file=./out/radosgw.8000.log --admin-socket=./out/radosgw.8000.asok --pid-file=./out/radosgw.8000.pid -n client.rgw.8000 --rgw_frontends=beast port=8000 --debug_ms=0 --debug_rgw=1 -f</line>
</preamble>

<pid>11452</pid>
<ppid>11451</ppid>
<tool>memcheck</tool>

<args>
  <vargv>
    <exe>/usr/bin/valgrind.bin</exe>
    <arg>--soname-synonyms=somalloc=*tcmalloc*</arg>
    <arg>--vgdb=no</arg>
    <arg>--trace-children=no</arg>
    <arg>--child-silent-after-fork=yes</arg>
    <arg>--num-callers=20</arg>
    <arg>--track-origins=yes</arg>
    <arg>--time-stamp=yes</arg>
    <arg>--suppressions=../qa/valgrind.supp</arg>
    <arg>--xml=yes</arg>
    <arg>--xml-file=./valgrind_log_8000.xml</arg>
    <arg>--tool=memcheck</arg>
    <arg>--max-threads=2048</arg>
  </vargv>
  <argv>
    <exe>./bin/radosgw</exe>
    <arg>-c</arg>
    <arg>./ceph.conf</arg>
    <arg>--log-file=./out/radosgw.8000.log</arg>
    <arg>--admin-socket=./out/radosgw.8000.asok</arg>
    <arg>--pid-file=./out/radosgw.8000.pid</arg>
    <arg>-n</arg>
    <arg>client.rgw.8000</arg>
    <arg>--rgw_frontends=beast port=8000</arg>
    <arg>--debug_ms=0</arg>
    <arg>--debug_rgw=1</arg>
    <arg>-f</arg>
  </argv>
</args>

<status>
  <state>RUNNING</state>
  <time>00:00:00:00.237 </time>
</status>

<status>
  <state>FINISHED</state>
  <time>00:00:53:12.613 </time>
</status>

<errorcounts>
</errorcounts>

<suppcounts>
  <pair>
    <count>51536</count>
    <name>&lt;allthefrees, so we can behave with tcmalloc&gt;</name>
  </pair>
  <pair>
    <count>1</count>
    <name>tcmalloc: param points to uninit bytes under call_init (centos9) or call_init.part.0 (jammy)</name>
  </pair>
</suppcounts>

</valgrindoutput>

Actions #4

Updated by J. Eric Ivancich over 1 year ago

  • Assignee set to Mark Kogan
Actions #5

Updated by Casey Bodley over 1 year ago

a data point from the squid-release branch which tracks the first squid release candidate but is ~800 commits behind the current squid branch

Yuri's suite run https://pulpito.ceph.com/yuriw-2024-06-27_14:25:29-rgw-squid-release-distro-default-smithi/ contains an rgw/verify job on ubuntu22 with validator/valgrind that didn't show this valgrind issue: https://qa-proxy.ceph.com/teuthology/yuriw-2024-06-27_14:25:29-rgw-squid-release-distro-default-smithi/7776147/teuthology.log

in contrast, recent runs on top of the latest squid branch are seeing these failures consistently: https://pulpito.ceph.com/cbodley-2024-06-27_13:37:18-rgw-wip-cbodley2-testing-distro-default-smithi/

if we can't reproduce this locally, it might be worth trying to bisect this on squid

Actions #6

Updated by Casey Bodley over 1 year ago

  • Priority changed from High to Urgent
Actions #7

Updated by Casey Bodley over 1 year ago

https://tracker.ceph.com/issues/63867 is tracking what looks like a compiler bug causing crashes under this same MonClient::get_monmap_and_config() function. i wonder if that's somehow related

Actions #8

Updated by Casey Bodley over 1 year ago

  • Related to Bug #63867: Segfault in CommonSafeTimer::cancel_all_events due to uninitialized data added
Actions #9

Updated by Casey Bodley over 1 year ago

i scheduled a job with debuginfo packages installed hoping to get more symbols. from https://qa-proxy.ceph.com/teuthology/cbodley-2024-07-02_17:40:03-rgw:verify-wip-66036-distro-default-smithi/7783961/remote/smithi064/log/valgrind/ceph.client.0.log.gz:

<error>
  <unique>0x203</unique>
  <tid>1</tid>
  <kind>MismatchedFree</kind>
  <what>Mismatched free() / delete / delete []</what>
  <stack>
    <frame>
      <ip>0x484CD4F</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator delete[](void*, unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x53D4D78</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_tempbuf.h</file>
      <line>74</line>
    </frame>
    <frame>
      <ip>0x53D4D78</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_tempbuf.h</file>
      <line>182</line>
    </frame>
    <frame>
      <ip>0x53D4D78</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_algo.h</file>
      <line>5025</line>
    </frame>
    <frame>
      <ip>0x53D4D78</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_algo.h</file>
      <line>5056</line>
    </frame>
    <frame>
      <ip>0x53D4D78</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>Messenger::add_dispatcher_head(Dispatcher*, unsigned int) [clone .constprop.0]</fn>
      <dir>./obj-x86_64-linux-gnu/src/./src/msg</dir>
      <file>Messenger.h</file>
      <line>405</line>
    </frame>
    <frame>
      <ip>0x51832C7</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>MonClient::get_monmap_and_config()</fn>
      <dir>./obj-x86_64-linux-gnu/src/./src/mon</dir>
      <file>MonClient.cc</file>
      <line>134</line>
    </frame>
    <frame>
      <ip>0xF628DC</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>global_init(std::map&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::less&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, std::allocator&lt;std::pair&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt; &gt; const*, std::vector&lt;char const*, std::allocator&lt;char const*&gt; &gt;&amp;, unsigned int, code_environment_t, int, bool)</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/../global/./src/global</dir>
      <file>global_init.cc</file>
      <line>370</line>
    </frame>
    <frame>
      <ip>0x8CC09A</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw_global_init(std::map&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::less&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, std::allocator&lt;std::pair&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt; &gt; const*, std::vector&lt;char const*, std::allocator&lt;char const*&gt; &gt;&amp;, unsigned int, code_environment_t, int)</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/./src/rgw</dir>
      <file>rgw_common.cc</file>
      <line>3189</line>
    </frame>
    <frame>
      <ip>0x66DBE4</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>main</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/./src/rgw</dir>
      <file>rgw_main.cc</file>
      <line>105</line>
    </frame>
  </stack>
  <auxwhat>Address 0xa2f84a0 is 0 bytes inside a block of size 16 alloc'd</auxwhat>
  <stack>
    <frame>
      <ip>0x4849899</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new(unsigned long, std::nothrow_t const&amp;)</fn>
    </frame>
    <frame>
      <ip>0x53D3946</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_tempbuf.h</file>
      <line>109</line>
    </frame>
    <frame>
      <ip>0x53D3946</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>std::_Temporary_buffer&lt;__gnu_cxx::__normal_iterator&lt;Messenger::PriorityDispatcher*, std::vector&lt;Messenger::PriorityDispatcher, std::allocator&lt;Messenger::PriorityDispatcher&gt; &gt; &gt;, Messenger::PriorityDispatcher&gt;::_Temporary_buffer(__gnu_cxx::__normal_iterator&lt;Messenger::PriorityDispatcher*, std::vector&lt;Messenger::PriorityDispatcher, std::allocator&lt;Messenger::PriorityDispatcher&gt; &gt; &gt;, long) [clone .constprop.0]</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_tempbuf.h</file>
      <line>262</line>
    </frame>
    <frame>
      <ip>0x53D4D33</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_algo.h</file>
      <line>5018</line>
    </frame>
    <frame>
      <ip>0x53D4D33</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/include/c++/11/bits</dir>
      <file>stl_algo.h</file>
      <line>5056</line>
    </frame>
    <frame>
      <ip>0x53D4D33</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>Messenger::add_dispatcher_head(Dispatcher*, unsigned int) [clone .constprop.0]</fn>
      <dir>./obj-x86_64-linux-gnu/src/./src/msg</dir>
      <file>Messenger.h</file>
      <line>405</line>
    </frame>
    <frame>
      <ip>0x51832C7</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>MonClient::get_monmap_and_config()</fn>
      <dir>./obj-x86_64-linux-gnu/src/./src/mon</dir>
      <file>MonClient.cc</file>
      <line>134</line>
    </frame>
    <frame>
      <ip>0xF628DC</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>global_init(std::map&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::less&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, std::allocator&lt;std::pair&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt; &gt; const*, std::vector&lt;char const*, std::allocator&lt;char const*&gt; &gt;&amp;, unsigned int, code_environment_t, int, bool)</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/../global/./src/global</dir>
      <file>global_init.cc</file>
      <line>370</line>
    </frame>
    <frame>
      <ip>0x8CC09A</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw_global_init(std::map&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;, std::less&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;, std::allocator&lt;std::pair&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt; &gt; &gt; const*, std::vector&lt;char const*, std::allocator&lt;char const*&gt; &gt;&amp;, unsigned int, code_environment_t, int)</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/./src/rgw</dir>
      <file>rgw_common.cc</file>
      <line>3189</line>
    </frame>
    <frame>
      <ip>0x66DBE4</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>main</fn>
      <dir>./obj-x86_64-linux-gnu/src/rgw/./src/rgw</dir>
      <file>rgw_main.cc</file>
      <line>105</line>
    </frame>
  </stack>
</error>

still missing some symbols due to UnknownInlinedFun, but the stack points into Messenger::add_dispatcher_head() from ceph/src/msg/Messenger.h:
  void add_dispatcher_head(Dispatcher *d, PriorityDispatcher::priority_t priority=Dispatcher::PRIORITY_DEFAULT) {
    bool first = dispatchers.empty();
    dispatchers.insert(dispatchers.begin(), PriorityDispatcher{priority, d});
    std::stable_sort(dispatchers.begin(), dispatchers.end());
    if (d->ms_can_fast_dispatch_any()) {
      fast_dispatchers.insert(fast_dispatchers.begin(), PriorityDispatcher{priority, d});
      std::stable_sort(fast_dispatchers.begin(), fast_dispatchers.end());
    }
    if (first)
      ready();
  }

the new and delete[] are in stl_tempbuf.h which is called from stl_algo.h so i'm guessing that's inside one of our calls to std::stable_sort()
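
To make the suspected path concrete, here is a minimal sketch of the insert-then-stable_sort pattern from the snippet above. The `Dispatcher` and `PriorityDispatcher` types and the `add_head()` helper are hypothetical stand-ins, not the Ceph definitions. The sketch assumes only that entries are ordered by priority, as the quoted code implies. Whether `std::stable_sort()` allocates scratch space is up to the standard library; the stack above suggests libstdc++ does so through stl_tempbuf.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Dispatcher; only its address is used here.
struct Dispatcher {};

// Hypothetical stand-in for Messenger::PriorityDispatcher: a priority plus a
// pointer, compared by priority only (an assumption for illustration).
struct PriorityDispatcher {
  std::uint32_t priority;
  Dispatcher* dispatcher;
  bool operator<(const PriorityDispatcher& other) const {
    return priority < other.priority;
  }
};

// Same shape as the add_dispatcher_head() body quoted above: insert at the
// front, then stable_sort so entries with equal priority keep their order.
// The stable_sort call is where an implementation may request (and later
// release) a temporary buffer.
void add_head(std::vector<PriorityDispatcher>& v, std::uint32_t priority,
              Dispatcher* d) {
  v.insert(v.begin(), PriorityDispatcher{priority, d});
  std::stable_sort(v.begin(), v.end());
}
```

With this pattern, the most recently added entry ends up first among entries of equal priority, and lower priority values sort ahead of higher ones.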

i pushed a branch for debug builds in https://shaman.ceph.com/builds/ceph/wip-66336-debug/ hoping that will get rid of the UnknownInlinedFun stuff

Actions #10

Updated by Casey Bodley over 1 year ago

Casey Bodley wrote in #note-9:

i pushed a branch for debug builds in https://shaman.ceph.com/builds/ceph/wip-66336-debug/ hoping that will get rid of the UnknownInlinedFun stuff

no luck there

Actions #11

Updated by Mark Kogan over 1 year ago

@Casey Bodley thanks to the narrowing information in comment#6, I was able to repro and bisect to commit:
https://github.com/ceph/ceph/pull/57682/commits/272052893fc6e84094ad65e2e7228a8ed5bfdf14 -- msg: add priority to dispatcher invocation order

from PR:
https://github.com/ceph/ceph/pull/57682 -- squid: mds: use regular dispatch for processing beacons #57682

##############
## GIT BISECT:
##############

root@bfa0575d162c:/src-git/ceph--up--ubuntu# git bisect bad 81127b728ce
Bisecting: 429 revisions left to test after this (roughly 9 steps)
[25725dec9c9d4cab5b84a6eb734d15a08efa4fa5] osd/scrub: Change scrub cost to use average object size
root@bfa0575d162c:/src-git/ceph--up--ubuntu# git status
HEAD detached at 25725dec9c9
You are currently bisecting, started from branch 'squid'.

root@bfa0575d162c:/src-git/ceph--up--ubuntu# git bisect good
Bisecting: 212 revisions left to test after this (roughly 8 steps)
[ffdcd9c42be4354b71dbe3ec41883ab92e8606af] Merge pull request #56883 from guits/wip-65480-squid

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect bad
Bisecting: 106 revisions left to test after this (roughly 7 steps)
[7018c7c4791b1c97e7f6d28a7d9bb5e8383e0aab] Merge pull request #58009 from Matan-B/wip-55488-squid

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect bad
Bisecting: 58 revisions left to test after this (roughly 6 steps)
[a85501fdefda6180f3638d4dc69cb0c0399bc3ef] Merge pull request #58014 from Matan-B/wip-55735-squid

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect good
Bisecting: 29 revisions left to test after this (roughly 5 steps)
[e86811589ab8d5cf70ec9d493ee8d576aac2985a] Merge pull request #58033 from Matan-B/wip-57313-squi

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect good
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[3f51c8918b3dc912b051920a6f00103382bcaaff] Merge pull request #57840 from rishabh-d-dave/wip-66330-squid

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect bad
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[fac565605e7fdab062a5a2dde007c9431bbd9526] qa/cephfs: pass confirmation flag to fs fail in tear down code

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect good
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[598780344c99d8968535db6e91e29cd29d1d5581] mds: set dispatcher order

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect bad
Bisecting: 1 revision left to test after this (roughly 1 step)
[272052893fc6e84094ad65e2e7228a8ed5bfdf14] msg: add priority to dispatcher invocation order

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c9296d551c0bc69458c4c69f06de9881e62101b0] mds: note when dispatcher is called

root@c5b34b4a7802:/src-git/ceph--up--ubuntu# git bisect good
272052893fc6e84094ad65e2e7228a8ed5bfdf14 is the first bad commit
commit 272052893fc6e84094ad65e2e7228a8ed5bfdf14
Author: Patrick Donnelly <pdonnell@redhat.com>
Date:   Tue May 14 14:15:21 2024 -0400

    msg: add priority to dispatcher invocation order

    So we can ensure that e.g. MDSRank::ms_dispatch is lowest priority so that we
    do not acquire the mds_lock when looking at beacons.

    This change maintains the current behavior when the priority is unset: the use
    of std::stable_sort will ensure that the add_dispatcher_head and
    add_dispatcher_tail calls will preserve order when dispatcher priorities are
    equal.

    Fixes: 7fc04be9332704946ba6f0e95cfcd1afc34fc0fe
    Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
    (cherry picked from commit b463d93b08f392ebd636c24bf5f0fa4249600256)

 src/msg/Dispatcher.h |  6 ++++
 src/msg/Messenger.h  | 82 ++++++++++++++++++++++++++++++++--------------------
 2 files changed, 57 insertions(+), 31 deletions(-)
Actions #12

Updated by Casey Bodley over 1 year ago

thanks Mark! are you able to capture a suppression with --gen-suppressions=yes?

Actions #13

Updated by Mark Kogan over 1 year ago

sure, attached with --gen-suppressions=all

Actions #14

Updated by Casey Bodley over 1 year ago

  • Status changed from New to Fix Under Review
  • Backport set to squid
  • Pull request ID set to 58631

in https://github.com/ceph/ceph/pull/58631, i propose changes that avoid calling std::stable_sort() there. depending on how long that takes to review/test, we may still want to merge valgrind suppressions in the meantime
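
One way to keep the list ordered without calling std::stable_sort() is to insert each entry directly at its sorted position. The sketch below illustrates that idea only; it is not taken from the pull request, and the types and the `insert_head()` / `insert_tail()` helpers are hypothetical. It assumes entries are compared by priority alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Dispatcher; only its address is used here.
struct Dispatcher {};

// Hypothetical stand-in for Messenger::PriorityDispatcher, ordered by
// priority only (an assumption for illustration).
struct PriorityDispatcher {
  std::uint32_t priority;
  Dispatcher* dispatcher;
  bool operator<(const PriorityDispatcher& other) const {
    return priority < other.priority;
  }
};

// "Head" insert: lower_bound finds the first entry whose priority is not
// less than the new one, so the new entry lands before its equals.
void insert_head(std::vector<PriorityDispatcher>& v, std::uint32_t priority,
                 Dispatcher* d) {
  const PriorityDispatcher pd{priority, d};
  v.insert(std::lower_bound(v.begin(), v.end(), pd), pd);
}

// "Tail" insert: upper_bound finds the first entry with a strictly greater
// priority, so the new entry lands after its equals.
void insert_tail(std::vector<PriorityDispatcher>& v, std::uint32_t priority,
                 Dispatcher* d) {
  const PriorityDispatcher pd{priority, d};
  v.insert(std::upper_bound(v.begin(), v.end(), pd), pd);
}
```

As long as the vector is already sorted, this gives the same resulting order as insert-at-front (or push_back) followed by a stable sort, while the binary searches themselves need no temporary buffer.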

Actions #15

Updated by Rishabh Dave over 1 year ago

This was seen in a QA run for Squid backport PRs - https://pulpito.ceph.com/xiubli-2024-07-29_02:08:56-fs-wip-xiubli-testing-20240726.021939-squid-distro-default-smithi/7823832/

From xiubli-2024-07-29_02:08:56-fs-wip-xiubli-testing-20240726.021939-squid-distro-default-smithi/7823832/remote/smithi176/log/valgrind/mds.a.log.gz -

    <frame>
      <ip>0x515B5F8</ip>
      <obj>/usr/lib/ceph/libceph-common.so.2</obj>
      <fn>Messenger::add_dispatcher_head(Dispatcher*, unsigned int) [clone .constprop.0]</fn>
      <dir>./obj-x86_64-linux-gnu/src/./src/msg</dir>
      <file>Messenger.h</file>
      <line>405</line>
    </frame>
    <frame>

From xiubli-2024-07-29_02:08:56-fs-wip-xiubli-testing-20240726.021939-squid-distro-default-smithi/7823832/remote/smithi176/log/valgrind/mon.a.log.gz -

    <frame>
      <ip>0x98FE35</ip>
      <obj>/usr/bin/ceph-mon</obj>
      <fn>Messenger::add_dispatcher_tail(Dispatcher*, unsigned int) [clone .constprop.0]</fn>
      <dir>./obj-x86_64-linux-gnu/src/mon/./src/msg</dir>
      <file>Messenger.h</file>
      <line>423</line>
    </frame>
    <frame>

Copying other relevant bits below. From valgrind.yaml -

- file_path: /var/log/ceph/valgrind/mds.a.log
  kind: MismatchedFree
  traceback:
  - file: /
    function: operator delete[](void*, unsigned long)
    line: ''
  - file: /usr/include/c++/11/bits/stl_tempbuf.h
    function: UnknownInlinedFun
    line: '74'
  - file: /usr/include/c++/11/bits/stl_tempbuf.h
    function: UnknownInlinedFun
    line: '182'
  - file: /usr/include/c++/11/bits/stl_algo.h
    function: UnknownInlinedFun
    line: '5025'
  - file: /usr/include/c++/11/bits/stl_algo.h
    function: UnknownInlinedFun
    line: '5056'

From teuthology.log -

2024-07-29T09:30:37.757 DEBUG:teuthology.orchestra.run.smithi192:> sudo rm -rf -- /etc/ceph/ceph.conf /etc/ceph/ceph.keyring /home/ubuntu/cephtest/ceph.data /home/ubuntu/cephtest/ceph.monmap /home/ubuntu/cephtest/../*.pid
2024-07-29T09:30:37.958 DEBUG:tasks.ceph:valgrind exception message: valgrind error: MismatchedFree
operator delete[](void*, unsigned long)
UnknownInlinedFun
UnknownInlinedFun

Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_53ce1462e129f6eb4071986336534c740fdebd31/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_650bd0c3361e3179c37519a10c7e259cdea50ae8/qa/tasks/ceph.py", line 1908, in task
    with contextutil.nested(*subtasks):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_teuthology_53ce1462e129f6eb4071986336534c740fdebd31/teuthology/contextutil.py", line 54, in nested
    raise exc[1]
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_650bd0c3361e3179c37519a10c7e259cdea50ae8/qa/tasks/ceph.py", line 252, in ceph_log
    yield
  File "/home/teuthworker/src/git.ceph.com_teuthology_53ce1462e129f6eb4071986336534c740fdebd31/teuthology/contextutil.py", line 46, in nested
    if exit(*exc):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_650bd0c3361e3179c37519a10c7e259cdea50ae8/qa/tasks/ceph.py", line 346, in valgrind_post
    raise valgrind_exception
Exception: valgrind error: MismatchedFree
operator delete[](void*, unsigned long)
UnknownInlinedFun
UnknownInlinedFun
Actions #17

Updated by Casey Bodley over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee changed from Mark Kogan to Casey Bodley
Actions #18

Updated by Casey Bodley over 1 year ago

  • Copied to Backport #67307: squid: valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config() added
Actions #19

Updated by Casey Bodley over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #20

Updated by Casey Bodley over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #21

Updated by J. Eric Ivancich about 1 year ago

  • Status changed from Resolved to Pending Backport
  • Backport changed from squid to squid,reef
Actions #22

Updated by J. Eric Ivancich about 1 year ago

  • Copied to Backport #69652: reef: valgrind error: MismatchedFree operator delete[](void*, unsigned long) MonClient::get_monmap_and_config() added
Actions #23

Updated by J. Eric Ivancich 9 months ago

  • Status changed from Pending Backport to Resolved
Actions #24

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 60cd2bc5f0dd78ff48bc64c4a1a13aa24c4a2cfb
  • Fixed In set to v19.3.0-3901-g60cd2bc5f0d
  • Upkeep Timestamp set to 2025-07-11T01:38:22+00:00
Actions #25

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3901-g60cd2bc5f0d to v19.3.0-3901-g60cd2bc5f0
  • Upkeep Timestamp changed from 2025-07-11T01:38:22+00:00 to 2025-07-14T22:43:25+00:00
Actions #26

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2324
  • Upkeep Timestamp changed from 2025-07-14T22:43:25+00:00 to 2025-11-01T01:33:28+00:00