Bug #73822

open

Rocky10 - rados/verify - valgrind error: MismatchedFree operator delete[](void*, unsigned long, std::align_val_t) RocksDBStore::close() RocksDBStore::~RocksDBStore()

Added by Nitzan Mordechai 4 months ago. Updated 4 days ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

/a/sjust-2025-11-11_04:48:46-rados-wip-rocky10-branch-of-the-day-2025-11-10-1762829866-distro-default-smithi/ ['8594563', '8594659', '8594510']
For the full valgrind error output, see for example /a/sjust-2025-11-11_04:48:46-rados-wip-rocky10-branch-of-the-day-2025-11-10-1762829866-distro-default-smithi/8594563/remote/smithi121/log/valgrind/osd.2.log.gz

<error>
  <unique>0x675</unique>
  <tid>1</tid>
  <kind>MismatchedFree</kind>
  <what>Mismatched free() / delete / delete []</what>
  <stack>
    <frame>
      <ip>0x48485EC</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator delete[](void*, unsigned long, std::align_val_t)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.24.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>1504</line>
    </frame>
    <frame>
      <ip>0x1050182</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>RocksDBStore::close()</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/kv</dir>
      <file>RocksDBStore.cc</file>
      <line>1330</line>
    </frame>
    <frame>
      <ip>0x105021D</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>RocksDBStore::~RocksDBStore()</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/kv</dir>
      <file>RocksDBStore.cc</file>
      <line>1291</line>
    </frame>
    <frame>
      <ip>0xB541B3</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>UnknownInlinedFun</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/kv</dir>
      <file>RocksDBStore.cc</file>
      <line>1295</line>
    </frame>
    <frame>
      <ip>0xB541B3</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>BlueStore::_close_db()</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/os/bluestore</dir>
      <file>BlueStore.cc</file>
      <line>8248</line>
    </frame>
    <frame>
      <ip>0xB56023</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>BlueStore::_open_db_and_around(bool, bool)</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/os/bluestore</dir>
      <file>BlueStore.cc</file>
      <line>7891</line>
    </frame>
    <frame>
      <ip>0xB589DB</ip>
      <obj>/usr/bin/ceph-osd</obj>
      <fn>BlueStore::_mount()</fn>
      <dir>/usr/src/debug/ceph-20.3.0-4047.gd005727a.el10.x86_64/src/os/bluestore</dir>
      <file>BlueStore.cc</file>
      <line>9426</line>
    </frame>


Related issues 2 (2 open, 0 closed)

Related to mgr - Bug #73930: ceph-mgr modules rely on deprecated python subinterpreters (New)
Related to RADOS - Bug #74604: Rocky10 - MismatchedFree delete coming from ceph-osd-classic code (Fix Under Review, Nitzan Mordechai)

Actions #1

Updated by Nitzan Mordechai 4 months ago

  • Description updated (diff)
Actions #2

Updated by Adam Kupczyk 4 months ago · Edited

I am unable to replicate the problem.
The DB is destroyed by a plain `delete`:

void RocksDBStore::close() {
...
delete db;
}

That translates into a virtual table call; I guess the slot at offset 0x18 is the destructor ("~"):

   10201:       48 8b bd 88 00 00 00    mov    0x88(%rbp),%rdi
  delete db;
   10213:       48 85 ff                test   %rdi,%rdi
   10216:       74 06                   je     1021e <RocksDBStore::close()+0x28e>
   10218:       48 8b 07                mov    (%rdi),%rax
   1021b:       ff 50 18                callq  *0x18(%rax)

The code in DBImplReadOnly::~DBImplReadOnly() (the D0 deleting destructor, per the symbol name) is:
      74:       e9 00 00 00 00          jmpq   79 <_ZN7rocksdb14DBImplReadOnlyD0Ev+0x29>
      75: R_X86_64_PLT32      _ZdlPvmSt11align_val_t-0x4

Note: because this is a jmpq rather than a call, ~DBImplReadOnly does not appear in the valgrind call stack.
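
To make the dispatch path above concrete, here is a minimal sketch of how `delete` on a base pointer reaches the scalar, alignment-aware `operator delete` through the virtual destructor. `FakeDB`, `FakeDBImpl`, `last_dealloc` and `destroy_through_base` are hypothetical stand-ins, not RocksDB or Ceph code, and the 64-byte alignment is an assumption chosen only to force the `std::align_val_t` overload. It needs C++17.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Records which deallocation function ran last (illustration only).
static std::string last_dealloc;

// Hypothetical over-aligned base class with a virtual destructor.
struct alignas(64) FakeDB {
  virtual ~FakeDB() = default;

  // Class-specific aligned allocation, used because alignof(FakeDB) exceeds
  // the default new alignment.
  static void* operator new(std::size_t size, std::align_val_t al) {
    void* p = std::aligned_alloc(static_cast<std::size_t>(al), size);
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }

  // Scalar form: the counterpart of _ZdlPvmSt11align_val_t.
  static void operator delete(void* p, std::size_t, std::align_val_t) {
    last_dealloc = "operator delete(void*, size_t, align_val_t)";
    std::free(p);
  }
};

// Hypothetical concrete implementation, destroyed only through FakeDB*.
struct FakeDBImpl : FakeDB {
  char payload[128];
};

// Mirrors the `delete db;` in close(): the static type is the base class,
// so the deallocation function is chosen via the virtual destructor.
std::string destroy_through_base() {
  FakeDB* db = new FakeDBImpl;
  last_dealloc.clear();
  delete db;
  return last_dealloc;
}
```

In this sketch a correct compiler can only ever reach the scalar form, since no `new[]` is involved, which is why a `delete[]` frame in the valgrind stack is surprising.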

echo _ZdlPvmSt11align_val_t | c++filt
operator delete(void*, unsigned long, std::align_val_t)

I tried this with gcc toolset 11 and gcc toolset 13.3.1-2 (I think the latter is what the builder uses).

BUT the valgrind call stack (the suppression proposal part) clearly has:

echo _ZdaPvmSt11align_val_t | c++filt
operator delete[](void*, unsigned long, std::align_val_t)

I suspect that the compiler somehow emitted an invalid call to `delete[]` instead of `delete`.
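
The two `c++filt` checks above can also be done programmatically. The following is a minimal sketch using `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang); the helper names `demangle` and `is_array_delete` are made up for this illustration.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI)
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol; returns the input unchanged on failure.
std::string demangle(const std::string& mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);  // the buffer returned by __cxa_demangle is malloc'ed
  return result;
}

// "_Zda..." encodes operator delete[], "_Zdl..." encodes operator delete.
bool is_array_delete(const std::string& mangled) {
  return mangled.rfind("_Zda", 0) == 0;
}
```

Calling `demangle("_ZdaPvmSt11align_val_t")` should yield the `operator delete[]` signature seen in the valgrind stack, while `_ZdlPvmSt11align_val_t` is the scalar form found in the disassembly.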

Actions #3

Updated by Radoslaw Zarzynski 4 months ago · Edited

@Adam Kupczyk: How about suppressing it?

Actions #4

Updated by Nitzan Mordechai 4 months ago

I added some comparisons from centos9 and rocky10 rpms:
Centos9 shows:
/opt/rh/gcc-toolset-13
GLIBCXX_3.4.29 - GCC 13

Rocky10:
GLIBCXX_3.4.32 - GCC 14

Compare the delete:
Rocky 10 (0xf48166):
f48166: mov 0x88(%rbp),%rdi
f48178: test %rdi,%rdi
f4817d: mov (%rdi),%rax
f48180: call *0x18(%rax)

CentOS 9 (0xf8d616):
f8d616: mov 0x88(%rbp),%rdi
f8d628: test %rdi,%rdi
f8d62d: mov (%rdi),%rax
f8d630: call *0x18(%rax)

These look identical, and the RocksDB version didn't change, so this looks like a GCC 14 issue with RocksDB rather than with Ceph.

Actions #5

Updated by Yaarit Hatuka 4 months ago

  • Related to Bug #73930: ceph-mgr modules rely on deprecated python subinterpreters added
Actions #6

Updated by Radoslaw Zarzynski 4 months ago

Do we observe any other effect of this bug beyond making Valgrind angry?

Actions #7

Updated by Nitzan Mordechai 4 months ago

I didn't see any other effect; Valgrind complained only during shutdown, when memory is released.

Actions #8

Updated by Radoslaw Zarzynski 4 months ago

OK, let's update the suppression file.

Actions #9

Updated by Laura Flores 3 months ago

Scrub note: bump up

Actions #10

Updated by Radoslaw Zarzynski 3 months ago

@Nitzan Mordechai: are you going to update the suppression file?

Actions #11

Updated by Nitzan Mordechai 3 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 66651
Actions #12

Updated by Nitzan Mordechai 3 months ago

Radoslaw Zarzynski wrote in #note-10:

@Nitzan Mordechai: are you going to update the suppression file?

Done

Actions #13

Updated by Laura Flores 2 months ago

Scrub note: sent to testing

Actions #14

Updated by Radoslaw Zarzynski 2 months ago

Awaits QA.

Actions #15

Updated by Radoslaw Zarzynski about 2 months ago

Still awaits QA.

Actions #16

Updated by Laura Flores about 2 months ago

  • Related to Bug #74604: Rocky10 - MismatchedFree delete coming from ceph-osd-classic code added
Actions #17

Updated by Laura Flores about 2 months ago · Edited

There is also a leak in the OSD along with the Mon RocksDB leak: https://tracker.ceph.com/issues/74604

Is it a duplicate of this one? For now, I linked it as related.

Actions #18

Updated by Radoslaw Zarzynski about 2 months ago

Sent to QA.

Actions #19

Updated by Nitzan Mordechai about 1 month ago

/a/yaarit-2026-02-08_02:25:21-rados-wip-rocky10-branch-of-the-day-2026-02-06-1770413686-distro-default-trial/39937

Actions #20

Updated by Radoslaw Zarzynski about 1 month ago

Oops, it looks like the fixing commit was already present in the branch mentioned in the previous comment:

$ git log ceph-ci/wip-rocky10-branch-of-the-day-2026-02-06-1770413686
...
commit 56de49411b1c1f1e837f7694c653118f1145fafe
Author: NitzanMordhai <nmordech@ibm.com>
Date:   Thu Feb 5 11:48:39 2026 +0000

    qa: suppress false positive delete map mismatch errors

    Valgrind reports "Mismatched free() / delete / delete []" errors during
    OSD startup.

    Standard library containers (like std::map) correctly call delete, but
    Valgrind falsely interprets this as a call to delete[] because GCC 14
    folds the identical aligned delete operators into a single symbol. This
    causes Valgrind to flag a mismatch against the non-array allocation.

    Fixes: https://tracker.ceph.com/issues/74604
    Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
Actions #21

Updated by Nitzan Mordechai about 1 month ago

That one is a bit different: it comes from main and has two RocksDBStore::~RocksDBStore frames (D1 and D0):

{
<insert_a_suppression_name_here>
Memcheck:Free
fun:_ZdaPvmSt11align_val_t
fun:_ZN12RocksDBStore5closeEv
fun:_ZN12RocksDBStoreD1Ev
fun:_ZN12RocksDBStoreD0Ev
fun:main
}

The current suppression is:

{
rocksdb mismatched free bluestore close
Memcheck:Free
fun:_ZdaPvmSt11align_val_t
fun:_ZN12RocksDBStore5closeEv
fun:_ZN12RocksDBStoreD*Ev
fun:_ZN9BlueStore9_close_dbEv
}

I'll combine them into one:

{
rocksdb mismatched free bluestore close
Memcheck:Free
fun:_ZdaPvmSt11align_val_t
fun:_ZN12RocksDBStore5closeEv
fun:_ZN12RocksDBStoreD*Ev
fun:_ZN12RocksDBStoreD*Ev
...
}
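
To check by hand which stacks a suppression would cover, here is a simplified sketch of frame matching. It assumes only the documented basics of Valgrind suppressions: `*` and `?` wildcards inside a `fun:` name, `...` standing for zero or more frames, and deeper callers being ignored once the listed frames are exhausted. It is not Valgrind's actual matcher; `frames_match` and `suppression_matches` are hypothetical helpers, and the frame lists are copied from this comment without the `fun:` prefix.

```cpp
#include <fnmatch.h>  // POSIX glob matching, used here for '*' and '?'
#include <cstddef>
#include <string>
#include <vector>

using Frames = std::vector<std::string>;

// Match patterns from index pi against stack frames from index si.
// "..." covers zero or more frames; any other entry is a glob for one frame.
// Running out of patterns counts as a match, so deeper callers are ignored.
bool frames_match(const Frames& pats, std::size_t pi,
                  const Frames& stack, std::size_t si) {
  if (pi == pats.size()) return true;
  if (pats[pi] == "...") {
    for (std::size_t skip = si; skip <= stack.size(); ++skip) {
      if (frames_match(pats, pi + 1, stack, skip)) return true;
    }
    return false;
  }
  if (si == stack.size()) return false;
  // Note: fnmatch also treats '[' and '\' specially; the mangled names
  // used here contain neither.
  if (fnmatch(pats[pi].c_str(), stack[si].c_str(), 0) != 0) return false;
  return frames_match(pats, pi + 1, stack, si + 1);
}

bool suppression_matches(const Frames& pats, const Frames& stack) {
  return frames_match(pats, 0, stack, 0);
}

// Stack reported from main (two destructor frames).
Frames main_stack() {
  return {"_ZdaPvmSt11align_val_t", "_ZN12RocksDBStore5closeEv",
          "_ZN12RocksDBStoreD1Ev", "_ZN12RocksDBStoreD0Ev", "main"};
}

// The BlueStore-specific suppression.
Frames bluestore_suppression() {
  return {"_ZdaPvmSt11align_val_t", "_ZN12RocksDBStore5closeEv",
          "_ZN12RocksDBStoreD*Ev", "_ZN9BlueStore9_close_dbEv"};
}

// The combined suppression proposed above.
Frames combined_suppression() {
  return {"_ZdaPvmSt11align_val_t", "_ZN12RocksDBStore5closeEv",
          "_ZN12RocksDBStoreD*Ev", "_ZN12RocksDBStoreD*Ev", "..."};
}
```

Under this model, `suppression_matches(combined_suppression(), main_stack())` is true, while the BlueStore-specific suppression does not cover the stack from main because its fourth frame is `_ZN12RocksDBStoreD0Ev`.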

Actions #22

Updated by Nitzan Mordechai about 1 month ago

/a/yaarit-2026-02-10_23:48:52-rados-wip-rocky10-branch-of-the-day-2026-02-09-1770676549-distro-default-trial/
['44504', '44382', '44329']

Actions #23

Updated by Radoslaw Zarzynski about 1 month ago

The fix went into testing.

Actions #24

Updated by Laura Flores 25 days ago

This PR is under test in https://tracker.ceph.com/issues/74811.

Actions #25

Updated by Laura Flores 22 days ago

/a/nmordech-2026-02-25_11:36:23-rados-wip-rocky10-branch-of-the-day-2026-02-24-1771941190-distro-default-trial/70160

Actions #26

Updated by Radoslaw Zarzynski 18 days ago

Still under QA.

Actions #27

Updated by Radoslaw Zarzynski 11 days ago

The associated PR fixes multiple tickets. Reapproved after a change.

Actions #28

Updated by Radoslaw Zarzynski 4 days ago

In QA.
