Bug #69953

open

mds: segmentation faults in recent QA

Added by Patrick Donnelly about 1 year ago. Updated about 2 months ago.

Status:
Pending Backport
Priority:
Immediate
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Q/A
Backport:
tentacle,squid
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Tags (freeform):
temp-assign backport_processed
Fixed In:
v20.0.0-1424-g1a947b3b12
Released In:
v20.2.0~584
Upkeep Timestamp:
2025-11-01T01:00:33+00:00

Description

/a/teuthology-2025-02-01_20:24:16-fs-main-distro-default-smithi$ grep Segmentation */teu*
8108365/teuthology.log:2025-02-05T21:58:35.495 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 21:58:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[68126]: *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T21:58:35.496 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 21:58:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[68126]: 2025-02-05T21:58:35.201+0000 7f9620957640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T21:58:35.497 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 21:58:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[68126]:      0> 2025-02-05T21:58:35.201+0000 7f9620957640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T21:58:35.498 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 21:58:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[68126]:      0> 2025-02-05T21:58:35.201+0000 7f9620957640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:01:35.418 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:01:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71094]: *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:01:35.419 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:01:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71094]: 2025-02-05T22:01:35.064+0000 7f8a73e5b640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:01:35.420 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:01:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71094]:      0> 2025-02-05T22:01:35.064+0000 7f8a73e5b640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:01:35.421 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:01:35 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71094]:      0> 2025-02-05T22:01:35.064+0000 7f8a73e5b640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:03:59.167 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:03:58 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71936]: *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:03:59.168 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:03:58 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71936]: 2025-02-05T22:03:58.897+0000 7f751e9eb640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:03:59.169 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:03:58 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71936]:      0> 2025-02-05T22:03:58.897+0000 7f751e9eb640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:03:59.171 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:03:58 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[71936]:      0> 2025-02-05T22:03:58.897+0000 7f751e9eb640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:04:17.099 INFO:journalctl@ceph.mds.l.smithi196.stdout:Feb 05 22:04:16 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-l[67851]: *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:04:17.101 INFO:journalctl@ceph.mds.l.smithi196.stdout:Feb 05 22:04:16 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-l[67851]: 2025-02-05T22:04:16.693+0000 7f18cb072640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:04:17.102 INFO:journalctl@ceph.mds.l.smithi196.stdout:Feb 05 22:04:16 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-l[67851]:      0> 2025-02-05T22:04:16.693+0000 7f18cb072640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:04:17.104 INFO:journalctl@ceph.mds.l.smithi196.stdout:Feb 05 22:04:16 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-l[67851]:      0> 2025-02-05T22:04:16.693+0000 7f18cb072640 -1 *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:06:05.167 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:06:04 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[72705]: *** Caught signal (Segmentation fault) **
8108365/teuthology.log:2025-02-05T22:08:59.667 INFO:journalctl@ceph.mds.f.smithi196.stdout:Feb 05 22:08:59 smithi196 ceph-7dc4dc16-e409-11ef-bb7f-bd4984dce30f-mds-f[73220]: *** Caught signal (Segmentation fault) **
8108376/teuthology.log:2025-02-08T10:25:08.173 INFO:journalctl@ceph.mds.h.smithi022.stdout:Feb 08 10:25:07 smithi022 ceph-1c7f3ed2-e603-11ef-bb7f-bd4984dce30f-mds-h[67310]: *** Caught signal (Segmentation fault) **
8108380/teuthology.log:2025-02-08T10:20:08.770 INFO:journalctl@ceph.mds.c.smithi126.stdout:Feb 08 10:20:08 smithi126 ceph-dd7d6730-e603-11ef-bb7f-bd4984dce30f-mds-c[68624]: *** Caught signal (Segmentation fault) **
8108380/teuthology.log:2025-02-08T10:20:08.771 INFO:journalctl@ceph.mds.c.smithi126.stdout:Feb 08 10:20:08 smithi126 ceph-dd7d6730-e603-11ef-bb7f-bd4984dce30f-mds-c[68624]: 2025-02-08T10:20:08.392+0000 7f5ff818f640 -1 *** Caught signal (Segmentation fault) **
8108380/teuthology.log:2025-02-08T10:20:08.772 INFO:journalctl@ceph.mds.c.smithi126.stdout:Feb 08 10:20:08 smithi126 ceph-dd7d6730-e603-11ef-bb7f-bd4984dce30f-mds-c[68624]:      0> 2025-02-08T10:20:08.392+0000 7f5ff818f640 -1 *** Caught signal (Segmentation fault) **
8108380/teuthology.log:2025-02-08T10:20:08.773 INFO:journalctl@ceph.mds.c.smithi126.stdout:Feb 08 10:20:08 smithi126 ceph-dd7d6730-e603-11ef-bb7f-bd4984dce30f-mds-c[68624]:     -1> 2025-02-08T10:20:08.392+0000 7f5ff818f640 -1 *** Caught signal (Segmentation fault) **
8108479/teuthology.log:2025-02-08T12:00:34.246 INFO:journalctl@ceph.mds.f.smithi110.stdout:Feb 08 12:00:33 smithi110 ceph-0b4cc472-e612-11ef-bb7f-bd4984dce30f-mds-f[67785]: *** Caught signal (Segmentation fault) **
8108494/teuthology.log:2025-02-08T12:25:20.381 INFO:journalctl@ceph.mds.e.smithi137.stdout:Feb 08 12:25:20 smithi137 ceph-a6e2ee36-e615-11ef-bb7f-bd4984dce30f-mds-e[67699]: *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T12:55:42.607 INFO:journalctl@ceph.mds.c.smithi139.stdout:Feb 08 12:55:42 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-c[68955]: *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T13:02:47.357 INFO:journalctl@ceph.mds.l.smithi139.stdout:Feb 08 13:02:47 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-l[68399]: *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T13:07:46.358 INFO:journalctl@ceph.mds.l.smithi139.stdout:Feb 08 13:07:45 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-l[73073]: *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T13:07:46.359 INFO:journalctl@ceph.mds.l.smithi139.stdout:Feb 08 13:07:45 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-l[73073]: 2025-02-08T13:07:45.991+0000 7fab76d51640 -1 *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T13:07:46.360 INFO:journalctl@ceph.mds.l.smithi139.stdout:Feb 08 13:07:46 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-l[73073]:      0> 2025-02-08T13:07:45.991+0000 7fab76d51640 -1 *** Caught signal (Segmentation fault) **
8108514/teuthology.log:2025-02-08T13:07:46.361 INFO:journalctl@ceph.mds.l.smithi139.stdout:Feb 08 13:07:46 smithi139 ceph-4a546862-e619-11ef-bb7f-bd4984dce30f-mds-l[73073]:      0> 2025-02-08T13:07:45.991+0000 7fab76d51640 -1 *** Caught signal (Segmentation fault) **
...

https://pulpito.ceph.com/teuthology-2025-02-01_20:24:16-fs-main-distro-default-smithi/

Two issues here:

- the segmentation faults obviously
- teuthology is not reporting the core dumps as the primary failure reason; we should NEVER have segmentation faults, and all other failure reasons are simply irrelevant in comparison

Whoever takes this: we need to figure out when these Segmentation faults were introduced; look at older QA runs to help bisect.


Related issues 9 (5 open, 4 closed)

Related to CephFS - Bug #68914: mds: Segmentation fault in mds_log_replay / MR_Finisher thread (Triaged, Venky Shankar)
Related to Orchestrator - Bug #70247: Non-zero exit code 1 from systemctl reset-failed ceph-47356c0e-f761-11ef-bb88-bd4984dce30f@mon.a (New)
Related to CephFS - Bug #70624: qa: assertion failure on context completion of C_MDS_RetryRequest (Resolved, Mahesh Mohan)
Related to CephFS - Bug #70761: qa: mds crash and traceback seen when running fs:workload suite (Triaged, Mahesh Mohan)
Related to CephFS - Bug #70723: qa: AddressSanitizer reports heap-use-after-free in mds-log-replay thread (Resolved, Milind Changire)
Related to CephFS - Bug #71996: cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON) (Need More Info)
Copied to CephFS - Backport #70924: reef: mds: segmentation faults in recent QA (QA Testing, Mahesh Mohan)
Copied to CephFS - Backport #70925: squid: mds: segmentation faults in recent QA (Resolved, Milind Changire)
Copied to CephFS - Backport #72653: tentacle: mds: segmentation faults in recent QA (Resolved, Milind Changire)
Actions #1

Updated by Venky Shankar about 1 year ago

Patrick Donnelly wrote:

[...]

https://pulpito.ceph.com/teuthology-2025-02-01_20:24:16-fs-main-distro-default-smithi/

Two issues here:

- the segmentation faults obviously
- teuthology is not reporting the core dumps as the primary failure reason; we should NEVER have segmentation faults, and all other failure reasons are simply irrelevant in comparison

I remember flagging this sometime last year in some forum - we obviously didn't take it seriously :/

FWIW, I always do a

find <run> -name "*core*"

for the fs suite run to avoid such mysteries (I haven't done an fs suite run since mid-January 2025 though).

Actions #2

Updated by Venky Shankar about 1 year ago

... and here is the crash backtrace

    -7> 2025-02-05T21:58:35.200+0000 7f9620957640 10 mds.0.log _replay: read_pos == write_pos
    -6> 2025-02-05T21:58:35.200+0000 7f9620957640 10 mds.0.log _replay - complete, 58099 events
    -5> 2025-02-05T21:58:35.200+0000 7f9620957640 10 mds.0.log _replay_thread kicking waiters
    -4> 2025-02-05T21:58:35.200+0000 7f9620957640 10 MDSContext::complete: 15C_MDS_BootStart
    -3> 2025-02-05T21:58:35.200+0000 7f9620957640  5 mds.0.0 Finished replaying journal as standby-replay
    -2> 2025-02-05T21:58:35.200+0000 7f9620957640 10 mds.0.0 setting replay timer
    -1> 2025-02-05T21:58:35.200+0000 7f9620957640 10 mds.0.log _replay_thread finish
     0> 2025-02-05T21:58:35.201+0000 7f9620957640 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9620957640 thread_name:mds-log-replay

 ceph version 19.3.0-7232-g44b51db6 (44b51db6813fb456c78075909d800e4ec3b2679f) squid (dev)
 1: /lib64/libc.so.6(+0x3e930) [0x7f962dd40930]
 2: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)+0x93) [0x7f962ec92713]
 3: (tcmalloc::ThreadCache::Cleanup()+0x48) [0x7f962ec92818]
 4: (tcmalloc::ThreadCache::DeleteCache(tcmalloc::ThreadCache*)+0x12) [0x7f962ec92b82]
 5: /lib64/libc.so.6(+0x873c1) [0x7f962dd893c1]
 6: /lib64/libc.so.6(+0x8a166) [0x7f962dd8c166]
 7: /lib64/libc.so.6(+0x10f300) [0x7f962de11300]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I have a feeling this is somewhat related to the MR_Finisher thread crash.

Actions #3

Updated by Venky Shankar about 1 year ago

  • Related to Bug #68914: mds: Segmentation fault in mds_log_replay / MR_Finisher thread added
Actions #4

Updated by Venky Shankar about 1 year ago

Also, there aren't any core dumps generated from the crash. Now I feel like an idiot doing the find ... I mentioned in note-1.

Actions #5

Updated by Milind Changire about 1 year ago

  • Assignee set to Neeraj Pratap Singh
Actions #6

Updated by Venky Shankar about 1 year ago

  • Assignee changed from Neeraj Pratap Singh to Milind Changire
Actions #7

Updated by Afreen Misbah about 1 year ago

  • Related to Bug #69864: use same name for image size in create/update api added
Actions #8

Updated by Afreen Misbah about 1 year ago

  • Related to deleted (Bug #69864: use same name for image size in create/update api)
Actions #9

Updated by Milind Changire about 1 year ago · Edited

I've searched for Scrub error and Segmentation fault in the logs and these are possibly the first runs with Segmentation fault pointing to thread MR_Finisher or md_log_replay:

/teuthology/pdonnell-2022-08-19_22:40:41-fs:workload-wip-pdonnell-testing-20220819.203214-distro-default-smithi/6981777/teuthology.log.gz
2022-08-20T07:19:10.873 INFO:journalctl@ceph.mds.f.smithi099.stdout:Aug 20 07:19:10 smithi099 ceph-1066cfae-2056-11ed-8431-001a4aab830c-mds-f[124400]:  in thread 7f4716fc9700 thread_name:MR_Finisher
---
/teuthology/pdonnell-2022-08-22_18:53:15-fs:workload-wip-pdonnell-testing-20220822.164347-distro-default-smithi/6986012/teuthology.log.gz
2022-08-23T09:05:04.361 INFO:journalctl@ceph.mds.h.smithi055.stdout:Aug 23 09:05:03 smithi055 ceph-ecdc59ce-22c0-11ed-8431-001a4aab830c-mds-h[123274]:  in thread 7f72490d6700 thread_name:md_log_replay

oh, and also no core dumps accompanying the segfaults either.

Actions #10

Updated by Venky Shankar about 1 year ago

Milind Changire wrote in #note-9:

I've searched for Scrub error and Segmentation fault in the logs and these are possibly the first runs with Segmentation fault pointing to thread MR_Finisher or md_log_replay:

[...]

oh, and also no core dumps accompanying the segfaults either.

oh, wow - this has been happening for >2 years. So, we obviously need teuthology to fail a run when any ceph daemon crashes and ensure that the coredump survives.

As far as this issue is concerned, do we know the PRs in the batch where the issue was first seen? It's possible that the crash started to happen before that, in which case that SHA can be the bisect point.

Actions #11

Updated by Venky Shankar about 1 year ago

Also, let's link a tracker to this one to follow up with the teuthology folks about flagging run failures when any ceph daemon crashes.

Actions #12

Updated by Milind Changire about 1 year ago

Venky Shankar wrote in #note-10:

Milind Changire wrote in #note-9:

I've searched for Scrub error and Segmentation fault in the logs and these are possibly the first runs with Segmentation fault pointing to thread MR_Finisher or md_log_replay:

[...]

oh, and also no core dumps accompanying the segfaults either.

oh, wow - this has been happening for >2 years. So, we obviously need teuthology to fail a run when any ceph daemon crashes and ensure that the coredump survives.

As far as this issue is concerned, do we know the PRs in the batch where the issue was first seen? It's possible that the crash started to happen before that, in which case that SHA can be the bisect point.

I've started a git bisect with Patrick's PR as the HEAD ... but it looks like the build farm has started acting up.

Actions #13

Updated by Patrick Donnelly about 1 year ago

Venky Shankar wrote in #note-10:

Milind Changire wrote in #note-9:

I've searched for Scrub error and Segmentation fault in the logs and these are possibly the first runs with Segmentation fault pointing to thread MR_Finisher or md_log_replay:

[...]

oh, and also no core dumps accompanying the segfaults either.

oh, wow - this has been happening for >2 years. So, we obviously need teuthology to fail a run when any ceph daemon crashes and ensure that the coredump survives.

The "watchdog" Jos wrote is supposed to fail a run but tearing down a running test is messy so it often appears to fail for other reasons. No idea why this particular test did not get torn down by the watchdog however. Perhaps because it's cephadm?

Actions #14

Updated by Venky Shankar about 1 year ago

Patrick Donnelly wrote in #note-13:

Venky Shankar wrote in #note-10:

Milind Changire wrote in #note-9:

I've searched for Scrub error and Segmentation fault in the logs and these are possibly the first runs with Segmentation fault pointing to thread MR_Finisher or md_log_replay:

[...]

oh, and also no core dumps accompanying the segfaults either.

oh, wow - this has been happening for >2 years. So, we obviously need teuthology to fail a run when any ceph daemon crashes and ensure that the coredump survives.

The "watchdog" Jos wrote is supposed to fail a run but tearing down a running test is messy so it often appears to fail for other reasons. No idea why this particular test did not get torn down by the watchdog however. Perhaps because it's cephadm?

If that's the case, then the qa suite hasn't really been reporting daemon crashes since quincy, and that probably explains how long this crash has been around.

Coming to the crash itself:

 1: /lib64/libc.so.6(+0x3e930) [0x7f962dd40930]
 2: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)+0x93) [0x7f962ec92713]
 3: (tcmalloc::ThreadCache::Cleanup()+0x48) [0x7f962ec92818]
 4: (tcmalloc::ThreadCache::DeleteCache(tcmalloc::ThreadCache*)+0x12) [0x7f962ec92b82]
 5: /lib64/libc.so.6(+0x873c1) [0x7f962dd893c1]
 6: /lib64/libc.so.6(+0x8a166) [0x7f962dd8c166]
 7: /lib64/libc.so.6(+0x10f300) [0x7f962de11300]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This looks like a fault when releasing memory back to tcmalloc. I'm not sure if this is something in tcmalloc or in our code (maybe using the libc allocator and running the test might give a more understandable backtrace?).

Actions #15

Updated by Milind Changire about 1 year ago

I've raised Orchestrator Bug #70247 since the containers just don't run for me anymore.
This issue seems to have resurfaced.
I'm not sure if it's because of the cephadm commit that I have in the branch.

Actions #16

Updated by Venky Shankar about 1 year ago

  • Related to Bug #70247: Non-zero exit code 1 from systemctl reset-failed ceph-47356c0e-f761-11ef-bb88-bd4984dce30f@mon.a added
Actions #17

Updated by Milind Changire about 1 year ago

https://pulpito.ceph.com/mchangir-2025-03-03_17:42:13-fs:workload-wip-mchangir-use-libc-for-segfault-main-debug-testing-default-smithi/8167010

The above job was using a build with the libc allocator (not the tcmalloc allocator):

here's the stack trace of the crash in the mgr:

    -2> 2025-03-03T18:20:37.037+0000 7f3289ffb640 10 log_client handle_log_ack log(last 374)
    -1> 2025-03-03T18:20:37.037+0000 7f3289ffb640 10 log_client  logged 2025-03-03T18:20:35.945206+0000 mgr.x (mgr.14232) 373 : cluster [DBG] pgmap v353: 129 pgs: 129 active+clean; 651 KiB data, 388 MiB used, 1.0 TiB / 1.0 TiB avail; 4.0 KiB/s rd, 682 B/s wr, 6 op/s
     0> 2025-03-03T18:20:37.039+0000 7f3289ffb640 -1 *** Caught signal (Aborted) **
 in thread 7f3289ffb640 thread_name:ms_dispatch

 ceph version 19.3.0-7772-gcfa5ba05 (cfa5ba052b03f5b29c75de806210d0bdc7462583) squid (dev)
 1: /lib64/libc.so.6(+0x3ebf0) [0x7f32a0fc8bf0]
 2: /lib64/libc.so.6(+0x8bd4c) [0x7f32a1015d4c]
 3: raise()
 4: abort()
 5: /lib64/libc.so.6(+0x29172) [0x7f32a0fb3172]
 6: /lib64/libc.so.6(+0x95df7) [0x7f32a101fdf7]
 7: /lib64/libc.so.6(+0x97b5a) [0x7f32a1021b5a]
 8: free()
 9: /usr/lib64/ceph/libceph-common.so.2(+0x215039) [0x7f32a16c7039]
 10: /usr/lib64/ceph/libceph-common.so.2(+0x216cd6) [0x7f32a16c8cd6]
 11: (LogClient::handle_log_ack(MLogAck*)+0x62b) [0x7f32a16d241f]
 12: (MonClient::ms_dispatch(Message*)+0x556) [0x7f32a1a0f8fa]
 13: /usr/lib64/ceph/libceph-common.so.2(+0x554fa6) [0x7f32a1a06fa6]
 14: /usr/lib64/ceph/libceph-common.so.2(+0x3a8255) [0x7f32a185a255]
 15: (DispatchQueue::entry()+0x663) [0x7f32a185abfb]
 16: /usr/lib64/ceph/libceph-common.so.2(+0x4965eb) [0x7f32a19485eb]
 17: (Thread::entry_wrapper()+0x33) [0x7f32a16e4c63]
 18: (Thread::_entry_func(void*)+0xd) [0x7f32a16e4c79]
 19: /lib64/libc.so.6(+0x8a002) [0x7f32a1014002]
 20: /lib64/libc.so.6(+0x10f070) [0x7f32a1099070]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I'll be trying out builds with:
  1. -fsanitize=address
  2. valgrind

to see if we can catch this earlier

Actions #18

Updated by Milind Changire about 1 year ago

libtool hates -fsanitize=address
the only way I can instrument the code with the Address Sanitizer is to disable the use of libtool during the build
however, many projects and dependent RPMs rely on libtool ... earlier I thought it was only the erasure-code project
I even resorted to removing the libtool RPM from the system in the %build phase ... which revealed other package dependencies on libtool

there are some docs and YAMLs in the teuthology repo under the docs/laptop/ dir
I'm going to see if a mock teuthology setup is possible on my laptop and if I can run the tests locally.

Actions #19

Updated by Milind Changire about 1 year ago

since the build on my laptop doesn't invoke libtool (AFAICT), the only way forward seems to be to build with -fsanitize=address, build the cluster manually, run the fs:workload tests, and await a Segmentation fault and eventually a core dump

Actions #20

Updated by Milind Changire about 1 year ago

So far I've been able to quiesce two use-after-free issues:
1. mgr: a dereference issue with MgrOpRequest (quiesced by adding a ref)
2. mds: a dereference issue with LogSegment (quiesced by making LogSegment a RefCountedObj and adding ref get() and put() calls) - a small sketch of that pattern follows below
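
For illustration, here is a minimal standalone sketch of the get()/put() refcounting pattern mentioned in item 2. Refcounted and FakeLogSegment are hypothetical, simplified stand-ins, not the actual Ceph RefCountedObject/LogSegment classes:

    #include <atomic>
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <vector>

    // Simplified stand-in for a refcounted base: get() bumps the count, put()
    // drops it and deletes the object when the last reference goes away.
    struct Refcounted {
      std::atomic<int> nref{1};
      Refcounted* get() { nref.fetch_add(1); return this; }
      void put() { if (nref.fetch_sub(1) == 1) delete this; }
      virtual ~Refcounted() = default;
    };

    // Hypothetical stand-in for LogSegment.
    struct FakeLogSegment : Refcounted {
      uint64_t seq;
      explicit FakeLogSegment(uint64_t s) : seq(s) {}
      ~FakeLogSegment() override { std::cout << "segment " << seq << " freed\n"; }
    };

    int main() {
      std::vector<std::function<void()>> pending;  // models deferred expiry callbacks

      auto* seg = new FakeLogSegment(42);

      // Take a ref before handing the segment to deferred work, so the owner's
      // final put() cannot free it underneath the callback.
      seg->get();
      pending.emplace_back([seg] {
        std::cout << "expiring segment " << seg->seq << "\n";
        seg->put();  // release the callback's reference
      });

      seg->put();  // owner drops its reference; the callback's ref keeps it alive

      for (auto& cb : pending) cb();  // segment is freed only after the last put()
    }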

fyi - extensive suite-wide tests have not been exercised yet

I still have 1 use-after-free issue to address.
This is related to a continuation context being referenced after being destroyed: C_Flush_Journal

Actions #21

Updated by Milind Changire about 1 year ago · Edited

further investigation reveals ...

C_Flush_Journal::expire_segments() adds a new sub to the gather context for each expiring segment
however, the context added to the gather calls C_Flush_Journal::trim_expired_segments()
and that function further down the line calls C_Flush_Journal::complete() ... which destroys the C_Flush_Journal object
so a second invocation from the gather context of the expiring segments proceeds to invoke C_Flush_Journal::trim_expired_segments() again
... leading to a use-after-free of the C_Flush_Journal object

should I make C_Flush_Journal a RefCountedObj as well?

the above issue is seen via the "flush journal" asok command where the C_Flush_Journal object is created

there's a second instance of creation of the C_Flush_Journal object that needs to be investigated as well

Actions #22

Updated by Milind Changire about 1 year ago

for the C_Flush_Journal issue, I wonder if it would be sufficient to associate the lambda context completion with the last of the expiring segments, i.e. would it be safe to assume that the expiry happens in "order", so that we don't need to add ref counting to C_Flush_Journal

Actions #23

Updated by Venky Shankar about 1 year ago

Milind Changire wrote in #note-21:

further investigation reveals ...

C_Flush_Journal::expire_segments() adds new sub to the gather context for each expiring segment
however, the context added to the gather context calls C_Flush_Journal::trim_expired_segments()
and this function further down the line calls C_Flush_Journal::complete() ... which destroys the C_Flush_Journal object
so the second invocation of the gather context of the expiring segments proceeds to invoke C_Flush_Journal::trim_expired_segments() again
... leading to a use-after-free of the C_Flush_Journal object

should I make the C_Flush_Journal a RefCountedObj object as well ?

The gather completion would only be called when the gather context is activated and when all subs finish. In this case, C_Flush_Journal::trim_expired_segments() would only be called after expiry_gather.activate() is invoked (in C_Flush_Journal::expire_segments()) and all gather subs finish.
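
To make that lifecycle concrete, here is a minimal standalone model of the gather pattern being discussed. The Gather class below is a hypothetical simplification, not the real C_GatherBuilder: the single finisher runs exactly once, only after activate() has been called and every sub has completed.

    #include <functional>
    #include <iostream>
    #include <vector>

    // Simplified model of a gather: subs are counted, and the finisher runs
    // exactly once, after activate() and after the last sub completes.
    class Gather {
      int pending = 0;
      bool activated = false;
      std::function<void(int)> finisher;

      void maybe_finish() {
        if (activated && pending == 0 && finisher) {
          auto fin = std::move(finisher);
          finisher = nullptr;
          fin(0);  // runs once; the only safe place for a completion that frees its owner
        }
      }

    public:
      std::function<void(int)> new_sub() {
        ++pending;
        return [this](int) { --pending; maybe_finish(); };
      }
      void set_finisher(std::function<void(int)> f) { finisher = std::move(f); }
      void activate() { activated = true; maybe_finish(); }
    };

    int main() {
      Gather gather;
      std::vector<std::function<void(int)>> subs;
      for (int i = 0; i < 3; ++i)
        subs.push_back(gather.new_sub());  // one sub per expiring segment

      gather.set_finisher([](int) {
        std::cout << "all subs done: trim_expired_segments() would run here, once\n";
      });
      gather.activate();

      for (auto& sub : subs)
        sub(0);  // the finisher fires only after the last sub completes
    }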

Actions #24

Updated by Milind Changire about 1 year ago

okay, here's the update about C_Flush_Journal:

  void trim_expired_segments() {
    ceph_assert(ceph_mutex_is_locked_by_me(mds->mds_lock));
    dout(5) << __func__ << ": expiry complete, expire_pos/trim_pos is now " 
            << std::hex << mdlog->get_journaler()->get_expire_pos() << "/" 
            << mdlog->get_journaler()->get_trimmed_pos() << dendl;

    // Now everyone I'm interested in is expired
    auto* ctx = new MDSInternalContextWrapper(mds, new LambdaContext([this](int r) {
      handle_write_head(r);
    }));
    mdlog->trim_expired_segments(ctx);

    dout(5) << __func__ << ": trimming is complete; wait for journal head write. Journal expire_pos/trim_pos is now " 
            << std::hex << mdlog->get_journaler()->get_expire_pos() << "/" 
            << mdlog->get_journaler()->get_trimmed_pos() << dendl;
  }

Venky helped identify the culprit: the last dout executes after the context completion and references data members of the (by then destroyed) C_Flush_Journal object.
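
One way to sidestep that (a sketch only, not necessarily what the merged fix does) is to reorder the tail of trim_expired_segments() so the log line runs before the context hand-off:

    // Sketch of a reordering that avoids the problem: log first, because once
    // mdlog->trim_expired_segments(ctx) is called, ctx may complete and destroy
    // this C_Flush_Journal, and no data member of this object may be
    // dereferenced afterwards.
    dout(5) << __func__ << ": trimming requested; expire_pos/trim_pos is now "
            << std::hex << mdlog->get_journaler()->get_expire_pos() << "/"
            << mdlog->get_journaler()->get_trimmed_pos() << dendl;

    auto* ctx = new MDSInternalContextWrapper(mds, new LambdaContext([this](int r) {
      handle_write_head(r);
    }));
    mdlog->trim_expired_segments(ctx);
    // intentionally nothing after the hand-off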

Actions #25

Updated by Milind Changire about 1 year ago

another use-after-free event in C_Flush_Journal ...

C_Flush_Journal::flush_mdlog() creates a subtreemap event and submits it to MDLog. The event gets destroyed after being handed over to the Journal. C_Flush_Journal::flush_mdlog() then reads the sequence number of the submitted event ... which is a use-after-free violation.
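
A tiny standalone illustration of the safe ordering (hypothetical names; FakeEvent and submit_event are not the real MDLog/LogEvent API): read the sequence number before ownership of the event is handed over.

    #include <cstdint>
    #include <iostream>
    #include <memory>
    #include <vector>

    // Hypothetical stand-in for a journal event handed over to the log.
    struct FakeEvent {
      uint64_t seq = 0;
    };

    // Models submission: the "journal" takes ownership and may free the event at
    // any time afterwards, so the caller must not touch it after the hand-off.
    std::vector<std::unique_ptr<FakeEvent>> journal;

    uint64_t submit_event(std::unique_ptr<FakeEvent> ev, uint64_t next_seq) {
      ev->seq = next_seq;
      uint64_t seq = ev->seq;            // capture everything needed *before* handover
      journal.push_back(std::move(ev));  // ownership transferred; `ev` must not be used again
      return seq;
    }

    int main() {
      uint64_t seq = submit_event(std::make_unique<FakeEvent>(), 123);
      std::cout << "submitted event, seq " << seq << "\n";  // safe: seq was captured first
    }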

Actions #26

Updated by Milind Changire 12 months ago

  • Related to Bug #70624: qa: assertion failure on context completion of C_MDS_RetryRequest added
  • Related to Bug #70761: qa: mds crash and traceback seen when running fs:workload suite added
  • Related to Bug #70723: qa: AddressSanitizer reports heap-use-after-free in mds-log-replay thread added
Actions #27

Updated by Venky Shankar 12 months ago

  • Status changed from New to Fix Under Review
  • Backport set to reef,squid
  • Pull request ID set to 62553
Actions #28

Updated by Venky Shankar 11 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #29

Updated by Upkeep Bot 11 months ago

  • Copied to Backport #70924: reef: mds: segmentation faults in recent QA added
Actions #30

Updated by Upkeep Bot 11 months ago

  • Copied to Backport #70925: squid: mds: segmentation faults in recent QA added
Actions #31

Updated by Upkeep Bot 11 months ago

  • Tags (freeform) set to backport_processed
Actions #32

Updated by Venky Shankar 11 months ago

  • Status changed from Pending Backport to Fix Under Review

This is an umbrella tracker - other fixes are still under review.

Actions #33

Updated by Patrick Donnelly 9 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from reef,squid to tentacle,squid
Actions #34

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 1a947b3b1273f040cb2ef904cd9b4d02e3978120
  • Fixed In set to v20.0.0-1424-g1a947b3b127
  • Upkeep Timestamp set to 2025-07-08T18:07:30+00:00
Actions #35

Updated by Venky Shankar 8 months ago

  • Related to Bug #71996: cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)" added
Actions #36

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-1424-g1a947b3b127 to v20.0.0-1424-g1a947b3b1273
  • Upkeep Timestamp changed from 2025-07-08T18:07:30+00:00 to 2025-07-14T15:21:59+00:00
Actions #37

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-1424-g1a947b3b1273 to v20.0.0-1424-g1a947b3b12
  • Upkeep Timestamp changed from 2025-07-14T15:21:59+00:00 to 2025-07-14T20:46:27+00:00
Actions #38

Updated by Milind Changire 7 months ago

  • Status changed from Pending Backport to New

trying to trigger backport tracker cloning for tentacle

Actions #39

Updated by Milind Changire 7 months ago

  • Status changed from New to Pending Backport
Actions #40

Updated by Venky Shankar 7 months ago

Milind Changire wrote in #note-38:

trying to trigger backport tracker cloning for tentacle

These should already be in the tentacle branch, shouldn't they?

Actions #41

Updated by Milind Changire 7 months ago

  • Status changed from Pending Backport to Fix Under Review
  • Tags (freeform) deleted (backport_processed)

trying to retrigger tracker cloning to tentacle

Actions #42

Updated by Milind Changire 7 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #43

Updated by Upkeep Bot 7 months ago

  • Copied to Backport #72653: tentacle: mds: segmentation faults in recent QA added
Actions #44

Updated by Upkeep Bot 7 months ago

  • Tags (freeform) set to backport_processed
Actions #45

Updated by Milind Changire 7 months ago

Venky Shankar wrote in #note-40:

Milind Changire wrote in #note-38:

trying to trigger backport tracker cloning for tentacle

These should already be in the tentacle branch, shouldn't they?

yes ... indeed they are
didn't check that before
sorry for the confusion

I've restored the backport_processed tag as well

Actions #46

Updated by Venky Shankar 7 months ago

Milind Changire wrote in #note-45:

Venky Shankar wrote in #note-40:

Milind Changire wrote in #note-38:

trying to trigger backport tracker cloning for tentacle

These should already be in the tentacle branch, shouldn't they?

yes ... indeed they are
didn't check that before
sorry for the confusion

I've restored the backport_processed tag as well

Yeh. But the bot created the backport tracker which should be closed now.

Actions #47

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~584
  • Upkeep Timestamp changed from 2025-07-14T20:46:27+00:00 to 2025-11-01T01:00:33+00:00
Actions #48

Updated by Venky Shankar about 2 months ago

  • Assignee changed from Milind Changire to Mahesh Mohan
  • Tags (freeform) changed from backport_processed to temp-assign
Actions #49

Updated by Upkeep Bot about 2 months ago

  • Tags (freeform) changed from temp-assign to temp-assign backport_processed