Bug #72713
SIGSEGV when ceph-csi-rbd driver schedules RBD volume deletion
Description
Some of our users have reported that all of their MGRs segfault [1] [2].
It turned out that what these users have in common is that they're using Kubernetes with Ceph as storage via the ceph-csi-rbd and ceph-csi-cephfs drivers [3].
I was able to reproduce this on a fresh Ceph 19.2.3 cluster on top of Proxmox VE 9, with a separate 3-node Kubernetes cluster using ceph-csi-rbd as the driver. That's 6 hosts in total, all of them virtualized. Since the ceph-csi-rbd and ceph-csi-cephfs drivers are just clients, I figured it's better to report this here.
Here are my findings so far:
- The MGR seems to segfault quite consistently when removing an image from the trash that was previously provisioned via `ceph-csi-rbd`.
- Removing the image manually via `rbd trash remove` (or, if that fails, via `rbd trash purge`) and then bringing the MGRs up again with `systemctl reset-failed && systemctl restart ceph-mgr.target` seems to temporarily fix the issue.
- So, as long as the "faulty" volume remains in the trash, all MGRs in the cluster will segfault immediately after starting, right when attempting to remove the volume from the trash.
- The issue remains fixed until a volume provisioned by the `ceph-csi-rbd` driver ends up in the trash and the MGR attempts to remove it again.
- On occasion, the MDS seems to receive a SIGABRT as well when that happens, so it's possible that this is related to CephFS too.
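For reference, the manual recovery described above boils down to the following commands. The pool name and image ID are taken from my reproducer (see the log excerpt below); adjust them for the affected cluster, and treat this as a sketch rather than an official procedure:

```shell
# Inspect the trash of the affected pool (pool name from my reproducer)
rbd trash ls --pool k8s-rbd

# Remove the faulty image from the trash; fall back to purging the trash
rbd trash remove k8s-rbd/8d3016c14b061 || rbd trash purge k8s-rbd

# Clear the failed units and bring the MGRs back up
systemctl reset-failed && systemctl restart ceph-mgr.target
```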
Here's an excerpt from a recent crash:
-20> 2025-08-25T11:25:36.976+0200 7f9321e706c0 5 librbd::ManagedLock: 0x5946fcba81b8 handle_acquire_lock: successfully acquired exclusive lock
-19> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: tick
-18> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_tickets
-17> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2025-08-25T11:25:06.994275+0200)
-16> 2025-08-25T11:25:37.006+0200 7f932e7a16c0 0 log_channel(cluster) log [DBG] : pgmap v10: 417 pgs: 416 active+clean, 1 unknown; 6.3 MiB data, 32 GiB used, 320 GiB / 352 GiB avail; 204 B/s rd, 0 op/s
-15> 2025-08-25T11:25:37.006+0200 7f932e7a16c0 10 monclient: _send_mon_message to mon.ceph-test-01 at v2:172.16.64.221:3300/0
-14> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_exclusive_lock: r=0
-13> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 validate_image_removal:
-12> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_snaps:
-11> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 list_image_watchers:
-10> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::Watcher: 0x5946fc669500 notifications_blocked: blocked=0
-9> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_list_image_watchers: r=0
-8> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_watchers:
-7> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_group:
-6> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_check_group: r=0
-5> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 finish: r=0
-4> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::RemoveRequest: 0x5946f62b0000 handle_pre_remove_image: r=0
-3> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::TrimRequest: 0x5946fbd0b480 send_pre_trim: delete_start_min=0 num_objects=512
-2> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::TrimRequest: 0x5946fbd0b480 send_remove_objects: delete_start=0 num_objects=512
-1> 2025-08-25T11:25:37.012+0200 7f932166f6c0 0 [progress INFO root] update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash)
0> 2025-08-25T11:25:37.013+0200 7f932166f6c0 -1 *** Caught signal (Segmentation fault) **
in thread 7f932166f6c0 thread_name:io_context_pool
ceph version 19.2.3 (bfe79fc8ee46f629d9ce4db0a202f0f9c0a94ac7) squid (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x7f9348c49df0]
2: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1598b0) [0x7f934a3598b0]
3: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a1843) [0x7f934a3a1843]
4: _PyType_LookupRef()
5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a216b) [0x7f934a3a216b]
6: PyObject_GetAttr()
7: _PyEval_EvalFrameDefault()
8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1109dd) [0x7f934a3109dd]
9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x3d3442) [0x7f934a5d3442]
10: /lib/python3/dist-packages/rbd.cpython-313-x86_64-linux-gnu.so(+0xacfed) [0x7f9336b4ffed]
11: /lib/librbd.so.1(+0x3cc8af) [0x7f93363cc8af]
12: /lib/librbd.so.1(+0x3ccfed) [0x7f93363ccfed]
13: /lib/librbd.so.1(+0x3afec6) [0x7f93363afec6]
14: /lib/librbd.so.1(+0x3b0560) [0x7f93363b0560]
15: /lib/librbd.so.1(+0x2cac93) [0x7f93362cac93]
16: /lib/librbd.so.1(+0x12e7bd) [0x7f933612e7bd]
17: /lib/librbd.so.1(+0x2b1c9e) [0x7f93362b1c9e]
18: /lib/librbd.so.1(+0x2b4379) [0x7f93362b4379]
19: /lib/librados.so.2(+0xd2716) [0x7f9348ae4716]
20: /lib/librados.so.2(+0xd3705) [0x7f9348ae5705]
21: /lib/librados.so.2(+0xd3f8a) [0x7f9348ae5f8a]
22: /lib/librados.so.2(+0xea598) [0x7f9348afc598]
23: /lib/librados.so.2(+0xd7a71) [0x7f9348ae9a71]
24: /lib/librados.so.2(+0xedf63) [0x7f9348afff63]
25: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xe1224) [0x7f9348ee1224]
26: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7f9348c9cb7b]
27: /lib/x86_64-linux-gnu/libc.so.6(+0x1107b8) [0x7f9348d1a7b8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Note the line with `update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash)`.
I've rebuilt Ceph 19.2.3 locally and managed to make it retain its dbgsyms with `export DEB_BUILD_OPTIONS="nostrip"`. That way I was able to gather a bunch of coredumps via `coredumpctl`. I've extracted the backtraces and attached the most recent ones as files. I've also attached all recent crash dumps (produced by `ceph-crash`). Note that the crash dumps might all be quite similar; the respective `ceph-mgr` systemd unit restarts a couple of times on exit before giving up.
From some debugging of my own, it seems that a callback passed to `trash_remove` in `src/pybind/rbd/rbd.pyx` [4] might be the cause here, with `progress_callback` in `src/pybind/mgr/rbd_support/task.py` [5] being the callback used.
I'll see if I can dig up more, but that's all I've found for now. If there's anything specific you'd like me to try or test, please let me know!
[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=6635
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/
[3]: https://github.com/ceph/ceph-csi
[4]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907
[5]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/mgr/rbd_support/task.py#L458-L480
Files
Updated by Max Carrara 6 months ago
Short update: I've managed to provide a workaround for this bug on our end [1].
tl;dr: Disabling the on_progress callbacks prevents the collective segfaults. The default no-op callback is used in place of the passed one.
To paraphrase what I mentioned in the workaround patch [1]: I have a very strong suspicion that this is related to Python sub-interpreters (yet again). Specifically, I believe that the internal changes made to sub-interpreters in Python 3.12 and 3.13 might be at fault.
What leads me to suspect this are the following three clues:
- A user on our forum reported that the issue vanishes as soon as they set up a Ceph MGR inside a Debian Bookworm VM [2]. That MGR must also be the active one. Bookworm ships Python 3.11, which is the last version before any substantial changes to sub-interpreters [3][4] were made.
- There's another bug [5] regarding another segfault during MGR startup. The author concluded that the problem is related to sub-interpreters and opened an issue [6] on Python's issue tracker that goes into more detail. The code path here is completely different, but it shows that problems regarding sub-interpreters are popping up elsewhere at the very least.
- The segfault happens inside the Python interpreter, as can be seen in the first stacktrace of the "ceph-mgr: SIGSEGV" attachment. The `on_progress` callback that the MGR passes through Cython [7] all the way down to `librbd` segfaults after it is called.
I'll let you know once I find out more.
[1]: https://lore.proxmox.com/pve-devel/20250909170515.606422-1-m.carrara@proxmox.com/
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/page-3#post-796315
[3]: https://docs.python.org/3.12/whatsnew/3.12.html#pep-684-a-per-interpreter-gil
[4]: https://github.com/python/cpython/issues/117953
[5]: https://tracker.ceph.com/issues/67696
[6]: https://github.com/python/cpython/issues/138045
[7]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907
Updated by Kefu Chai 4 months ago
The RBD Python bindings experience segmentation faults when using progress callbacks (`on_progress`) with Python 3.13. This issue affects operations like `trash_remove_with_progress()` and similar APIs that accept callback functions.
Root Cause
The segfault is caused by Python 3.13's implementation of PEP 684 (Per-Interpreter GIL), which introduces stricter sub-interpreter isolation. The problem occurs due to the interaction between Cython's GIL management and callback invocation patterns:
Current Implementation Flow:
- Python code passes a callable object as the `on_progress` parameter
- Cython releases the GIL: `with nogil:` (rbd.pyx:906)
- The Python callable is cast to `void*` and passed to C++ librbd
- The C++ code invokes the callback function pointer
- The Cython callback attempts to re-acquire the GIL: `with gil:` (rbd.pyx:389)
- The callback accesses the Python object: `(<object>ptr)(offset, total)`
Why It Crashes on Python 3.13:
- Python 3.13's per-interpreter GIL creates isolated interpreter states
- When the callback re-acquires the GIL, the interpreter context may be incompatible with the Python object being accessed
- Internal Python functions like `_Py_dict_lookup` expect per-interpreter data structures to be valid
- Accessing Python objects from an incompatible interpreter context corrupts internal state and causes segfaults
Affected Code Locations
`src/pybind/rbd/rbd.pyx`:
# Lines 389-390: Callback definition
cdef int progress_callback(uint64_t offset, uint64_t total, void* ptr) with gil:
    return (<object>ptr)(offset, total)

# Lines 903-908: Callback registration
if on_progress:
    _prog_cb = &progress_callback
    _prog_arg = <void *>on_progress  # Python object as void*
with nogil:
    ret = rbd_trash_remove_with_progress(_ioctx, _image_id, _force,
                                         _prog_cb, _prog_arg)
Other affected functions:
- `RBD.trash_remove()` (line 904)
- `RBD.trash_move()` (line 796)
- `RBD.trash_purge()` (lines 1145, 1172, 1199)
- `Image.copy()` (line 4335)
- And several other operations with progress callbacks
Current Status (as of 2025-01-27)
The main branch still suffers from this issue; no fixes addressing Python 3.13 callback compatibility have been merged. A search through commits since 2024 shows:
- No Python 3.13-specific callback fixes
- No PEP 684 compatibility changes
- No sub-interpreter safety improvements for callbacks (https://github.com/ceph/ceph/pull/62951 is specific to PyO3)
Updated by Kefu Chai 4 months ago
With https://github.com/ceph/ceph/pull/66244, we should be able to work around this issue by running mgr modules (like rbd_support) in the main interpreter, so that the callback also executes in the main interpreter's context.
Updated by Kefu Chai 4 months ago
- Related to Bug #73857: rbd mirror snapshot hang/failure on rocky10 added