Bug #72713

open

SIGSEGV when ceph-csi-rbd driver schedules RBD volume deletion

Added by Max Carrara 7 months ago. Updated 4 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (dev)
Backport:
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

Some of our users have reported that all of their MGRs segfault [1] [2].

It turned out that these users all have one thing in common: they're using Kubernetes with Ceph as storage via the ceph-csi-rbd and ceph-csi-cephfs drivers [3].

I was able to reproduce this on a fresh Ceph 19.2.3 cluster on top of Proxmox VE 9, with a separate 3-node Kubernetes cluster using ceph-csi-rbd as the driver. That's 6 hosts in total, all of them virtualized. Since the ceph-csi-rbd and ceph-csi-cephfs drivers are just clients, I figured it's better to report this here.

Here are my findings so far:

  • The MGR seems to segfault quite consistently when removing an image from the trash that was previously provisioned via ceph-csi-rbd.
  • Removing the image manually via rbd trash remove (or, if that fails, via rbd trash purge) and then bringing the MGRs up again with systemctl reset-failed && systemctl restart ceph-mgr.target seems to fix the issue temporarily (a scripted version of this cleanup is sketched below, after this list).
    • So, as long as the "faulty" volume remains in the trash, all MGRs in the cluster will segfault immediately after starting, right when attempting to remove the volume from the trash.
  • The issue remains fixed until a volume provisioned by the ceph-csi-rbd driver ends up in the trash and the MGR attempts to remove it again.
  • On occasion, the MDS seems to receive a SIGABRT when that happens as well, so it's possible that this is related to CephFS too.
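
For completeness, below is a minimal sketch of that manual cleanup done via the Python bindings instead of the CLI. The pool name k8s-rbd and the conffile path are assumptions taken from my test setup, and the MGRs still need to be restarted separately afterwards:

  import rados
  import rbd

  # Assumed names/paths from my test cluster; adjust as needed.
  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  ioctx = cluster.open_ioctx('k8s-rbd')
  try:
      r = rbd.RBD()
      # Remove every trashed image so the MGRs stop tripping over them;
      # force=True also covers images whose deferment window hasn't expired.
      for entry in r.trash_list(ioctx):
          print(f"removing {entry['id']} ({entry['name']}) from trash")
          r.trash_remove(ioctx, entry['id'], force=True)
  finally:
      ioctx.close()
      cluster.shutdown()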

Here's an excerpt from a recent crash:

   -20> 2025-08-25T11:25:36.976+0200 7f9321e706c0  5 librbd::ManagedLock: 0x5946fcba81b8 handle_acquire_lock: successfully acquired exclusive lock
   -19> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: tick
   -18> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_tickets
   -17> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2025-08-25T11:25:06.994275+0200)
   -16> 2025-08-25T11:25:37.006+0200 7f932e7a16c0  0 log_channel(cluster) log [DBG] : pgmap v10: 417 pgs: 416 active+clean, 1 unknown; 6.3 MiB data, 32 GiB used, 320 GiB / 352 GiB avail; 204 B/s rd, 0 op/s
   -15> 2025-08-25T11:25:37.006+0200 7f932e7a16c0 10 monclient: _send_mon_message to mon.ceph-test-01 at v2:172.16.64.221:3300/0
   -14> 2025-08-25T11:25:37.010+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_exclusive_lock: r=0
   -13> 2025-08-25T11:25:37.010+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 validate_image_removal:
   -12> 2025-08-25T11:25:37.010+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_snaps:
   -11> 2025-08-25T11:25:37.010+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 list_image_watchers:
   -10> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::Watcher: 0x5946fc669500 notifications_blocked: blocked=0
    -9> 2025-08-25T11:25:37.011+0200 7f9321e706c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_list_image_watchers: r=0
    -8> 2025-08-25T11:25:37.011+0200 7f9321e706c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_watchers:
    -7> 2025-08-25T11:25:37.011+0200 7f9321e706c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_group:
    -6> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_check_group: r=0
    -5> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 finish: r=0
    -4> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::image::RemoveRequest: 0x5946f62b0000 handle_pre_remove_image: r=0
    -3> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::TrimRequest: 0x5946fbd0b480 send_pre_trim:  delete_start_min=0 num_objects=512
    -2> 2025-08-25T11:25:37.011+0200 7f932166f6c0  5 librbd::TrimRequest: 0x5946fbd0b480 send_remove_objects:  delete_start=0 num_objects=512
    -1> 2025-08-25T11:25:37.012+0200 7f932166f6c0  0 [progress INFO root] update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash)
     0> 2025-08-25T11:25:37.013+0200 7f932166f6c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f932166f6c0 thread_name:io_context_pool

 ceph version 19.2.3 (bfe79fc8ee46f629d9ce4db0a202f0f9c0a94ac7) squid (stable)
 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x7f9348c49df0]
 2: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1598b0) [0x7f934a3598b0]
 3: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a1843) [0x7f934a3a1843]
 4: _PyType_LookupRef()
 5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a216b) [0x7f934a3a216b]
 6: PyObject_GetAttr()
 7: _PyEval_EvalFrameDefault()
 8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1109dd) [0x7f934a3109dd]
 9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x3d3442) [0x7f934a5d3442]
 10: /lib/python3/dist-packages/rbd.cpython-313-x86_64-linux-gnu.so(+0xacfed) [0x7f9336b4ffed]
 11: /lib/librbd.so.1(+0x3cc8af) [0x7f93363cc8af]
 12: /lib/librbd.so.1(+0x3ccfed) [0x7f93363ccfed]
 13: /lib/librbd.so.1(+0x3afec6) [0x7f93363afec6]
 14: /lib/librbd.so.1(+0x3b0560) [0x7f93363b0560]
 15: /lib/librbd.so.1(+0x2cac93) [0x7f93362cac93]
 16: /lib/librbd.so.1(+0x12e7bd) [0x7f933612e7bd]
 17: /lib/librbd.so.1(+0x2b1c9e) [0x7f93362b1c9e]
 18: /lib/librbd.so.1(+0x2b4379) [0x7f93362b4379]
 19: /lib/librados.so.2(+0xd2716) [0x7f9348ae4716]
 20: /lib/librados.so.2(+0xd3705) [0x7f9348ae5705]
 21: /lib/librados.so.2(+0xd3f8a) [0x7f9348ae5f8a]
 22: /lib/librados.so.2(+0xea598) [0x7f9348afc598]
 23: /lib/librados.so.2(+0xd7a71) [0x7f9348ae9a71]
 24: /lib/librados.so.2(+0xedf63) [0x7f9348afff63]
 25: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xe1224) [0x7f9348ee1224]
 26: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7f9348c9cb7b]
 27: /lib/x86_64-linux-gnu/libc.so.6(+0x1107b8) [0x7f9348d1a7b8]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Note the line with update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash).

I've rebuilt Ceph 19.2.3 locally and managed to make it retain its dbgsyms with export DEB_BUILD_OPTIONS="nostrip". That way I was able to gather a bunch of coredumps via coredumpctl. I've extracted the backtraces and attached the most recent ones as files. I've also attached all recent crash dumps (produced by ceph-crash). Note that the crash dumps might all be quite similar; the respective ceph-mgr systemd unit restarts a couple times on exit before giving up.

After doing some debugging myself, it seems that a callback passed to trash_remove in src/pybind/rbd/rbd.pyx [4] might be the cause here, with progress_callback in src/pybind/mgr/rbd_support/task.py [5] being the callback that's used.
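
To illustrate the path I mean, here's a rough sketch of how I understand the call chain (the update_progress helper is my own placeholder, not the actual module code): the rbd_support task handler builds a small closure, passes it as on_progress to trash_remove, and librbd later invokes that closure from one of its worker threads (io_context_pool in the crash above):

  import rbd

  def _make_progress_callback(update_progress):
      # Sketch of the role progress_callback plays in task.py [5]: librbd
      # calls this from a non-Python thread, and it then calls back into
      # the mgr module's Python state to update the progress event.
      def progress_callback(offset, total):
          update_progress(offset / total if total else 0.0)
          return 0
      return progress_callback

  def remove_trashed_image(ioctx, image_id, update_progress):
      # The segfault surfaces somewhere under this call, right after
      # librbd invokes the callback for the first time.
      callback = _make_progress_callback(update_progress)
      rbd.RBD().trash_remove(ioctx, image_id, on_progress=callback)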

I'll see if I can dig up more, but that's all I've found for now. If there's anything specific you'd like me to try or test, please let me know!

[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=6635
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/
[3]: https://github.com/ceph/ceph-csi
[4]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907
[5]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/mgr/rbd_support/task.py#L458-L480


Files

coredumpctl-gdb-2025-08-25T13_30_03+02_00.log (234 KB) ceph-mds: SIGABRT Max Carrara, 08/25/2025 12:10 PM
coredumpctl-gdb-2025-08-25T13_31_20+02_00.log (2.91 MB) ceph-mgr: SIGSEGV Max Carrara, 08/25/2025 12:10 PM
var-lib-ceph-crash.tar.gz (1.39 MB) Contents of /var/lib/ceph/crash Max Carrara, 08/25/2025 12:16 PM

Related issues 1 (1 open, 0 closed)

Related to mgr - Bug #73857: rbd mirror snapshot hang/failure on rocky10 (Fix Under Review, assigned to Samuel Just)

Actions #1

Updated by Max Carrara 6 months ago

Short update: I've managed to provide a workaround for this bug on our end [1].

tl;dr: Disabling the on_progress callbacks prevents the collective segfaults; the default no-op callback is used instead of the one passed in.
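
In other words, the workaround [1] boils down to never handing the Python callable to librbd in the first place; its effect is roughly the following (hypothetical wrapper, not the actual patch):

  import rbd

  def trash_remove_without_progress(ioctx, image_id, force=False):
      # Deliberately drop on_progress: without it, the binding registers
      # librbd's default no-op progress callback, so nothing ever calls
      # back into the (sub-)interpreter from librbd's worker threads.
      rbd.RBD().trash_remove(ioctx, image_id, force=force)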

To paraphrase what I mentioned in the workaround patch [1]: I have a very strong suspicion that this is related to Python sub-interpreters (yet again), and that the internal changes to sub-interpreters in Python 3.12 and 3.13 might be at fault.

What leads me to suspect this are the following three clues:

  1. A user on our forum reported [2] that the issue vanishes as soon as they set up a Ceph MGR inside a Debian Bookworm VM and make it the active one. Bookworm ships Python 3.11, which is the last version before any substantial changes to sub-interpreters [3] [4] were made.
  2. There's another bug [5] regarding a different segfault during MGR startup. The author concluded that the problem is related to sub-interpreters and opened an issue [6] on Python's issue tracker that goes into more detail. The code path there is completely different, but it shows that sub-interpreter problems are popping up elsewhere, at the very least.
  3. The segfault happens inside the Python interpreter, as can be seen in the first stacktrace of the "ceph-mgr: SIGSEGV" attachment. The on_progress callback that the MGR passes through Cython [7] all the way down to librbd segfaults after it is called.

I'll let you know once I find out more.

[1]: https://lore.proxmox.com/pve-devel/20250909170515.606422-1-m.carrara@proxmox.com/
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/page-3#post-796315
[3]: https://docs.python.org/3.12/whatsnew/3.12.html#pep-684-a-per-interpreter-gil
[4]: https://github.com/python/cpython/issues/117953
[5]: https://tracker.ceph.com/issues/67696
[6]: https://github.com/python/cpython/issues/138045
[7]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907

Actions #2

Updated by Kefu Chai 4 months ago

The RBD Python bindings experience segmentation faults when using progress callbacks (on_progress) with Python 3.13. This affects operations like RBD.trash_remove(), which wraps rbd_trash_remove_with_progress(), and similar APIs that accept callback functions.

Root Cause

The segfault is caused by Python 3.13's implementation of PEP 684 (Per-Interpreter GIL), which introduces stricter sub-interpreter isolation. The problem occurs due to the interaction between Cython's GIL management and callback invocation patterns:

Current Implementation Flow:

  1. Python code passes a callable object as on_progress parameter
  2. Cython releases the GIL: with nogil: (rbd.pyx:906)
  3. Python callable is cast to void* and passed to C++ librbd
  4. C++ code invokes the callback function pointer
  5. Cython callback attempts to re-acquire GIL: with gil: (rbd.pyx:389)
  6. Callback accesses the Python object: (<object>ptr)(offset, total)

Why It Crashes on Python 3.13:

  • Python 3.13's per-interpreter GIL creates isolated interpreter states
  • When the callback re-acquires the GIL, the interpreter context may be incompatible with the Python object being accessed
  • Internal Python functions like _Py_dict_lookup expect per-interpreter data structures to be valid
  • Accessing Python objects from an incompatible interpreter context corrupts internal state and causes segfaults

Affected Code Locations

src/pybind/rbd/rbd.pyx:

  # Lines 389-390: Callback definition
  cdef int progress_callback(uint64_t offset, uint64_t total, void* ptr) with gil:
      return (<object>ptr)(offset, total)

  # Lines 903-908: Callback registration
  if on_progress:
      _prog_cb = &progress_callback
      _prog_arg = <void *>on_progress  # Python object as void*
  with nogil:
      ret = rbd_trash_remove_with_progress(_ioctx, _image_id, _force,
                                           _prog_cb, _prog_arg)

Other affected functions:

  • RBD.trash_remove() (line 904)
  • RBD.trash_move() (line 796)
  • RBD.trash_purge() (lines 1145, 1172, 1199)
  • Image.copy() (line 4335)
  • And several other operations with progress callbacks

Current Status (as of 2025-01-27)

Main branch still suffers from this issue; no fixes have been merged to address Python 3.13 compatibility for callbacks, and a search through commits since 2024 turned up no such fixes.

Actions #3

Updated by Kefu Chai 4 months ago

With https://github.com/ceph/ceph/pull/66244, we should be able to work around this issue by running mgr modules (like rbd_support) in the main interpreter, so that the callback also executes in the main interpreter's context.

Actions #4

Updated by Kefu Chai 4 months ago · Edited

  • Status changed from New to Fix Under Review
  • Pull request ID set to 66446

Actions #5

Updated by Kefu Chai 4 months ago

  • Related to Bug #73857: rbd mirror snapshot hang/failure on rocky10 added