Bug #72713
SIGSEGV when ceph-csi-rbd driver schedules RBD volume deletion
Description
Some of our users have reported that all of their MGRs segfault [1] [2].
It turned out that what these users have in common is that they're using Kubernetes with Ceph as storage via the ceph-csi-rbd and ceph-csi-cephfs drivers [3].
I was able to reproduce this on a fresh Ceph 19.2.3 cluster on top of Proxmox VE 9, with a separate 3-node Kubernetes cluster using ceph-csi-rbd as the driver. That's 6 hosts in total, all of them virtualized. Since the ceph-csi-rbd and ceph-csi-cephfs drivers are just clients, I figured it's better to report this here.
Here are my findings so far:
- The MGR seems to segfault quite consistently when removing an image from the trash that was previously provisioned via `ceph-csi-rbd`.
- Removing the image manually via `rbd trash remove` (or, if that fails, via `rbd trash purge`) and then bringing the MGRs up again with `systemctl reset-failed && systemctl restart ceph-mgr.target` seems to temporarily fix the issue.
- So, as long as the "faulty" volume remains in the trash, all MGRs in the cluster will segfault immediately after starting, right when attempting to remove the volume from the trash.
- The issue remains fixed until a volume provisioned by the `ceph-csi-rbd` driver ends up in the trash and the MGR attempts to remove it again.
- On occasion, the MDS seems to receive a SIGABRT as well when that happens, so it's possible that this is related to CephFS too.
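For reference, the manual recovery described above boils down to the following commands. The pool name and image ID are taken from my reproducer (see the log excerpt below); adjust them for the affected cluster, and treat this as a sketch rather than an official procedure:

```shell
# Inspect the trash of the affected pool (pool name from my reproducer)
rbd trash ls --pool k8s-rbd

# Remove the faulty image from the trash; fall back to purging the trash
rbd trash remove k8s-rbd/8d3016c14b061 || rbd trash purge k8s-rbd

# Clear the failed units and bring the MGRs back up
systemctl reset-failed && systemctl restart ceph-mgr.target
```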
Here's an excerpt from a recent crash:
-20> 2025-08-25T11:25:36.976+0200 7f9321e706c0 5 librbd::ManagedLock: 0x5946fcba81b8 handle_acquire_lock: successfully acquired exclusive lock
-19> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: tick
-18> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_tickets
-17> 2025-08-25T11:25:36.993+0200 7f932af9a6c0 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2025-08-25T11:25:06.994275+0200)
-16> 2025-08-25T11:25:37.006+0200 7f932e7a16c0 0 log_channel(cluster) log [DBG] : pgmap v10: 417 pgs: 416 active+clean, 1 unknown; 6.3 MiB data, 32 GiB used, 320 GiB / 352 GiB avail; 204 B/s rd, 0 op/s
-15> 2025-08-25T11:25:37.006+0200 7f932e7a16c0 10 monclient: _send_mon_message to mon.ceph-test-01 at v2:172.16.64.221:3300/0
-14> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_exclusive_lock: r=0
-13> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 validate_image_removal:
-12> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_snaps:
-11> 2025-08-25T11:25:37.010+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 list_image_watchers:
-10> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::Watcher: 0x5946fc669500 notifications_blocked: blocked=0
-9> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_list_image_watchers: r=0
-8> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_image_watchers:
-7> 2025-08-25T11:25:37.011+0200 7f9321e706c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 check_group:
-6> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 handle_check_group: r=0
-5> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::PreRemoveRequest: 0x5946fd0d4480 finish: r=0
-4> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::image::RemoveRequest: 0x5946f62b0000 handle_pre_remove_image: r=0
-3> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::TrimRequest: 0x5946fbd0b480 send_pre_trim: delete_start_min=0 num_objects=512
-2> 2025-08-25T11:25:37.011+0200 7f932166f6c0 5 librbd::TrimRequest: 0x5946fbd0b480 send_remove_objects: delete_start=0 num_objects=512
-1> 2025-08-25T11:25:37.012+0200 7f932166f6c0 0 [progress INFO root] update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash)
0> 2025-08-25T11:25:37.013+0200 7f932166f6c0 -1 *** Caught signal (Segmentation fault) **
in thread 7f932166f6c0 thread_name:io_context_pool
ceph version 19.2.3 (bfe79fc8ee46f629d9ce4db0a202f0f9c0a94ac7) squid (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x7f9348c49df0]
2: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1598b0) [0x7f934a3598b0]
3: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a1843) [0x7f934a3a1843]
4: _PyType_LookupRef()
5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a216b) [0x7f934a3a216b]
6: PyObject_GetAttr()
7: _PyEval_EvalFrameDefault()
8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1109dd) [0x7f934a3109dd]
9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x3d3442) [0x7f934a5d3442]
10: /lib/python3/dist-packages/rbd.cpython-313-x86_64-linux-gnu.so(+0xacfed) [0x7f9336b4ffed]
11: /lib/librbd.so.1(+0x3cc8af) [0x7f93363cc8af]
12: /lib/librbd.so.1(+0x3ccfed) [0x7f93363ccfed]
13: /lib/librbd.so.1(+0x3afec6) [0x7f93363afec6]
14: /lib/librbd.so.1(+0x3b0560) [0x7f93363b0560]
15: /lib/librbd.so.1(+0x2cac93) [0x7f93362cac93]
16: /lib/librbd.so.1(+0x12e7bd) [0x7f933612e7bd]
17: /lib/librbd.so.1(+0x2b1c9e) [0x7f93362b1c9e]
18: /lib/librbd.so.1(+0x2b4379) [0x7f93362b4379]
19: /lib/librados.so.2(+0xd2716) [0x7f9348ae4716]
20: /lib/librados.so.2(+0xd3705) [0x7f9348ae5705]
21: /lib/librados.so.2(+0xd3f8a) [0x7f9348ae5f8a]
22: /lib/librados.so.2(+0xea598) [0x7f9348afc598]
23: /lib/librados.so.2(+0xd7a71) [0x7f9348ae9a71]
24: /lib/librados.so.2(+0xedf63) [0x7f9348afff63]
25: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xe1224) [0x7f9348ee1224]
26: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7f9348c9cb7b]
27: /lib/x86_64-linux-gnu/libc.so.6(+0x1107b8) [0x7f9348d1a7b8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Note the line with `update: starting ev e7f59634-825f-4796-a4c3-7a8a8c058443 (Removing image k8s-rbd/8d3016c14b061 from trash)`.
I've rebuilt Ceph 19.2.3 locally and managed to make it retain its dbgsyms with `export DEB_BUILD_OPTIONS="nostrip"`. That way I was able to gather a bunch of coredumps via `coredumpctl`. I've extracted the backtraces and attached the most recent ones as files. I've also attached all recent crash dumps (produced by `ceph-crash`). Note that the crash dumps might all be quite similar; the respective `ceph-mgr` systemd unit restarts a couple of times on exit before giving up.
From some debugging of my own, it seems that a callback passed to `trash_remove` in `src/pybind/rbd/rbd.pyx` [4] might be the cause here, with `progress_callback` in `src/pybind/mgr/rbd_support/task.py` [5] being the callback used.
I'll see if I can dig up more, but that's all I've found for now. If there's anything specific you'd like me to try or test, please let me know!
[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=6635
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/
[3]: https://github.com/ceph/ceph-csi
[4]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907
[5]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/mgr/rbd_support/task.py#L458-L480
Files
Updated by Max Carrara 6 months ago
Short update: I've managed to provide a workaround for this bug on our end [1].
tl;dr: Disabling the on_progress callbacks prevents the collective segfaults. The default no-op callback is used in place of the passed one.
To paraphrase what I mentioned in the workaround patch [1]: I have a very strong suspicion that this is related to Python sub-interpreters (yet again). Specifically, I believe that the internal changes made to sub-interpreters in Python 3.12 and 3.13 might be at fault.
What leads me to suspect this are the following three clues:
- A user on our forum reported that the issue vanishes as soon as they set up a Ceph MGR inside a Debian Bookworm VM [2]. That MGR must also be the active one. Bookworm ships Python 3.11, which is the last version before any substantial changes to sub-interpreters [3][4] were made.
- There's another bug [5] regarding another segfault during MGR startup. The author concluded that the problem is related to sub-interpreters and opened an issue [6] on Python's issue tracker that goes into more detail. The code path here is completely different, but it shows that problems regarding sub-interpreters are popping up elsewhere at the very least.
- The segfault happens inside the Python interpreter, as can be seen in the first stacktrace of the "ceph-mgr: SIGSEGV" attachment. The `on_progress` callback that the MGR passes through Cython [7] all the way down to `librbd` segfaults after it is called.
I'll let you know once I find out more.
[1]: https://lore.proxmox.com/pve-devel/20250909170515.606422-1-m.carrara@proxmox.com/
[2]: https://forum.proxmox.com/threads/ceph-managers-seg-faulting-post-upgrade-8-9-upgrade.169363/page-3#post-796315
[3]: https://docs.python.org/3.12/whatsnew/3.12.html#pep-684-a-per-interpreter-gil
[4]: https://github.com/python/cpython/issues/117953
[5]: https://tracker.ceph.com/issues/67696
[6]: https://github.com/python/cpython/issues/138045
[7]: https://github.com/ceph/ceph/blob/c92aebb279828e9c3c1f5d24613efca272649e62/src/pybind/rbd/rbd.pyx#L878-L907
Updated by Kefu Chai 4 months ago
The RBD Python bindings experience segmentation faults when using progress callbacks (`on_progress`) with Python 3.13. This issue affects operations like `trash_remove_with_progress()` and similar APIs that accept callback functions.
Root Cause
The segfault is caused by Python 3.13's implementation of PEP 684 (Per-Interpreter GIL), which introduces stricter sub-interpreter isolation. The problem occurs due to the interaction between Cython's GIL management and callback invocation patterns:
Current Implementation Flow:
- Python code passes a callable object as the `on_progress` parameter
- Cython releases the GIL: `with nogil:` (rbd.pyx:906)
- The Python callable is cast to `void*` and passed to C++ librbd
- The C++ code invokes the callback function pointer
- The Cython callback attempts to re-acquire the GIL: `with gil:` (rbd.pyx:389)
- The callback accesses the Python object: `(<object>ptr)(offset, total)`
Why It Crashes on Python 3.13:
- Python 3.13's per-interpreter GIL creates isolated interpreter states
- When the callback re-acquires the GIL, the interpreter context may be incompatible with the Python object being accessed
- Internal Python functions like `_Py_dict_lookup` expect per-interpreter data structures to be valid
- Accessing Python objects from an incompatible interpreter context corrupts internal state and causes segfaults
Affected Code Locations
`src/pybind/rbd/rbd.pyx`:
# Lines 389-390: Callback definition
cdef int progress_callback(uint64_t offset, uint64_t total, void* ptr) with gil:
    return (<object>ptr)(offset, total)

# Lines 903-908: Callback registration
if on_progress:
    _prog_cb = &progress_callback
    _prog_arg = <void *>on_progress  # Python object as void*
with nogil:
    ret = rbd_trash_remove_with_progress(_ioctx, _image_id, _force,
                                         _prog_cb, _prog_arg)
Other affected functions:
- `RBD.trash_remove()` (line 904)
- `RBD.trash_move()` (line 796)
- `RBD.trash_purge()` (lines 1145, 1172, 1199)
- `Image.copy()` (line 4335)
- And several other operations with progress callbacks
Current Status (as of 2025-01-27)
The main branch still suffers from this issue; no fixes addressing Python 3.13 callback compatibility have been merged. A search through commits since 2024 shows:
- No Python 3.13-specific callback fixes
- No PEP 684 compatibility changes
- No sub-interpreter safety improvements for callbacks (https://github.com/ceph/ceph/pull/62951 is specific to PyO3)
Updated by Kefu Chai 4 months ago
With https://github.com/ceph/ceph/pull/66244, we should be able to work around this issue by running mgr modules (like rbd_support) in the main interpreter, so that the callback also executes in the main interpreter's context.
Updated by Kefu Chai 4 months ago
- Related to Bug #73857: rbd mirror snapshot hang/failure on rocky10 added