Bug #73930

open

ceph-mgr modules rely on deprecated python subinterpreters

Added by Casey Bodley 6 months ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Backport: tentacle squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

Parent tracker for related fixes needed for distros with Python 3.12+.

This is a very general problem. Python 3.12 appears to have significantly changed how subinterpreters work. ceph-mgr freely passes objects between subinterpreters internally, which can now cause crashes. Moreover, there are dependencies (PyO3) which no longer tolerate being imported in a submodule.

We don't yet know the full extent of the problem. The initial fixes are:
1. https://github.com/ceph/ceph/pull/66244 -- default to running submodules in the main interpreter
2. https://github.com/ceph/ceph/pull/66240 -- serialize objects passed between subinterpreters

Fix 1 is the important one here and will be our focus for tentacle. Fix 2 will be a follow-up that allows us to move certain modules back into subinterpreters with separate GILs (allowed in 3.12, and likely the reason for the change) for better concurrency.
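As a rough illustration of the idea behind serializing objects passed between subinterpreters, the sketch below turns data into bytes on one side and rebuilds fresh objects on the other, so no live Python object crosses the boundary. This is not taken from PR 66240: the function names are hypothetical, JSON is an assumed encoding, and the data is assumed to be JSON-compatible.

```python
import json


def export_for_other_interpreter(obj):
    """Encode plain data as bytes (hypothetical helper).

    Only the byte string is handed across the interpreter boundary,
    never the original Python object.
    """
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def import_from_other_interpreter(payload):
    """Rebuild new objects from the bytes on the receiving side (hypothetical helper)."""
    return json.loads(payload.decode("utf-8"))


# Illustrative data only; not a real ceph-mgr structure.
status = {"health": "HEALTH_OK", "osds": [0, 1, 2]}
payload = export_for_other_interpreter(status)
copy = import_from_other_interpreter(payload)
```

The receiving side ends up with an equal but distinct object graph, which is the property that matters when each interpreter owns its own objects.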

Fix 1 has shown that a number of modules depend on magic that uses interpreter-wide globals. We're working on refactoring that away to make fix 1 effective.
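A minimal sketch of the kind of refactor this implies, assuming a command table that used to live in one global dict. Once every module runs in the same interpreter, such a global is shared by all of them, so two modules registering the same name would collide. All names here (`CommandRegistry`, `module_a`, `module_b`) are hypothetical and are not taken from the ceph-mgr source.

```python
class CommandRegistry:
    """Per-module command table, replacing a single interpreter-wide dict."""

    def __init__(self, module_name):
        self.module_name = module_name
        self.commands = {}

    def register(self, prefix):
        """Return a decorator that records a handler in this registry only."""
        def decorator(func):
            self.commands[prefix] = func
            return func
        return decorator


# Each module owns its own registry, so identical prefixes do not clash
# even when both modules are loaded into the same interpreter.
module_a_cmds = CommandRegistry("module_a")
module_b_cmds = CommandRegistry("module_b")


@module_a_cmds.register("status")
def module_a_status():
    return "module_a ok"


@module_b_cmds.register("status")
def module_b_status():
    return "module_b ok"
```

With a shared global, the second `status` registration would have silently replaced the first; with one registry per module, each lookup resolves to that module's own handler.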

Once fix 1 is working with the refactors to avoid globals, testing may turn up further 3.12-related problems to address.


Related issues 18 (13 open, 5 closed)

Related to RADOS - Bug #73822: Rocky10 - rados/verify - valgrind error: MismatchedFree operator delete[](void*, unsigned long, std::align_val_t) RocksDBStore::close() RocksDBStore::~RocksDBStore() (Pending Backport)
Related to Dashboard - Bug #74643: cherrypy.process.wspbus.ChannelFailures: TypeError('certfile should be a valid filesystem path') (Pending Backport; Nizamudeen A)
Blocked by mgr - Bug #73857: rbd mirror snapshot hang/failure on rocky10 (Pending Backport; Samuel Just)
Blocked by mgr - Bug #73859: ceph-mgr: py03 import error on rocky10/python3.12 (Pending Backport; Samuel Just)
Blocked by RADOS - Bug #73750: rados/basic: Segmentation fault during neorados tests (Pending Backport; Adam Emerson)
Blocked by RADOS - Bug #73839: Rocky10 - workunits/rados/version_number_sanity.sh "Unsupported distro ->rocky<-! Bailing out." (Resolved; Nitzan Mordechai)
Blocked by Orchestrator - Bug #73823: orch/cephadm: nvme-loop task fails on rocky 10 (Pending Backport)
Blocked by rgw - Bug #73758: rocky 10: test_rgw_datalog.sh fails with segfault (Duplicate; Adam Emerson)
Blocked by mgr - Bug #74042: ceph-mgr: modules need independent CLICommand types (Resolved; Samuel Just)
Blocked by mgr - Bug #74220: PyGILState_Check failed with 66244 and 66467 (Pending Backport; Samuel Just)
Blocked by mgr - Bug #74543: Rocky10 - AttributeError in dashboard module (Duplicate; Samuel Just)
Blocked by RADOS - Bug #74568: Rocky10 - g++ missing (Pending Backport; Nitzan Mordechai)
Blocked by mgr - Bug #74577: test_iscsi_setup.sh - No such path /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/hosts/iqn.1994-05.com.redhat:client1 (New; Redouane Kachach Elhicou)
Blocked by Infrastructure - Bug #74620: No URLs in mirrorlist when trying to install ceph-test (Resolved; David Galloway)
Blocked by Dashboard - Bug #74848: Rocky10 - mixed module names in log messages after PR #66244 (Pending Backport; Nitzan Mordechai)
Blocked by mgr - Bug #74980: Port 7789 still in use, waiting... (Pending Backport; Nizamudeen A)
Blocked by Orchestrator - Bug #74978: mon: Cannot place <MONSpec for service_name=mon> on trial189: Unknown hosts (Pending Backport; Adam King)
Blocked by Ceph - Bug #75282: ceph CLI does not propagate orchestrator error codes - wip-rocky10-branch-of-the-day-2026-02-24-1771941190 (Pending Backport; Nitzan Mordechai)

#1

Updated by Casey Bodley 6 months ago

  • Blocked by Bug #73857: rbd mirror snapshot hang/failure on rocky10 added

#2

Updated by Casey Bodley 6 months ago

  • Blocked by Bug #73859: ceph-mgr: py03 import error on rocky10/python3.12 added

#3

Updated by Casey Bodley 6 months ago

  • Backport set to tentacle squid

#4

Updated by Samuel Just 6 months ago

  • Backport deleted (tentacle squid)

#5

Updated by Samuel Just 6 months ago

  • Description updated (diff)
  • Backport set to tentacle, squid

#6

Updated by Samuel Just 6 months ago

  • Backport changed from tentacle, squid to tentacle squid

#7

Updated by Yaarit Hatuka 6 months ago

  • Related to Bug #73822: Rocky10 - rados/verify - valgrind error: MismatchedFree operator delete[](void*, unsigned long, std::align_val_t) RocksDBStore::close() RocksDBStore::~RocksDBStore() added

#8

Updated by Yaarit Hatuka 6 months ago

  • Blocked by Bug #73750: rados/basic: Segmentation fault during neorados tests added

#9

Updated by Yaarit Hatuka 6 months ago

  • Blocked by Bug #73839: Rocky10 - workunits/rados/version_number_sanity.sh "Unsupported distro ->rocky<-! Bailing out." added

#10

Updated by Yaarit Hatuka 6 months ago

  • Blocked by Bug #73823: orch/cephadm: nvme-loop task fails on rocky 10 added

#11

Updated by Yaarit Hatuka 6 months ago

  • Blocked by Bug #73758: rocky 10: test_rgw_datalog.sh fails with segfault added

#12

Updated by Yaarit Hatuka 5 months ago

  • Blocked by Bug #74042: ceph-mgr: modules need independent CLICommand types added

#13

Updated by Yaarit Hatuka 5 months ago

  • Blocked by Bug #74220: PyGILState_Check failed with 66244 and 66467 added

#14

Updated by Yaarit Hatuka 3 months ago

  • Blocked by Bug #74543: Rocky10 - AttributeError in dashboard module added

#15

Updated by Yaarit Hatuka 3 months ago

  • Blocked by Bug #74568: Rocky10 - g++ missing added

#16

Updated by Yaarit Hatuka 3 months ago

  • Blocked by Bug #74577: test_iscsi_setup.sh - No such path /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/hosts/iqn.1994-05.com.redhat:client1 added

#17

Updated by Yaarit Hatuka 3 months ago

  • Blocked by Bug #74620: No URLs in mirrorlist when trying to install ceph-test added

#18

Updated by Yaarit Hatuka 3 months ago

  • Blocked by Bug #74848: Rocky10 - mixed module names in log messages after PR #66244 added

#19

Updated by Nitzan Mordechai 3 months ago

  • Blocked by Bug #74980: Port 7789 still in use, waiting... added

#20

Updated by Nitzan Mordechai 2 months ago

  • Blocked by Bug #74978: mon: Cannot place <MONSpec for service_name=mon> on trial189: Unknown hosts added

#21

Updated by Nizamudeen A 2 months ago

  • Related to Bug #74643: cherrypy.process.wspbus.ChannelFailures: TypeError('certfile should be a valid filesystem path') added

#22

Updated by Nitzan Mordechai 2 months ago

  • Blocked by Bug #75282: ceph CLI does not propagate orchestrator error codes - wip-rocky10-branch-of-the-day-2026-02-24-1771941190 added