Bug #64213

closed

MGR modules incompatible with later PyO3 versions - PyO3 modules may only be initialized once per interpreter process

Added by Chris Palmer about 2 years ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Category:
build
Target version:
% Done:
100%

Source:
Community (user)
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Tags (freeform):
Fixed In:
v20.3.0-1761-g7094a5a44d
Released In:
Upkeep Timestamp:
2025-08-12T18:23:46+00:00

Description

Many MGR modules cannot be used on platforms with later versions of PyO3. The error message

PyO3 modules may only be initialized once per interpreter process

is issued in many places, including (but not limited to) the dashboard, and any TLS communication.

This occurs in Debian 12 (bookworm), but has been noted in other distributions too.

An example crash is:

$ ceph crash info 2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/cephadm/__init__.py\", line 1, in <module>\n    from .module import CephadmOrchestrator",
        "  File \"/usr/share/ceph/mgr/cephadm/module.py\", line 15, in <module>\n    from cephadm.service_discovery import ServiceDiscovery",
        "  File \"/usr/share/ceph/mgr/cephadm/service_discovery.py\", line 20, in <module>\n    from cephadm.ssl_cert_utils import SSLCerts",
        "  File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n    from cryptography import x509",
        "  File \"/lib/python3/dist-packages/cryptography/x509/__init__.py\", line 6, in <module>\n    from cryptography.x509 import certificate_transparency",
        "  File \"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", line 10, in <module>\n    from cryptography.hazmat.bindings._rust import x509 as rust_x509",
        "ImportError: PyO3 modules may only be initialized once per interpreter process" 
    ],
    "ceph_version": "18.2.1",
    "crash_id": "2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52",
    "entity_name": "mgr.xxxxx01",
    "mgr_module": "cephadm",
    "mgr_module_caller": "PyModule::load_subclass_of",
    "mgr_python_exception": "ImportError",
    "os_id": "12",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_version": "12 (bookworm)",
    "os_version_id": "12",
    "process_name": "ceph-mgr",
    "stack_sig": "7815ad73ced094695056319d1241bf7847da19b4b0dfee7a216407b59a7e3d84",
    "timestamp": "2024-01-12T11:10:03.938478Z",
    "utsname_hostname": "xxxxx01.xxx.xxx",
    "utsname_machine": "x86_64",
    "utsname_release": "6.1.0-17-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)" 
}
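A crash report like the one above can be checked for this failure mechanically. The following is a minimal sketch, assuming the report has the same JSON shape as the `ceph crash info` output shown here; the helper names `is_pyo3_crash` and `import_chain` are hypothetical and not part of Ceph.

```python
import json

PYO3_MESSAGE = "PyO3 modules may only be initialized once per interpreter process"


def is_pyo3_crash(report: dict) -> bool:
    """Return True if any backtrace entry carries the PyO3 ImportError text."""
    return any(PYO3_MESSAGE in frame for frame in report.get("backtrace", []))


def import_chain(report: dict) -> list:
    """Return the file paths named in the traceback frames, outermost first."""
    paths = []
    for frame in report.get("backtrace", []):
        frame = frame.strip()
        if frame.startswith('File "'):
            # A frame looks like: File "/path/to/file.py", line N, in <module>
            paths.append(frame.split('"')[1])
    return paths


# Abbreviated copy of the report above (two frames plus the final error).
sample = json.loads(r'''
{
  "backtrace": [
    "  File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n    from cryptography import x509",
    "  File \"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", line 10, in <module>\n    from cryptography.hazmat.bindings._rust import x509 as rust_x509",
    "ImportError: PyO3 modules may only be initialized once per interpreter process"
  ],
  "mgr_module": "cephadm"
}
''')

if is_pyo3_crash(sample):
    print(sample["mgr_module"], "failed while importing:")
    for path in import_chain(sample):
        print("  ", path)
```

The last path in the chain is the file whose import reached the Rust bindings, which helps tell which Python package pulled in PyO3.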

My understanding of the relevant background is:

  • MGR modules use python subinterpreters for isolation between modules.
  • Several modules (including but not limited to dashboard & restful) use python3-cryptography for hashing and TLS (and possibly other things).
  • python3-cryptography delegates some crypto functions to Rust functions. These include bcrypt and TLS-related functions.
  • python3-cryptography uses PyO3 to invoke Rust functions.
  • PyO3 does not support being used by subinterpreters. In the past this has been allowed but was actually unsafe. Now PyO3 throws an exception when it detects multiple initialisations.

So it appears that the MGR use of these functions has always been unsafe, and is now forbidden.
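
The failure mode described in the list above can be illustrated with a small pure-Python simulation. This is only a sketch: the real check lives in PyO3's Rust code, and modelling it as a single process-wide "already initialised" flag is a simplifying assumption. `SimulatedProcess` and `load_mgr_modules` are hypothetical names.

```python
class SimulatedProcess:
    """Stands in for one ceph-mgr process hosting several sub-interpreters."""

    def __init__(self):
        # Assumption: the extension remembers, per process, whether it was
        # initialised before.
        self.extension_initialized = False

    def import_extension(self, subinterpreter: str) -> None:
        """Simulate a sub-interpreter importing a PyO3-based module."""
        if self.extension_initialized:
            raise ImportError(
                "PyO3 modules may only be initialized once per interpreter process"
            )
        self.extension_initialized = True


def load_mgr_modules(process: SimulatedProcess, names: list) -> dict:
    """Load each module in its own simulated sub-interpreter and record the outcome."""
    results = {}
    for name in names:
        try:
            process.import_extension(name)
            results[name] = "ok"
        except ImportError as exc:
            results[name] = f"ImportError: {exc}"
    return results


if __name__ == "__main__":
    outcome = load_mgr_modules(SimulatedProcess(), ["dashboard", "cephadm", "restful"])
    for module, status in outcome.items():
        print(f"{module}: {status}")
```

In this model only the first module to import the extension loads; every later one fails. That matches the pattern in which whichever MGR module reaches the Rust bindings second is the one that crashes.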

PR54710 identified that the code necessary for the bcrypt hashing used during authentication could easily be written in a small amount of native python, thus avoiding the whole PyO3 area altogether.

However, a note in that discussion said that TLS also had to be disabled, and the fix applied only to the dashboard. My stacktrace above shows the exception during TLS initialisation.

As PyO3 updates are adopted in other Linux distributions and containers, this is likely to break a number of MGR modules. As there does not seem to be any subinterpreter support in PyO3 coming soon, the only option may be to completely eliminate use of python3-cryptography from all MGR modules. (It is possible that MGR modules also use other python3 modules that use PyO3 to invoke Rust.)

Unfortunately for us, we didn't find this until we had upgraded all MONs in a cluster to reef, at which point we can't downgrade them to quincy. And we can't upgrade the MGR. As a temporary measure (this cluster had MON/MGR/MDS/RGW colocated on 2 hosts) we have added another bookworm host running a reef MON to ensure we can maintain quorum. We are not sure whether it is safe to upgrade the other components (OSD, MDS, RGW) while the MGR remains at quincy.

This has been discussed on ceph-users, and the postings contain links to several other sources of information. Only Problem 2 referred to on that thread is relevant to this bug:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VEN3IU53ZVU343S3U25QKKPCOER4X7AG/#53P4RTSCPPHQYIEISBBAEJXQJNEQSWYL

Copied from the issue #63529:

Our users noticed that Ceph's dashboard is broken in Proxmox Virtual Environment 8. On closer investigation, this is apparently caused by an import check in Python modules that use PyO3:

> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Traceback (most recent call last):
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/__init__.py", line 60, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from .module import Module, StandbyModule  # noqa: F401
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/module.py", line 30, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from .controllers import Router, json_error_page
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 1, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from ._api_router import APIRouter
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/controllers/_api_router.py", line 1, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from ._router import Router
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/controllers/_router.py", line 7, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from ._base_controller import BaseController
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 11, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from ..services.auth import AuthManager, JwtManager
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/usr/share/ceph/mgr/dashboard/services/auth.py", line 12, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     import jwt
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/jwt/__init__.py", line 1, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from .api_jwk import PyJWK, PyJWKSet
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/jwt/api_jwk.py", line 6, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from .algorithms import get_default_algorithms
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/jwt/algorithms.py", line 6, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from .utils import (
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/jwt/utils.py", line 7, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/ec.py", line 11, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from cryptography.hazmat._oid import ObjectIdentifier
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File "/lib/python3/dist-packages/cryptography/hazmat/_oid.py", line 7, in <module>
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:     from cryptography.hazmat.bindings._rust import (
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: ImportError: PyO3 modules may only be initialized once per interpreter process
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Class not found in module 'dashboard'
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Error loading module 'dashboard': (2) No such file or directory
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.470+0200 7fecdc91e000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 7fecdc91e000 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
> Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 7fecdc91e000 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: dashboard
> 
This error doesn't just appear in PVE, but also in:

This happens because any ceph-mgr module that uses a PyO3-based Python module (one that provides bindings to Rust code) raises an ImportError when that module is imported more than once in the presence of multiple Python sub-interpreters. This check is present in PyO3 version 0.17.0 and upwards and was introduced in this pull request: https://github.com/PyO3/pyo3/pull/2523

So, for now it seems that this "only" completely breaks the dashboard; however, other ceph-mgr modules might also be affected in the future, depending on whether they use PyO3 Python modules or not.

This also means that the official distribution of Ceph will be affected sooner or later, when a newer version of either Python's cryptography or PyO3 is used.

The above is essentially a summary of my findings and posts over at the Proxmox forum. For the curious, you can read up on everything starting from this post over here: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371/post-587100


Subtasks 1 (0 open, 1 closed)

mgr - Bug #71977: now cryptography version ubuntu 22.04 cause mgr dashboard and cephadm module can't run (Duplicate)

Related issues 2 (1 open, 1 closed)

Related to CI - Bug #66914: start building/testing for Ubuntu 24.04 LTS (Noble Numbat) (New)
Has duplicate mgr - Bug #63529: Python Sub-Interpreter Model Used by ceph-mgr Incompatible With Python Modules Based on PyO3 (Duplicate)

Actions #1

Updated by Matthew Vernon about 2 years ago

https://tracker.ceph.com/issues/63529 is a subset of this issue (relating to the dashboard), and has a fix just for the dashboard committed to main.

Actions #2

Updated by Peter Razumovsky about 2 years ago

CentOS 9 Stream is also affected, by the way; we are hitting this PyO3 import error too. Subscribing to this issue.

Actions #3

Updated by Chris Palmer about 2 years ago

Interesting... Because of this problem, and the fact that debian-ceph packages are not even tested before release, I am in the middle of moving 3 ceph clusters from debian11/quincy to centos-9-stream/reef. I've done 2/3 clusters so far without encountering this problem. But the only modules I am really using are dashboard and restful, and we do have TLS enabled.

Do you have any more information about which modules are causing it on centos9? And which version of ceph you are using?

Actions #4

Updated by Ernesto Puerta almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Ernesto Puerta
  • Priority changed from Normal to High
Actions #5

Updated by Nizamudeen A almost 2 years ago

  • Related to Bug #63529: Python Sub-Interpreter Model Used by ceph-mgr Incompatible With Python Modules Based on PyO3 added
Actions #6

Updated by Hector Martin almost 2 years ago

Pretty much all modules import `bcrypt` and cause this issue. `cryptography` was removed in the linked issue but that is not enough, since `bcrypt` triggers it too. This affects Fedora 39/40 too.

Actions #7

Updated by Hector Martin almost 2 years ago

To be clear, the problem is not "cryptography" delegating bcrypt to Rust (cryptography is a problem but its usage was already removed). The problem is the Python module named "bcrypt" itself which is imported in mgr_util.py (and also dashboard/services/access_control.py) and therefore pulled in by ~every single mgr module. I don't really understand why the linked bug was closed, and this one left open, despite the fact that this one confusingly only talks about the former and not the latter.

TL;DR we need to remove bcrypt from mgr_util.py and access_control.py and replace it with something else, whether another implementation or a vendored version of an older `bcrypt` (3.x).
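
To see which files still import these packages directly, a source tree can be scanned with the standard `ast` module. This is a rough sketch, not a Ceph tool: `suspect_imports` and `scan_tree` are hypothetical helpers, and the set of package names is an assumption taken from this thread. It finds direct imports only, so a transitive case such as `jwt` pulling in `cryptography` will not be reported.

```python
import ast
from pathlib import Path

# Packages named in this thread as pulling in PyO3-based extensions.
SUSPECT = {"bcrypt", "cryptography"}


def suspect_imports(source: str) -> set:
    """Return the suspect top-level packages imported directly by this source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in SUSPECT:
                found.add(top_level)
    return found


def scan_tree(root: str) -> dict:
    """Map each .py file under root to the suspect packages it imports."""
    hits = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            found = suspect_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # Path as referenced in this thread, relative to a Ceph checkout.
    for filename, packages in scan_tree("src/pybind/mgr").items():
        print(filename, sorted(packages))
```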

Actions #8

Updated by Niklas Hambuechen over 1 year ago

> cryptography is a problem but its usage was already removed

Could somebody elaborate:

https://github.com/ceph/ceph/pull/55689/files removes `PyJWT` from `src/pybind/mgr/dashboard/requirements.txt`, but `cryptography` is still in `src/pybind/mgr/requirements-required.txt`.

So this removal you're referring to is only for the dashboard, and the `cryptography` problem still exists for non-dashboard parts of the mgr, is that correct?

Actions #9

Updated by Iggy Jackson over 1 year ago

Niklas Hambuechen wrote in #note-8:

> > cryptography is a problem but its usage was already removed
>
> Could somebody elaborate:
>
> https://github.com/ceph/ceph/pull/55689/files removes `PyJWT` from `src/pybind/mgr/dashboard/requirements.txt`, but `cryptography` is still in `src/pybind/mgr/requirements-required.txt`.

Maybe that was just a missed reference?

> So this removal you're referring to is only for the dashboard, and the `cryptography` problem still exists for non-dashboard parts of the mgr, is that correct?

It's bcrypt, not cryptography, but yes, it impacts the rest of the mgr modules

Actions #10

Updated by Niklas Hambuechen over 1 year ago

Iggy Jackson wrote in #note-9:

> > cryptography is a problem but its usage was already removed
>
> Maybe that was just a missed reference?

The code

src/pybind/mgr/cephadm/ssl_cert_utils.py

still does:

from cryptography import ...
Actions #11

Updated by Ranjan Ghosh about 1 year ago

After updating from Ubuntu 24.04 to 24.10 I get this error for all(?) modules: "13 mgr modules have failed dependencies". I don't understand: How is this possible? No Ceph modules work with the new Ubuntu version? Why did it work before? Is there a workaround/fix? Is there anyone out there who can explain this? What can we expect for the next Ubuntu version (25.04)? Will this version have a Ceph version that works properly?

Actions #12

Updated by Casey Bodley 12 months ago

  • Related to Bug #66914: start building/testing for Ubuntu 24.04 LTS (Noble Numbat) added
Actions #13

Updated by Casey Bodley 12 months ago

@Ernesto Puerta it sounds like this may be a blocker for ubuntu 24.04 support which is now a year overdue. maybe you could share an update on where you think this stands? does it need to be reassigned?

Actions #14

Updated by Ranjan Ghosh 12 months ago

Interestingly, I don't have that problem under Ubuntu 24.04. Only under 24.10.

Actions #15

Updated by Ernesto Puerta 12 months ago

  • Pull request ID set to 61737
Actions #16

Updated by Ernesto Puerta 12 months ago

  • Description updated (diff)
Actions #17

Updated by Ernesto Puerta 12 months ago

  • Related to deleted (Bug #63529: Python Sub-Interpreter Model Used by ceph-mgr Incompatible With Python Modules Based on PyO3)
Actions #18

Updated by Ernesto Puerta 12 months ago

  • Has duplicate Bug #63529: Python Sub-Interpreter Model Used by ceph-mgr Incompatible With Python Modules Based on PyO3 added
Actions #19

Updated by Ernesto Puerta 12 months ago

  • Description updated (diff)
Actions #20

Updated by Ernesto Puerta 12 months ago

  • Description updated (diff)
Actions #21

Updated by Konstantin Shalygin 12 months ago

  • Target version set to v20.0.0
  • Affected Versions v18.2.5 added
  • Affected Versions deleted (v18.2.1)
Actions #22

Updated by John Mulligan 9 months ago

  • Pull request ID changed from 61737 to 62951

I've updated the PR id from the closed one to my open PR. Do note that even when that PR is merged the k8sevents and rook modules do still have the possibility of triggering pyo3 errors.

Actions #23

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 7094a5a44d90e705141dbae9739e6c0835bf7ce3
  • Fixed In set to v20.3.0-1761-g7094a5a44d
  • Upkeep Timestamp set to 2025-07-17T20:41:45+00:00
Actions #24

Updated by Upkeep Bot 7 months ago

  • Status changed from In Progress to Resolved
  • Upkeep Timestamp changed from 2025-07-17T20:41:45+00:00 to 2025-08-12T18:23:46+00:00
Actions #25

Updated by John Mulligan 7 months ago

  • Subtask #71977 added