Skip to content

python-common/cryptotools: avoid interpreter issues of pyo3 workaround#67904

Open
benaryorg wants to merge 3 commits intoceph:mainfrom
benaryorg:tentacle-mgr-pyo3-fix
Open

python-common/cryptotools: avoid interpreter issues of pyo3 workaround#67904
benaryorg wants to merge 3 commits intoceph:mainfrom
benaryorg:tentacle-mgr-pyo3-fix

Conversation

@benaryorg
Copy link
Copy Markdown
Contributor

@benaryorg benaryorg commented Mar 20, 2026

This code has been tested on tentacle (20.2.0, Python 3.12, with the PyO3 patches).
Specifically this is needed with nixpkgs/NixOS since the Python interpreter in this instance is not on the PATH as it is provided as an absolute path.
However as the commit messages detail this is not the only situation in which issues can arise.
Without this patch the dashboard will not function as it will fail to find python3.
Other components using the cryptotools may be affected too.

This PR is split into two parts; the first part being the minimum fix for the code to call the correct interpreter.
The second part is a larger rewrite which removes the custom JSON error passing and the subprocess invocation, restores regular exception handling and generally handles things via provided Python facilities.

The second part is technically optional, but I was already rewriting the code to use multiprocessing and only found the actual bug preventing the interpreter from being properly accessible to the code through that detour.
Both parts have been tested individually, so the first part will run without the second.
If the project decides to only merge the first part that's fine by me, however I encourage you to consider the surface of the currently-merged-but-not-released code given, just as an example, the incredibly broad way exceptions have to be caught or the security concerns raised by the code in question.
Both of these issues are already solved by multiprocessing, along with some other features which may benefit further development or maintenance in the future.

The clearest indicator of why this is a good idea is probably the diff stat of the last commit:

src/python-common/ceph/cryptotools/cryptotools.py | 135 --------------
src/python-common/ceph/cryptotools/remote.py      | 210 ++++++----------------
src/python-common/ceph/cryptotools/select.py      |   4 +-
3 files changed, 55 insertions(+), 294 deletions(-)

Either way, backporting this to tentacle before the next release would be great, given that the current PyO3 workarounds are not actually contained in the latest release, this'd avoid anyone running into these issues in the first place (other than those who backported the patches on their own of course).

Checklist

Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@benaryorg benaryorg requested a review from a team as a code owner March 20, 2026 04:49
@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

14 similar comments
@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

@ceph-jenkins
Copy link
Copy Markdown
Collaborator

Thank you for your contribution. Since you, the author, are not a member of the Ceph GitHub Org yet, our CI will not automatically run. Any member of the Ceph Org may comment "ok to test" to allow the Jenkins jobs to run.

The macro actually created a wide string with the *name* of the constant, not its content.
Thus Python failed to find its interpreter instead using `stat(2)` on every element in *PATH* such as `/usr/bin/MGR_PYTHON_EXECUTABLE`.
As Python also offers a direct way to hand over the C-string, we can let Python handle the conversion directly, getting rid of the macro entirely.

Signed-off-by: benaryorg <binary@benary.org>
Python has support for multiple installed versions and has had so for a long time.
The Python version that *ceph-mgr* (for instance) was compiled against may not be the default *python3* from *PATH*.
This is the case when testing against a different version, but it is also the case on systems which do not have Python installed globally, or ship the Python version used by Ceph separately (to avoid compatibility issues for instance).

Luckily *ceph-mgr* has access to the exact Python version and interpreter via CMake already, and it passes this through to Python via the *program_name* setting.
This sets the interpreter appropriately and allows us to use that instead.

Fixes: https://tracker.ceph.com/issues/75768
Signed-off-by: benaryorg <binary@benary.org>
@benaryorg benaryorg force-pushed the tentacle-mgr-pyo3-fix branch from 0654f91 to ee71018 Compare March 28, 2026 04:40
@benaryorg
Copy link
Copy Markdown
Contributor Author

Minor rewriting to get the typing to check out (except for the one place where even upstream says it likely will never do so), still running the tests (i.e. seeing if the dashboard comes up), then I'll push the fixed version.

Python has support for multiple versions being installed in parallel, as well as there being no guarantee for *python3* to be on the *PATH*.
For instance when testing the same codebase against two different Python versions one might want to specify a different non-default Python 3.
Determining the interpreter and all required parameters to run that interpreter is non-trivial.
Instead the built-in Python *multiprocessing* library can be used to run Python code in a separate process taking care of those parameters by itself.
There are three modes this can operate in; *fork* is the one we explicitly do not want since it could potentially taint the new process.
The other two are *forkserver*, which is to *spawn* as FastCGI is to CGI, basically.
Since there is only one global proxy object in use at any time, we can use spawn to be on the safe side and get a completely fresh interpreter, while still benefitting from only running a single process which does all the cryptography.

This removes a large chunk of the previous code, including a non-negligible amount of JSON parsing and error handling which is now handled using Python internal pickle mechanics, ensuring that exceptions are raised without any special handling.
Effectively both versions; "internal" and "remote" now have the exact same interface, types, etc., without any code in need of maintenance by Ceph.

Signed-off-by: benaryorg <binary@benary.org>
@benaryorg benaryorg force-pushed the tentacle-mgr-pyo3-fix branch from ee71018 to 47b9e5f Compare March 28, 2026 18:09
@benaryorg
Copy link
Copy Markdown
Contributor Author

Note that the failing dashboard tests seem to be unrelated and also failing on main.
NixOS does have some (yet to be merged) testing for the dashboard which does pass with these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants