Skip to content

mgr/dashboard: Simplify authentication protocol#54710

Merged
nizamial09 merged 3 commits intoceph:mainfrom
kalaspuffar:replace_jwt_with_pure_python
Dec 19, 2023
Merged

mgr/dashboard: Simplify authentication protocol#54710
nizamial09 merged 3 commits intoceph:mainfrom
kalaspuffar:replace_jwt_with_pure_python

Conversation

@kalaspuffar
Copy link
Contributor

Hi Team

I got introduced to the thread mentioned below where Proxmox had issues with upgrades for multiple months going to bookworm. They have packaged Ceph themselves but could not get the JWT implementation in the dashboard to work because of Python Interpreters not liking the rust code that ran under the hood in the crypto library.

From the thread:

To sum the dependencies up:
- The Ceph dashboard uses [PyJWT](https://github.com/jpadilla/pyjwt) for authentication, which is a Python library for[ JSON Web Token](https://datatracker.ietf.org/doc/html/rfc7519).
- PyJWT in turn uses[ cryptography](https://github.com/pyca/cryptography), a Python library for cryptographic primitives.
- A part of cryptography's source is written in Rust, and in order to be able to interoperate with that Rust code, it makes use of[ PyO3](https://github.com/PyO3/pyo3).
- (That Rust code calls OpenSSL among other things, which is written in C, so it really is[ turtles all the way down](https://en.wikipedia.org/wiki/Turtles_all_the_way_down).)

JWT is a pretty simple specification and Ceph dashboard only use one encryption algorithm that is already available in Python so why not replace it with a small implementation of the encode and decode functions.

By removing the dependency to PyJWT we also remove the dependency to the cryptographic library, which in the dashboard module will create a crash. In newer implementations of the library PyO3 is used to run rust code in order to encrypt with Elliptic Curves. This is never used in the dashboard communication so a much simpler implementation where we only use the hmac sha256 algorithm to create the signed JWT message could be used.

Fixes: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371
Impacts: Dashboard
No doc update is appropriate because it's a simplification of code.
Added no extra tests.

@kalaspuffar kalaspuffar requested a review from a team as a code owner November 29, 2023 11:13
@kalaspuffar kalaspuffar requested review from Pegonzal and nizamial09 and removed request for a team November 29, 2023 11:13
@Fabian-Gruenbichler
Copy link
Contributor

issue with more background information: https://tracker.ceph.com/issues/63529

just for the record - we didn't have issues with upgrades, the dashboard is simply not functional with more recent versions of pyo3 (such as the one in bookworm or later). we don't use the dashboard whatsoever. the cephadm component also uses python-cryptography FWIW, although I am not sure whether that part runs in the context of ceph-mgr(-dashboard) or not.

@nizamial09
Copy link
Member

thank you guys for the context here. I'll test this out locally in the coming days.

I have a query here, this looks like an error in the cryptography package. and I have seen some threads saying this might be fixed in the latest cryptography and some indicating that pyo3 itself might be fixing the issue. does that sounds true to you guys since you guys spend more time looking at the error than me?

Also, its possible that other packages can also affected by this and we might not be able to remove those dependencies. But that looks like a discussion for future (very near future) i guess.

@Fabian-Gruenbichler
Copy link
Contributor

proper support for sub interrpeters on the pyo3 side is still far off - currently, the check only got changed from being over-zealous to actually only checking for different sub interpreters. but ceph explicitly uses such sub interpreters to have different contexts for each module, so it also doesn't work with the improved upstream packages. and yes, basically everything using pyo3 (or the cryptography module, which uses pyo3) is affected when using sub interpreters.

Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @kalaspuffar for the contribution. I just left some comments around, also for the @ceph/dashboard team to review.

I'm hesitant about this change, as it hard-codes some external dependency inside Dashboard code, which prevents us from benefiting of any future CVE reports and fixes from the original python-JWT project.

Isn't there any move/workaround from the cryptography contributors to cope with this breaking change?

deps =
-rrequirements.txt
-cconstraints.txt
PyJWT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this still required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it here because we use it in the QA testing framework. Using it to test the dashboard could be a benefit to verify that the new code is following the specifications. We could change the QA test to use our code and remove this dependency in the future if we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. Makes sense. There's a requirements-test.txt file for test-only deps (and we should ensure pinning that to a specific version).

That said, I'd only keep this dep if we actually used in a new test added to tests/test_auth.py in order to validate our custom JWT encoding/decoding with an external implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @epuertat

So if qa/tasks/mgr/dashboard/test_auth.py uses it I'm not allowed to keep that dependency? Then how do I add it so that the test can use the dependency? Does the QA part of the code have a separate dependency file?

Best regards
Daniel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not wrong here, adding it to the requirements-test.txt will also pull these dependencies in our qa suites.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, everything under qa/ dir is integration/e2e (teuthology) testing. Just to verify a codec, it's a total overkill to rely on that (it requires building packages for all supported distros, which takes 2-3 hours, and then deploy a real Ceph cluster).

That would be the perfect example of a unit tests: you feed the function under test with a specific set of inputs, and compare the output provided with the expected output (and you don't even need the jwt module imported there, just to hard-code a list of static inputs vs outputs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat
Copy link
Member

BTW, we should also remove the python-JWT dep from the ceph.spec.in and debian/control.

Copy link
Member

@nizamial09 nizamial09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've tested this with the dashboard and it looks good to me!

As @epuertat mentioned we have these python-jwt (for fedora...) and PyJWT (for suse) dependencies provided in the ceph.spec.in and debian/control, so probably that needs to be removed as well!

@kalaspuffar
Copy link
Contributor Author

So far I've tested this with the dashboard and it looks good to me!

As @epuertat mentioned we have these python-jwt (for fedora...) and PyJWT (for suse) dependencies provided in the ceph.spec.in and debian/control, so probably that needs to be removed as well!

Great, I've made the changes locally and I'm running the tests. Will push when done.

By removing the dependency to PyJWT we also remove the dependency to the cryptographic library which
in the dashboard module will create a crash. In newer implementations of the library PyO3 is used to run
rust code in order to encrypt with Elliptic Curves. This is never used in the dashboard communication so
a much simpler implementation where we only use the hmac sha256 algorithm to create the signed JWT message
could be used.

Fixes: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371
Signed-off-by: Daniel Persson <mailto.woden@gmail.com>
@kalaspuffar kalaspuffar force-pushed the replace_jwt_with_pure_python branch from a8cb685 to c616a9d Compare December 2, 2023 18:31
@kalaspuffar
Copy link
Contributor Author

Hi @epuertat and @nizamial09

I've now made the changes suggested by @epuertat.

Best regards
Daniel

Copy link
Member

@nizamial09 nizamial09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I'll add some more people here to test this out locally. Thank you @kalaspuffar

@nizamial09
Copy link
Member

shaman builds are passing ✔️

@epuertat
Copy link
Member

@kalaspuffar can you plz update the last 2 follow-up commit titles to be consistent with the community guidelines? Thanks!

Move the JWT requirement to the test requirements file. Also remove JWT from ceph specification and debian build.

Signed-off-by: Daniel Persson <mailto.woden@gmail.com>
Seemed that the test dependencies was separated in two different requirements files
one for the testing and one for linting. Added the JWT dependency in the linting file
as well.

Signed-off-by: Daniel Persson <mailto.woden@gmail.com>
@kalaspuffar kalaspuffar force-pushed the replace_jwt_with_pure_python branch from 1eb8ec5 to 06765e6 Compare December 13, 2023 09:46
@kalaspuffar
Copy link
Contributor Author

@kalaspuffar can you plz update the last 2 follow-up commit titles to be consistent with the community guidelines? Thanks!

I hope these amendments to the commit messages are more in line with the guidelines. Another thing we could consider is using the first commits message and squash the other commits into one.

Best regards
Daniel

@epuertat
Copy link
Member

@kalaspuffar can you plz update the last 2 follow-up commit titles to be consistent with the community guidelines? Thanks!

I hope these amendments to the commit messages are more in line with the guidelines. Another thing we could consider is using the first commits message and squash the other commits into one.

Best regards Daniel

Yup, these are ok to me! Thanks @kalaspuffar !

@nizamial09
Copy link
Member

jenkins test windows

@nizamial09 nizamial09 merged commit c52c2dc into ceph:main Dec 19, 2023
@iggy
Copy link

iggy commented Dec 31, 2023

Unfortunately this only fixed the dashboard module. Most of the rest of the modules use bcrypt and fail to load. In particular pg_autoscaler is how I stumbled on the issue. My new cluster has a bunch of pools with 1 PG and it seems pretty unhappy in general.

@hswong3i
Copy link
Contributor

Unfortunately this only fixed the dashboard module. Most of the rest of the modules use bcrypt and fail to load. In particular pg_autoscaler is how I stumbled on the issue. My new cluster has a bunch of pools with 1 PG and it seems pretty unhappy in general.

This could related to pyca/bcrypt#694 which I had been reported.

Most likely bcrypt side should already give enough fix, but need PyO3 side for sub-interpreter support (seems no real timeline schedule yet...)

So ideally if ceph-mgr related don't really need bcrypt support we better remove it ;-(

ProxBot pushed a commit to proxmox/ceph that referenced this pull request Jan 15, 2024
After upgrading from PVE 7 to PVE 8, some users noted that the Ceph
Dashboard does not work anymore. [0] A user from our community
provided a pull request [1] which removes a dependency to `PyJWT`
(Python). This commit adds a backport of this PR as a single patch.

This patch by itself however does not yet allow the dashboard to run
with TLS enabled.

[0]: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371/
[1]: ceph/ceph#54710

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
@sebastian-philipp
Copy link
Contributor

Uh, fun with sub-interpreters again. Seems really to be a never ending source of joy

@linas
Copy link

linas commented Dec 1, 2024

Meta: it's now December 2024, ceph-dashboard in both Debian Stable and Debian Testing still don't work. Here's the status:

This is, by the way, a blocker: the ceph dsahboard (in reef) cannot be installed. Period, End of story.

I currently see five alternatives:

  • Get the patch here: zer0def/pyo3@eba5243 applied to PyO3. This patch allows "unsafe operation" of PyO3, and degrades this bug from being a blocker to merely being a "critical" bug.
  • Nag Debian maintainers to revert to earlier versions of python/crypt. I expect strong pushback against this suggestion.
  • Nag the python crypt maintainer to revert whatever change he did to trigger this bug. He already clear expressed no, and that he's OK if this stays broken forever. Strange attitude for the software world, but crypto people think like that.
  • Nag Ceph people to not use subinterpreters. Also seems unlikely, given visible discussions.
  • Find someone who knows both python internals, and also Rust, and have them fix the PyO3 module. Seems impossible, since I am not a corporate manager with a discretionary budget I can spend on hiring developers to fix critical bugs.

I conclude the only viable solution is the first bullet: Is there anything else?

I received this comment:

Subinterpreters have a long and checkered history in Python and projects that consume them ultimately need to either minimize their deps to only those who explicitly claim support or work with upstreams to create such support.

Based on this comment, I conclude the use of python subinterpreters in Ceph was a design flaw.

@kalaspuffar
Copy link
Contributor Author

Hi @linas

I've noticed this issue and have it in our environment today. So, there is no way we can upgrade our production environment. But we run our development environment on this code, so I have the problem in question.

I wanted to discuss this issue with the team during Cephalocon this week which I will attend, hope to see you all there.

But my main points are that this code is either unnecessary or something we need to rebuild to be in line with the community.

When I last reviewed the code, we had added new code, so the crypto library was not only used for the login function, which I rewrote because it did not need that library. But now we have added nice-to-have functionality to create and verify SSL certificates in Python code. That is not something that is strictly necessary, but it is great to simplify things for the customer. These use the library OpenSSL and have recently started to use the problematic code from cryptography / PyO3.

I see three different approaches we could pursue.

  • Stop using sub-interpreters for Python modules
  • I can rebuild the code to create and validate SSL certificates without using the OpenSSL library with the risk that someone will reintroduce these libraries in the future.
  • We can remove the problematic functionality and document the reason. Customers that need certificates can create them locally and there is already functionality in other tools to do this without having it built into the modules. Verification is not strictly required. It's up to the customer to handle these, but adding this functionality without OpenSSL should be doable in the future as the format is known.

I'm looking forward to seeing you soon.

Best regards
Daniel

@linas
Copy link

linas commented Dec 2, 2024

Hi Daniel,

I will not be at Cephalocon; I'm just an ordinary private user, running ceph for myself, and not for some company. Have no money for travel.

  • Creating self-signed certs using external tools is not hard. I did that decades ago. The issue is that, when I go to a search engine, and type in "how to set up ceph-dashboard", I end up at websites that don't explain how to import external certificates. Fixing documentation is hard, because it takes a year or two to percolate to various stack-exchange and reddit and proxmox discussion groups which illustrate actual issues with actual users, and provide, explain, detail these confusions.
  • Although ceph-dashboard is the most severely affected by the PyO3 issue, it is clear that most or all other ceph components are generating error logs with this error, even if their day-to-day operation is minimally impacted. Search engines show dozens of discussions on this issue.
  • The correct long-term fix is to fix PyO3. This can be done by senior programming staff. Based on the discussions, it appears that a junior programmer already attempted this fix, and got overwhelmed. I've dealt with python internals in the past, and recognize the threading/locking function names. Python-to-C internals are quite nasty and recalcitrant. I have personally watched 3 or 4 other junior developers get completely overwhelmed by this stuff, and I had to step in and fix it myself. I presume the PyO3 issue is similar. I know I could fix this; I estimate one week to maybe one month, full time. A few days to set up the environment; a few days to set up rust test cases. A few days to try various fixes, a few days to get them pushed into appropriate git repos. This assumes that no major redesign of PyO3 is required. Will I do this? No. Why? Sadly, my primary desktop has died, and my wife told me to stop spending money. That and 1001 other projects.
  • I have absolutely no idea how hard it would be to not use python sub-interpreters in ceph. It sounds like they are being used due to some kind of security domain isolation between different threads in the same process. Could this be fixed by single-threading, completely shutting down python when moving to another security domain? Is it even safe to have different security domains in different sub-interpreters? Does the use of sub-interpreters in ceph need a design review? Probably. But this is unknown to me. It could be as simple as changing a few dozen lines of code in an afternoon, or as complicated as some major restructuring that takes months. "Been there, done that."

@Aequitosh
Copy link
Contributor

Hey there,

just wanted to weigh in here regarding the Python sub-interpreter issue and perhaps clarify a few things -- I made that tracking issue PyO3/pyo3#3451 a while ago, and yes, unfortunately I've been getting delayed and delayed again due to issues I won't elaborate on right now. But nevertheless, this will move forward eventually.

Though, I want to clarify, I'm not a maintainer of PyO3 (and neither did I get overwhelmed 😉) -- one of the reasons why the RFC PR PyO3/pyo3#4162 I had submitted didn't see any more progress is because the PyO3 maintainers focused their efforts on bringing freethreading support to PyO3 (see e.g. PyO3/pyo3#4588) and I had to attend to other matters in the meantime as well. So, since both the maintainers and I were busy elsewhere, naturally there was no movement forward on the whole sub-interpreter ordeal.

With all that being said, I'm in the process of rebasing my PyO3 branch. But I'd rather not over-promise anything and instead soon deliver something more concrete and lay out a path forward.

@epuertat
Copy link
Member

epuertat commented Dec 9, 2024

@linas @kalaspuffar :

  • removing sub-interpreter support in Ceph is a non-trivial task: it might require rewriting ~33% of the ceph-mgr C++/CPython code (20 kLoC). It's shocking to read that sub-interpreters was such a wrong choice when one of the biggest things to come in Python 3.13-3.14 is... bringing sub-interpreters to the standard library.
  • AFAIK, cryptography, bcrypt, etc., are used very scarcely, for rather ad-hoc tasks (creating certificates, hashes, etc.). One proposal made was to move all these methods to a separate (new) mgr module (running on its own sub-interpreter) and invoke the crypto calls from there (through the ceph-mgr remote() call). This might work as long as no 3rd-party/transitive dependency of each mgr module relies in turn on cryptography.

@kalaspuffar
Copy link
Contributor Author

Hi @epuertat

We talked about it at Cephalocon and did not really come to any conclusions, but I think either a new module or perhaps moving the functionality into the RestAPI module, and then anyone needing that functionality could just make a call to the already established APIs to do those kinds of workloads.

Best regards
Daniel

Aequitosh added a commit to Aequitosh/ceph that referenced this pull request Sep 12, 2025
This is a backport of ceph#54710 which
fixes the Ceph Dashboard not being able to launch on Ceph Reef running
on Debian Bookworm.

This is achieved by removing the dependency on `PyJWT` (Python) and thus
transitively also removing the dependency on `cryptography` (Python).
For more information, see the original pull request.

Note that the Ceph Dashboard still cannot be used if TLS is activated,
because `pyOpenSSL` is used to verify certs during launch. Disabling
TLS via `ceph config set mgr mgr/dashboard/ssl false` and using e.g.
a reverse proxy can be used as a workaround.

A separate patch is required to allow the dashboard to run with TLS
enabled.

Fixes: https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371
Signed-off-by: Daniel Persson <mailto.woden@gmail.com>
Signed-off-by: Max Carrara <m.carrara@proxmox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

10 participants