mgr/cephadm: adding mTLS for ceph mgmt-gateway and backend services communication#58402

Merged
adk3798 merged 5 commits into ceph:main from rkachach:fix_issue_mtls_support on Aug 2, 2024
@rkachach rkachach commented Jul 2, 2024

This PR builds on PR #57535, which introduced the mgmt-gateway, and adds mTLS support. Cephadm acts as the root CA that generates and signs all the certificates used by the backend applications (dashboard, monitoring, etc.).

Changes included in this PR:

  • Introduce a new cert_mgr class to centralize certificate management in cephadm
  • Introduce a new cephadm command to generate self-signed certs (ceph orch certmgr generate-certificates)
  • Add support for generating certificates that cover multiple IPs/FQDNs
  • Add mTLS support for internal communication between the mgmt-gateway and the backend applications
  • Introduce monitoring high availability based on the mgmt-gateway service
  • Add TLS/SSL support for ceph-exporter
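
As an illustration of the multiple IPs/FQDNs item above: a certificate that covers several addresses typically lists them as Subject Alternative Name (SAN) entries, with IP addresses and DNS names stored as separate entry types. The sketch below only shows that classification step using the Python standard library. The function name `split_san_entries` and the sample hosts are hypothetical and are not taken from the PR; how cert_mgr actually builds its certificates is not shown in this conversation.

```python
import ipaddress
from typing import List, Tuple


def split_san_entries(entries: List[str]) -> Tuple[List[str], List[str]]:
    """Split host identifiers into (ip_addresses, dns_names).

    Hypothetical helper: a certificate generator would put the first list
    into IP-type SAN entries and the second into DNS-type SAN entries.
    """
    ips: List[str] = []
    names: List[str] = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # ignore empty strings
        try:
            # ip_address() raises ValueError for anything that is not IPv4/IPv6
            ips.append(str(ipaddress.ip_address(entry)))
        except ValueError:
            # Not an IP literal, so treat it as a DNS name (case-insensitive)
            names.append(entry.lower())
    return ips, names


if __name__ == "__main__":
    # Sample values are made up for the example
    ips, names = split_san_entries(["10.0.0.5", "ceph-node-1.example.com", "fd00::1"])
    print("IP SANs: ", ips)
    print("DNS SANs:", names)
```

Keeping the two kinds apart matters because a TLS client that connects by IP address checks the IP entries, while a client that connects by hostname checks the DNS entries.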

Known issues:

  • Grafana doesn't support mTLS (but its endpoint is protected by basic auth)

The following diagram represents the legacy Ceph mgmt backend architecture:

The new architecture takes advantage of the mgmt-gateway (introduced by #57535) and adds mTLS to secure internal communications between the nginx reverse proxy and the backend applications (Ceph dashboard, Prometheus, Alertmanager, etc.). Direct connections between most of the applications are replaced by requests routed through the nginx reverse proxy to an upstream that represents the new endpoint of the corresponding service. This way all the routing goes through nginx's internal server and we can benefit from its high-availability support.
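
To make the mTLS part concrete, here is a minimal sketch using Python's standard `ssl` module. It is not the PR's implementation (the gateway is nginx, and the backends have their own TLS configuration); it only shows the two properties that define mTLS: the backend refuses clients that do not present a certificate signed by the trusted CA, and the proxy side both verifies the backend and presents its own certificate. The function names and file-path parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_backend_server_context(ca_file: Optional[str] = None,
                                cert_file: Optional[str] = None,
                                key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side (a backend application): require a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED is what turns plain TLS into mutual TLS on the server side
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA used to verify client certificates (the root CA in this setup)
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_proxy_client_context(ca_file: Optional[str] = None,
                              cert_file: Optional[str] = None,
                              key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side (the reverse proxy): verify the server, present a cert."""
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checking by default
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Client certificate presented to the backend during the handshake
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In a real deployment all three paths would be supplied on both sides; they are optional here only so the contexts can be constructed and inspected without certificate files on disk.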

Note:
There are still direct connections between Prometheus and the mgr-prometheus module, and between Prometheus and Alertmanager. This is because in these cases we use the service-discovery feature and rely on Prometheus's support for handling multiple targets.

The following diagram depicts a simplified view of the new architecture:

@rkachach rkachach force-pushed the fix_issue_mtls_support branch 22 times, most recently from 1eb4636 to 2265162, July 3, 2024 17:43
@rkachach rkachach changed the title from "Fix issue mtls support" to "adding mTLS support for ceph mgmt backend services communicatio", July 3, 2024
@rkachach rkachach marked this pull request as ready for review July 3, 2024 17:44
@rkachach
Contributor Author

jenkins test dashboard

Member

@nizamial09 nizamial09 left a comment

small review on the dashboard code.

Comment on lines +42 to +45
for f in [ca_cert_file, cert_file, key_file]:
    if f:
        f.close()
        os.unlink(f.name)
Member

This is duplicated below as well, so shall we have a separate function to do this?

Contributor Author

fixed
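
For illustration, a shared cleanup helper along the lines the reviewer suggests could look roughly like the sketch below. The name `close_and_unlink` is hypothetical; the PR's actual fix is not shown in this conversation.

```python
import os
import tempfile
from typing import IO, Iterable, Optional


def close_and_unlink(files: Iterable[Optional[IO[bytes]]]) -> None:
    """Close each temporary file and remove it from disk.

    None entries are skipped, which mirrors the `if f:` guard in the
    snippet under review.
    """
    for f in files:
        if f is not None:
            f.close()
            os.unlink(f.name)


if __name__ == "__main__":
    # delete=False means closing alone does not remove the file
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(b"example")
    tmp.flush()
    close_and_unlink([None, tmp])
    print(os.path.exists(tmp.name))
```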

Comment on lines +93 to +95
cert_file = tempfile.NamedTemporaryFile(delete=False)
cert_file.write(cert.encode('utf-8'))
cert_file.flush() # cert_tmp must not be gc'ed
Member

This could probably also go in its own function, as it's used repeatedly in the same way in the code.

Contributor Author

fixed
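
Similarly, the repeated write-to-temporary-file pattern could be factored out as in the sketch below. The name `write_tmp_file` is hypothetical and the helper actually added by the PR is not shown here. The file object is returned so the caller can keep a reference to it and decide when to close and remove it.

```python
import os
import tempfile


def write_tmp_file(content: str):
    """Write text to a new named temporary file and return the open file object.

    delete=False keeps the file on disk after close(), so the caller is
    responsible for removing it (for example with os.unlink(f.name)).
    """
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(content.encode('utf-8'))
    f.flush()  # make sure the bytes are on disk before another reader opens the path
    return f


if __name__ == "__main__":
    cert_file = write_tmp_file("placeholder certificate text")
    print(cert_file.name)
    cert_file.close()
    os.unlink(cert_file.name)
```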

@rkachach rkachach force-pushed the fix_issue_mtls_support branch from 109ca08 to 97dcd81, July 11, 2024 11:28
@rkachach rkachach requested a review from nizamial09 July 11, 2024 11:30
@adk3798 adk3798 changed the title from "adding mTLS for ceph mgmt-gateway and backend services communication" to "mgr/cephadm: adding mTLS for ceph mgmt-gateway and backend services communication", Jul 11, 2024
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@rkachach rkachach force-pushed the fix_issue_mtls_support branch from 432f183 to b2f17a3, July 12, 2024 13:51
@rkachach rkachach force-pushed the fix_issue_mtls_support branch 4 times, most recently from 0fc76cf to b351690, July 12, 2024 20:06
rkachach added 2 commits July 31, 2024 08:47
cert_mgr will be solely responsible for managing all certificates
generated and maintained by cephadm. In addition, cephadm now provides
a new command to generate certificates for external modules.

Signed-off-by: Redouane Kachach <rkachach@ibm.com>

This new Cephadm command introduces the ability to generate self-signed
certificates for external modules, signed by Cephadm as the root CA.
This feature is essential for implementing mTLS. Previously, if the
user did not provide a certificate and key, the dashboard would
generate its own. With this update, the dashboard now calls Cephadm
to generate self-signed certificates, enabling secure mTLS
communication with other backend applications. Prometheus module
also makes use of this new functionality to generate self-signed
certificates.

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
@rkachach
Contributor Author

jenkins test make check

rkachach and others added 2 commits July 31, 2024 19:37
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
This commit adds SSL support to the ceph-exporter deployment
made by cephadm. When `secure_monitoring_stack` is set to `True`,
the `ceph-exporter` container is restarted with SSL enabled.

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
@adk3798
Contributor

adk3798 commented Aug 1, 2024

@adk3798
Contributor

adk3798 commented Aug 1, 2024

jenkins test make check

@adk3798
Contributor

adk3798 commented Aug 1, 2024

jenkins test windows

It seems that with Grafana 10.4.0 the domain parameter is taken into
account when building the final URL (earlier versions didn't seem to
behave the same way). This change sets the domain to the hostname where
the Grafana daemon is running instead of '*.lab'. serve_from_sub_path is
removed as it is not needed and, when added, it causes some undesirable
redirections that could break monitoring HA.

Signed-off-by: Redouane Kachach <rkachach@ibm.com>
@rkachach
Contributor Author

rkachach commented Aug 1, 2024

jenkins retest this please

@rkachach
Contributor Author

rkachach commented Aug 1, 2024

jenkins test rook e2e

@rkachach
Contributor Author

rkachach commented Aug 2, 2024

jenkins test dashboard cephadm

4 participants