feat: Add support for SSL cert modification #697

Draft
johnramsden wants to merge 5 commits into canonical:main from johnramsden:john/CEPH-1162-ssl

Conversation

johnramsden (Member) commented Mar 21, 2026

Description

Add support for SSL cert modification.

Currently there is no proper way to update certificates. This adds support for updating the existing certificates.

The design includes a new certificates subcommand leaving room for adding additional certificate types. The alternative would be moving the functionality under "enable rgw".

Unfortunately, there is currently no way to reload certificates live: SIGHUP has no effect. Beast has an ssl_reload option, but it is only available in Tentacle (ceph/ceph#65842).

Note: Once we have migrated to Tentacle, we can add support for automatic reload. I propose reloading every 24 hours and, if the user does not pass the restart flag, issuing a warning that certificates will be updated in the background every 24 hours. I have already mapped out this functionality once; it is fairly easy to add, but it is not currently supported and the reload flag does nothing. I'll open an issue after merge.

Fixes #421

Type of change

Delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Documentation update (change to documentation only)

How has this been tested?

Added unit and integration tests verifying:

  • Certificate and key files are written with correct content and 0600 permissions
  • Command is rejected when RGW is not running
  • Command is rejected when RGW was not configured with SSL
  • Invalid base64 certificate or private key is rejected with descriptive errors
  • --restart causes immediate certificate pickup
  • Without --restart, the old certificate continues to be served until manual restart
  • --target rotates the certificate on a specific node (single-node self-target and
    multi-node cross-node)
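The first and fourth checks above can be illustrated with a minimal Go sketch. It is not the PR's writeSSLFiles() helper: the names decodePair, writePrivate and rotate, and the paths passed to rotate, are hypothetical and chosen only for illustration. It assumes the certificate and key arrive base64-encoded and that both are decoded before either file is touched.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// decodePair decodes a base64-encoded certificate and private key.
// Both values are decoded up front so that invalid input is rejected
// before any file is written.
func decodePair(certB64, keyB64 string) (cert, key []byte, err error) {
	cert, err = base64.StdEncoding.DecodeString(certB64)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid base64 certificate: %w", err)
	}
	key, err = base64.StdEncoding.DecodeString(keyB64)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid base64 private key: %w", err)
	}
	return cert, key, nil
}

// writePrivate writes data to path with owner-only (0600) permissions.
func writePrivate(path string, data []byte) error {
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return fmt.Errorf("writing %s: %w", path, err)
	}
	// os.WriteFile keeps the mode of a file that already exists,
	// so the mode is set explicitly as well.
	return os.Chmod(path, 0o600)
}

// rotate (hypothetical) validates the pair, then replaces both files.
func rotate(certPath, keyPath, certB64, keyB64 string) error {
	cert, key, err := decodePair(certB64, keyB64)
	if err != nil {
		return err
	}
	if err := writePrivate(certPath, cert); err != nil {
		return err
	}
	return writePrivate(keyPath, key)
}
```

Decoding both inputs first means a bad key cannot leave a new certificate on disk next to an old key.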

Contributor checklist

Please check that you have:

  • self-reviewed the code in this PR
  • added code comments, particularly in less straightforward areas
  • checked and added or updated relevant documentation
  • checked and added or updated relevant release notes
  • added tests to verify effectiveness of this change

johnramsden and others added 5 commits March 17, 2026 16:40
Adds a scheduled GitHub Actions workflow that queries recent job results
and posts a failure rate report to a designated GitHub issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: John Ramsden <john.ramsden@canonical.com>
# Description

When an operator attempts to do something before the cluster is up they
can receive unexpected failures because bootstrap is not finished or
microcluster is not yet available. This can be particularly problematic
in CI or scripting.

Add an additional subcommand (similar to lxd waitready)
https://manpages.debian.org/unstable/lxd/lxd.waitready.1

To confirm the cluster is up we check for the microcluster daemon to be
ready, and for ceph to be ready (ceph -s)
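The wait described above can be sketched as a poll loop bounded by a context deadline. This is a minimal illustration, not the code in this commit: waitUntilReady, waitWithTimeout and errNotReady are hypothetical names, and the check callback stands in for whatever probes the microcluster daemon or runs ceph -s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotReady is a placeholder error that a readiness check can return.
var errNotReady = errors.New("not ready")

// waitUntilReady calls check immediately and then once per interval
// until it returns nil or ctx is cancelled or times out.
func waitUntilReady(ctx context.Context, interval time.Duration, check func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := check(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// ctx.Err() is context.DeadlineExceeded when the timeout fired.
			return fmt.Errorf("timed out waiting to become ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// waitWithTimeout wraps waitUntilReady with a deadline, as a --timeout flag might.
func waitWithTimeout(timeout, interval time.Duration, check func() error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitUntilReady(ctx, interval, check)
}
```

Wrapping ctx.Err() with %w keeps the cause visible in the final message, which is one way an error ending in "context deadline exceeded" can arise.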

On failure (for example, if we haven't bootstrapped yet) we get a message
like the following:

```
microceph waitready --timeout 30
Error: ceph not ready: timed out waiting for Ceph to become ready: context deadline exceeded
```

When running the following, waitready should block until bootstrap has
finished, and the subsequent status command should succeed:

```
sudo microceph cluster bootstrap &
sudo microceph waitready
sudo microceph status
[1] 35966
MicroCeph deployment summary:
- microceph (10.56.203.112) Services: mds, mgr, mon Disks: 0
```

Also add --storage flag:

When --storage is passed, after daemon and monitor readiness, poll until
enough OSDs are up to satisfy pool replication requirements.

The required count is max(pool.Size) across all pools, falling back to
osd_pool_default_size if no pools exist.
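A minimal sketch of that rule, assuming a trimmed-down pool type. The names pool, requiredOSDs and enoughOSDs are hypothetical and are not the types used by GetOSDPools; defaultSize stands for the value of osd_pool_default_size.

```go
package main

// pool is a hypothetical, reduced view of a Ceph pool: only the
// replication size matters for this check.
type pool struct {
	Name string
	Size int
}

// requiredOSDs returns the largest replication size across all pools,
// or defaultSize when no pools exist.
func requiredOSDs(pools []pool, defaultSize int) int {
	if len(pools) == 0 {
		return defaultSize
	}
	required := 0
	for _, p := range pools {
		if p.Size > required {
			required = p.Size
		}
	}
	return required
}

// enoughOSDs reports whether the number of OSDs that are up can
// satisfy the most demanding pool.
func enoughOSDs(up int, pools []pool, defaultSize int) bool {
	return up >= requiredOSDs(pools, defaultSize)
}
```

With pools of size 2 and 3, the wait would end once three OSDs are up. With no pools and a default size of 3, the threshold is also three.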

Update GetOSDPools to accept a context allowing us to reuse
functionality

Fixes canonical#653 
Fixes: canonical#683

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## How has this been tested?

Added tests demonstrating waiting and timeout prior to bootstrap, and
waiting succeeding post bootstrap.

## Contributor checklist

Please check that you have:

- [x] self-reviewed the code in this PR
- [x] added code comments, particularly in less straightforward areas
- [x] checked and added or updated relevant documentation
- [x] checked and added or updated relevant release notes
- [x] added tests to verify effectiveness of this change

---------

Signed-off-by: John Ramsden <john.ramsden@canonical.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a new CLI command to rotate RGW SSL certificates without needing to
disable and re-enable the service. The command writes the new certificate
and key to disk, and optionally restarts RGW for immediate pickup.

Without --restart, a warning is emitted advising that the service must
be restarted manually for the new certificate to take effect. The command
also validates that RGW was originally configured with SSL before
attempting to write certificates.

Refactors SSL file writing into a shared writeSSLFiles() helper used by
both EnableRGW and UpdateRGWCertificates.

Signed-off-by: John Ramsden <john.ramsden@canonical.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add command reference and how-to guide for rotating RGW TLS
certificates using the new certificate set rgw command.

Signed-off-by: John Ramsden <john.ramsden@canonical.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unit tests for UpdateRGWCertificates (valid certs, invalid cert,
invalid key, RGW not active, SSL not configured) and RestartRGW
(success and failure). Update existing EnableRGW tests for refactored
error messages.

Add integration tests to CI workflow: certificate rotation with
--restart, rotation without restart, manual restart pickup, self-target
rotation, cross-node target rotation, and failure when RGW is not
running.

Signed-off-by: John Ramsden <john.ramsden@canonical.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: John Ramsden <john.ramsden@canonical.com>
@johnramsden johnramsden marked this pull request as ready for review March 21, 2026 02:02
@johnramsden johnramsden marked this pull request as draft March 21, 2026 02:03
Development

Successfully merging this pull request may close these issues.

Adding easy certificate rotation for Rados Gateway in combination with LetsEncrypt Certificates
