Skip to content

rgw: beast add ssl hot-reload#65842

Merged
cbodley merged 1 commit intoceph:mainfrom
henrichter:main
Oct 30, 2025
Merged

rgw: beast add ssl hot-reload#65842
cbodley merged 1 commit intoceph:mainfrom
henrichter:main

Conversation

@henrichter
Copy link
Contributor

@henrichter henrichter commented Oct 8, 2025

Adds the ssl_reload config option to the beast frontend.

This sets an interval in seconds to periodically reload the ssl context to pick up changes without restarting. It can be disabled (default) be setting it to 0.

Its implemented slightly different the the previous ssl_short_trust to have less performance impact.

Resolves:

Unblocks:

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@dmick
Copy link
Member

dmick commented Oct 9, 2025

jenkins test make check arm64

@henrichter henrichter changed the title feat(rgw beast): ssl config reload rgw: beast add ssl hot-reload Oct 12, 2025
@henrichter henrichter marked this pull request as ready for review October 12, 2025 19:17
@henrichter henrichter requested review from a team as code owners October 12, 2025 19:17
Copy link
Contributor

@anthonyeleven anthonyeleven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs lgtm

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i like the use of shared_ptr to support multiple ssl contexts so requests don't have to block during the reload 👍

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style/formatting feedback based on https://github.com/ceph/ceph/blob/main/CodingStyle

@henrichter
Copy link
Contributor Author

jenkins test make check

@henrichter
Copy link
Contributor Author

jenkins test make check arm64

@henrichter henrichter requested a review from cbodley October 15, 2025 09:07
@mkogan1 mkogan1 self-requested a review October 16, 2025 12:04
Copy link
Contributor

@mkogan1 mkogan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested working OK - LGTM

@cbodley a note about keep-alive -
configured to rotate every 30 seconds:

./bin/radosgw ...  --rgw_frontends="beast port=8000 tcp_nodelay=1 request_timeout_ms=0 ssl_port=8443 ssl_certificate=./rgw.crt ssl_private_key=./rgw.key  ssl_reload=30" -f

an hsbench workload that takes longer than 30 seconds to complete (like the example below) was not interrupted while running - it was able to complete successfully even that the cert was rotated during it's execution, the next execution iteration failed because of a certificate error as expected

❯ time (nice numactl -N 1 -m 1 -- env SSL_CERT_FILE=./rgw.pem  ~/go/bin/hsbench -a b2345678901234567890 -s b234567890123456789012345678901234567890 -u https://127.0.0.1:8443 -z 4K -d -1 -t $(( $(numactl -N 0 -- nproc) / 2 )) -b $(( $(numactl -N 0 -- nproc) * 2 )) -n 100000 -m cxip -bp b01b- -op 'folder01/stage01_' |& tee hsbench.log | stdbuf -o0 colrm $((${COLUMNS} - 1)) | ccze -Aonolookups)
2025/10/16 12:05:50 Hotsauce S3 Benchmark Version 0.1 
2025/10/16 12:05:50 Parameters: 
2025/10/16 12:05:50 url=https://127.0.0.1:8443 
2025/10/16 12:05:50 object_prefix=folder01/stage01_ 
2025/10/16 12:05:50 bucket_prefix=b01b- 
2025/10/16 12:05:50 region=us-east-1 
2025/10/16 12:05:50 modes=cxip 
2025/10/16 12:05:50 output= 
2025/10/16 12:05:50 json_output= 
2025/10/16 12:05:50 max_keys=1000 
2025/10/16 12:05:50 object_count=100000 
2025/10/16 12:05:50 bucket_count=32 
2025/10/16 12:05:50 duration=-1 
2025/10/16 12:05:50 threads=8 
2025/10/16 12:05:50 loops=1 
2025/10/16 12:05:50 size=4K 
2025/10/16 12:05:50 interval=1.000000 
2025/10/16 12:05:50 Running Loop 0 BUCKET CLEAR TEST 
2025/10/16 12:05:50 Loop: 0, Int: TOTAL, Dur(s): 0.5, Mode: BCLR, Ops: 0, MB/s: 0.00, IO/s: 0, Lat(ms): [ min: 0.0, avg: 0.0, 99%: 0.0, max: 0.0 ], Slowdowns: 0 
2025/10/16 12:05:50 Running Loop 0 BUCKET DELETE TEST 
2025/10/16 12:05:51 Loop: 0, Int: TOTAL, Dur(s): 0.4, Mode: BDEL, Ops: 0, MB/s: 0.00, IO/s: 0, Lat(ms): [ min: 0.0, avg: 0.0, 99%: 0.0, max: 0.0 ], Slowdowns: 0 
2025/10/16 12:05:51 Running Loop 0 BUCKET INIT TEST 
2025/10/16 12:05:51 FATAL: Unable to create bucket b01b-000000000001 (is your access and secret correct?): RequestError: send request failed 
caused by: Put https://127.0.0.1:8443/b01b-000000000001: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate au 
( nice numactl -N 1 -m 1 -- env SSL_CERT_FILE=./rgw.pem  ~/go/bin/hsbench -a )  0.22s user 0.10s system 25% cpu 1.255 total

@cbodley
Copy link
Contributor

cbodley commented Oct 16, 2025

@cbodley a note about keep-alive

thanks @mkogan1, that's an interesting point. we could potentially disable keepalive on any connections using the old cert to force them to reconnect with the new one. but that would require some additional synchronization that i'm not sure would be worth the complexity

Adds the `ssl_reload` config option to the beast frontend.
This sets an interval in seconds to periodically reload the ssl context to pick up changes without restarting. It can be disabled (default) be setting it to `0`.

Fixes: 65470
Signed-off-by: Henry Richter <henry.richter@cloudandheat.com>
@anrao19
Copy link
Contributor

anrao19 commented Oct 30, 2025

PR testing completed and approved by @ivancich. Details : https://tracker.ceph.com/issues/73623
@h3nryc0ding , if no further testing is not needed please let me know if this can be merged

@cbodley cbodley merged commit a4df7f9 into ceph:main Oct 30, 2025
12 checks passed
@github-actions
Copy link

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/18941318909

@cbodley
Copy link
Contributor

cbodley commented Oct 30, 2025

thanks @h3nryc0ding. does rook want this backported?

:Description: Optional interval in seconds to periodically recreate the SSL
context, which reloads the SSL certificate and private key from
their specified paths. A value of ``0`` disables this feature.
The reload is non-disruptive to existing connections. If the re-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for future PRs, please don’t hyphenate words for line breaks, leave them whole.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what happened. Apparently I forget the -s for fold. Sorry about that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see a line go over 80 chars than hardcoded hyphenation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#66112 entered to fix, as this PR was merged before it could be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing! I'll definitely keep it in mind for future PRs.

@henrichter
Copy link
Contributor Author

henrichter commented Nov 3, 2025

thanks @h3nryc0ding. does rook want this backported?

Yes, would be amazing if we could get this backported.

Edit: #66107 and #66112 should probably get included.

#ifdef __cpp_lib_atomic_shared_ptr
const auto ssl_ctx = ssl_context.load(std::memory_order_acquire);
#else
const auto ssl_ctx = std::atomic_load_excplicit(&ssl_context, std::memory_order_acquire);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomic_load_excplicit

this typo somehow made it through qa, and ubuntu jammy builds are broken on main as a result. i've opened #66113 with a fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry about that. I think #66107 already addresses this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants