Add server side TLS certificate expiry tracking and INFO telemetry#2913
Conversation
5e796c7 to
5534e39
Compare
|
Hi @zuiderkwast , since we discussed this in the issue, I wanted to share the PR here. This PR is still missing tests, before I add them, I wanted to check whether the overall approach looks correct to you. No rush at all, and happy to iterate based on your feedback. Thank you! |
371e49d to
1ce7c35
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2913 +/- ##
============================================
- Coverage 74.86% 74.86% -0.01%
============================================
Files 129 129
Lines 71201 71247 +46
============================================
+ Hits 53308 53336 +28
- Misses 17893 17911 +18
🚀 New features to boost your workflow:
|
|
good feature |
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
0f933ca to
d59827b
Compare
|
@zuiderkwast Thanks for the detailed feedback! I've made the changes you suggested:
Let me know if this looks better! |
2582512 to
1c234ee
Compare
madolson
left a comment
There was a problem hiding this comment.
Broadly looks good. We don't have a majority of TSC in the meeting, but madolson, zuiderkwast, and Ran conditionally approved it. If no one else comments in two weeks we can mark it major decision approved.
a25b612 to
f7014a1
Compare
|
Hi @madolson Thanks for the review and the helpful comments! I removed the redundant I also added a configurable tls-cert-warning-days (default 7) to control when daily expiry log start. If a cert is already expired, we still log a warning regardless of the threshold. All three certs are included. example: LOG: |
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
f7014a1 to
22b6c5b
Compare
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
|
Hi @madolson @zuiderkwast, thanks for the review! I’ve verified the TLS tests locally and everything is passing. It’s been about two weeks since your note, so I wanted to check in and see if you’re comfortable moving forward with these changes. |
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
c20c8ce to
9906cd9
Compare
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
There was a problem hiding this comment.
Pull request overview
This PR adds server-side TLS certificate expiry tracking, logging, and INFO telemetry so operators can monitor impending TLS certificate expirations and act before outages occur.
Changes:
- Track absolute expiration times and serial numbers for server, client, and CA certificates in the core TLS layer, and expose remaining lifetime in a new
INFO tlssection. - Add configuration for pre-expiry logging (
tls-cert-expiry-warning-days) plus periodic logging that distinguishes between “expiring soon” (NOTICE) and “already expired” (WARNING). - Extend the TLS test certificates and unit tests to cover INFO TLS telemetry, CA bundles/directories, client-cert presence/absence, and state transitions when TLS is disabled and re-enabled.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
valkey.conf |
Documents the new tls-cert-expiry-warning-days config option and how pre-expiry vs post-expiry logging behaves. |
utils/gen-test-certs.sh |
Extends TLS test cert generation to build a multi-cert CA bundle and hashed CA directory used to validate INFO tls CA counting and earliest-expiry selection. |
tests/unit/tls.tcl |
Adds comprehensive tests for INFO tls behavior (serials, expiry countdown, CA count), behavior when TLS is disabled, CA bundle/directory semantics, and TLS disable/enable transitions. |
src/tls.c |
Implements ASN.1 time parsing and conversion to epoch, tracks expiry times and serials for server/client/CA certificates, aggregates CA bundle/dir info, and adds the tlsLogServerCertExpiry logging path. |
src/server.h |
Extends the valkeyServer struct with TLS certificate expiry/serial fields and declares the new TLS helper functions used by config and cron. |
src/server.c |
Initializes the new TLS expiry state, adds the INFO tls section that exposes serials, seconds-to-expiry, and CA count, and wires tlsLogServerCertExpiry into serverCron. |
src/config.c |
Wires in the tls-cert-expiry-warning-days config and ensures TLS expiry state is reset appropriately when TLS configs or ports change. |
madolson
left a comment
There was a problem hiding this comment.
So, only high level thing is I'm still not very convinced the log for the warning is a great config, since I think people won't really watch for those and I'm also not clear about needing a metric for number of parsed certificates. Other than that the code mostly looks good, mostly just suggestions for simplication. Looks close to be being merged.
|
|
||
| #else /* USE_OPENSSL */ | ||
|
|
||
| void tlsResetCertInfo(void) { |
There was a problem hiding this comment.
Isn't this almost the exact same as the TLS version?
There was a problem hiding this comment.
Good point. We already pulled the shared clearing logic into tlsClearAllCertInfo(), so the duplicated code is minimal now. The remaining tlsResetCertInfo() implementations live in different compile paths and have slightly different semantics (TLS build has the guard; non‑TLS always clears).
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
|
Thanks for the feedback @madolson . I’ve addressed the comments and pushed updates. could you please take another look when you have a moment? summary:
|
Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com>
madolson
left a comment
There was a problem hiding this comment.
I'm ok with this. We talked during todays meeting about adding a log when the cert is actually expired, but I'm fine with or without that.
Thanks for the approval @madolson, really appreciate it. I didn’t include the expiry log here because I noticed there are multiple places where it should be handled (startup/reload as well as runtime). To keep things focused, I started a separate PR to address logging more consistently: Happy to continue the discussion there. |
…alkey-io#2913) - Parse TLS certificate expiry/serials and refresh cached values at startup and on TLS reconfigure; - expose INFO # TLS fields so monitoring can alert on impending expiry. INFO: ``` # TLS tls_server_cert_serial:826F01147AC63276 tls_server_cert_expires_in_seconds:123456 tls_client_cert_serial:826F01147AC63276 tls_client_cert_expires_in_seconds:123456 tls_ca_cert_serial:11A566... tls_ca_cert_expires_in_seconds:987654 ``` --------- Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com> Co-authored-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: sachinvmurthy <sachin.murthy97@gmail.com>
…alkey-io#2913) - Parse TLS certificate expiry/serials and refresh cached values at startup and on TLS reconfigure; - expose INFO # TLS fields so monitoring can alert on impending expiry. INFO: ``` # TLS tls_server_cert_serial:826F01147AC63276 tls_server_cert_expires_in_seconds:123456 tls_client_cert_serial:826F01147AC63276 tls_client_cert_expires_in_seconds:123456 tls_ca_cert_serial:11A566... tls_ca_cert_expires_in_seconds:987654 ``` --------- Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com> Co-authored-by: Yiwen Zhang <yiwen_zhang@apple.com>
…alkey-io#2913) - Parse TLS certificate expiry/serials and refresh cached values at startup and on TLS reconfigure; - expose INFO # TLS fields so monitoring can alert on impending expiry. INFO: ``` # TLS tls_server_cert_serial:826F01147AC63276 tls_server_cert_expires_in_seconds:123456 tls_client_cert_serial:826F01147AC63276 tls_client_cert_expires_in_seconds:123456 tls_ca_cert_serial:11A566... tls_ca_cert_expires_in_seconds:987654 ``` --------- Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com> Co-authored-by: Yiwen Zhang <yiwen_zhang@apple.com>
…alkey-io#2913) - Parse TLS certificate expiry/serials and refresh cached values at startup and on TLS reconfigure; - expose INFO # TLS fields so monitoring can alert on impending expiry. INFO: ``` # TLS tls_server_cert_serial:826F01147AC63276 tls_server_cert_expires_in_seconds:123456 tls_client_cert_serial:826F01147AC63276 tls_client_cert_expires_in_seconds:123456 tls_ca_cert_serial:11A566... tls_ca_cert_expires_in_seconds:987654 ``` --------- Signed-off-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com> Co-authored-by: Yiwen Zhang <yiwen_zhang@apple.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
…#3572) This reverts commit eb3f465. As discussed in the TSC Core Meeting, this PR reverts #2999 from Valkey 9.1. We plan to do in next major version as it's a breaking change. **Note**: Merge conflicts in `src/tls.c` includes were resolved due to overlapping changes from PR #2913, which added `tlsUpdateCertInfoFromDir()` with shared header dependencies. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
regarding issue: #2784
Parse TLS certificate expiry/serials and refresh cached values at startup and on TLS reconfigure;
expose INFO # TLS fields so monitoring can alert on impending expiry.
INFO: