feat: add default endpoint probe and TLS expiry alerts #2530
Merged
jasonwashburn merged 10 commits into main on Mar 30, 2026
69212c6 to cb075f2
Pull request overview
Adds opinionated, configurable default probe alerting to UDS Core’s monitoring stack (via the uds-prometheus-config chart), with accompanying Helm unit tests, Vitest E2E coverage, and documentation updates describing the new defaults and tuning paths.
Changes:
- Add default `PrometheusRule` probe alerts for endpoint downtime and TLS certificate expiry, gated by Helm values.
- Add Helm-unittest suites plus a new Vitest E2E test that validates the alert rules are loaded in Prometheus.
- Update monitoring docs (concepts, reference, and how-to guides) to document the shipped rules and configuration knobs.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/vitest/default-probe-alerts.spec.ts | New Vitest E2E test that polls Prometheus for the shipped probe alert rule names. |
| src/prometheus-stack/tasks.yaml | Runs npm ci once and executes Vitest suites for Prometheus, default probe alerts, and blackbox exporter. |
| src/prometheus-stack/chart/values.yaml | Adds udsCoreDefaultAlerts values surface (enablement, severities, durations, TLS day thresholds). |
| src/prometheus-stack/chart/tests/probe_alerting_rules_test.yaml | Helm-unittest coverage for default rendering, toggles, and override behavior. |
| src/prometheus-stack/chart/tests/probe_alerting_rules_no_crd_test.yaml | Ensures probe alerts do not render when the PrometheusRule CRD API version is unavailable. |
| src/prometheus-stack/chart/templates/probe-alerting-rules.yaml | New PrometheusRule template implementing UDSProbeEndpointDown + TLS expiry warning/critical alerts. |
| docs/reference/configuration/monitoring-and-observability.md | Documents shipped default probe alert rules, labels, and Helm configuration surface with examples. |
| docs/how-to-guides/monitoring-and-observability/set-up-uptime-monitoring.mdx | Notes the existence of default probe alerts and points readers to tuning guidance. |
| docs/how-to-guides/monitoring-and-observability/overview.mdx | Updates guide card text to reflect tuning of built-in probe defaults. |
| docs/how-to-guides/monitoring-and-observability/create-metric-alerting-rules.mdx | Adds guidance and examples for tuning/disabling UDS Core probe defaults alongside upstream defaults. |
| docs/concepts/core-features/monitoring-observability.mdx | Updates concepts to include default probe alert rules as part of built-in uptime monitoring. |
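The E2E check described for `test/vitest/default-probe-alerts.spec.ts` could look roughly like the sketch below. It is not the test from this PR: the helper names (`missingAlertRules`, `waitForAlertRules`), the retry count, and the delay are assumptions made for illustration. Only the three alert names and the shape of Prometheus's `GET /api/v1/rules` response are taken as given. The fetch step is injected as a function so the sketch does not assume how the real test reaches Prometheus.

```typescript
// Minimal shape of the Prometheus GET /api/v1/rules response (only the fields used here).
interface RulesResponse {
  status: string;
  data: {
    groups: { name: string; rules: { name: string; type: string }[] }[];
  };
}

// Alert names shipped by this PR.
const EXPECTED_ALERTS: string[] = [
  "UDSProbeEndpointDown",
  "UDSProbeTLSExpiryWarning",
  "UDSProbeTLSExpiryCritical",
];

// Hypothetical helper: returns the expected alert names that are not loaded as alerting rules.
function missingAlertRules(response: RulesResponse, expected: string[]): string[] {
  const loaded = new Set<string>();
  for (const group of response.data.groups) {
    for (const rule of group.rules) {
      if (rule.type === "alerting") loaded.add(rule.name);
    }
  }
  return expected.filter((name) => !loaded.has(name));
}

// Hypothetical helper: polls until every expected rule is loaded or attempts run out.
// fetchRules is supplied by the caller (for example, a wrapper around an HTTP GET).
async function waitForAlertRules(
  fetchRules: () => Promise<RulesResponse>,
  expected: string[],
  attempts = 10, // assumed retry budget
  delayMs = 1000 // assumed delay between polls
): Promise<void> {
  let missing = expected;
  for (let i = 0; i < attempts; i++) {
    missing = missingAlertRules(await fetchRules(), expected);
    if (missing.length === 0) return;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`alert rules not loaded: ${missing.join(", ")}`);
}
```

Polling rather than asserting once allows for the delay between the `PrometheusRule` being applied and Prometheus reloading its rule files.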
mjnagel reviewed Mar 27, 2026
A few smaller comments, nothing major overall.
briantwatson left a comment
Nice addition; some comments below for consideration.
Co-authored-by: Brian Watson <brianwatson@defenseunicorns.com>
Co-authored-by: Micah Nagel <micah.nagel@gmail.com>
f0b2c4b to c474b34
mjnagel approved these changes Mar 30, 2026
briantwatson approved these changes Mar 30, 2026
chance-coleman pushed a commit that referenced this pull request Apr 1, 2026
🤖 I have created a release *beep* *boop*

---

## [1.1.0](v1.0.0...v1.1.0) (2026-03-31)

### Features

* add default endpoint probe and TLS expiry alerts ([#2530](#2530)) ([625527c](625527c))
* add support for image volumes in policy ([#2552](#2552)) ([46b653e](46b653e))
* default uptime probe overrides ([#2520](#2520)) ([0c80295](0c80295))

### Bug Fixes

* **docs:** llm friendly docs ([#2535](#2535)) ([107f181](107f181))
* remove aggressive whitespace trimming in keycloak statefulset template ([#2539](#2539)) ([231fa5c](231fa5c))

### Miscellaneous

* **ci:** cleanup old cve workflow ([#2550](#2550)) ([f67afa8](f67afa8))
* **ci:** ensure concurrency on all workflows ([#2527](#2527)) ([3ccf9ef](3ccf9ef))
* **deps-dev:** bump picomatch from 4.0.3 to 4.0.4 in /scripts/renovate ([#2538](#2538)) ([ba0ed10](ba0ed10))
* **deps-dev:** bump picomatch from 4.0.3 to 4.0.4 in /scripts/root-ca-retriever ([#2537](#2537)) ([7bfaaa0](7bfaaa0))
* **deps-dev:** bump rollup from 4.57.1 to 4.60.1 in /docs/.c4 ([#2551](#2551)) ([abcb422](abcb422))
* **deps:** bump brace-expansion ([#2544](#2544)) ([9cdc76e](9cdc76e))
* **deps:** bump flatted from 3.4.1 to 3.4.2 ([#2512](#2512)) ([7c08658](7c08658))
* **deps:** bump picomatch ([#2536](#2536)) ([38baaa8](38baaa8))
* **deps:** bump yaml from 2.8.2 to 2.8.3 in /scripts/renovate ([#2542](#2542)) ([f78b6df](f78b6df))
* **deps:** update iac-support-deps ([#2534](#2534)) ([5098a93](5098a93))
* **deps:** update keycloak to v26.5.6 ([#2502](#2502)) ([ba6a2c0](ba6a2c0))
* **deps:** update pepr to v1.1.5 ([#2540](#2540)) ([21bc575](21bc575))
* **deps:** update prometheus-stack ([#2518](#2518)) ([c9dfd05](c9dfd05))
* **deps:** update setup-uv (support dep) to v8 ([#2548](#2548)) ([2b55d1c](2b55d1c))
* **deps:** update support dependencies to v4.35.1 ([#2545](#2545)) ([a964617](a964617))
* **deps:** update UDS CLI to 0.30.0, Zarf init to 0.74.0 ([#2526](#2526)) ([bb0fed5](bb0fed5))
* **deps:** update velero ([#2405](#2405)) ([609947e](609947e))
* **docs:** add release notes for 1.1.0 ([#2555](#2555)) ([3cc107e](3cc107e))
* **docs:** remove old doc images and diagrams ([#2521](#2521)) ([7ef96c8](7ef96c8))
* **renovate:** add minimumReleaseAge for npm support dependencies ([#2553](#2553)) ([94ff4d6](94ff4d6))
* **renovate:** set min release age for pepr to null ([#2554](#2554)) ([ee5dbf0](ee5dbf0))
* replace use of `uds` with `./uds` in uds tasks ([#2541](#2541)) ([d165ec6](d165ec6))
* update contributing doc link in PR template ([#2532](#2532)) ([0651180](0651180))

### Documentation

* cleanup old doc sites references;cleanup readme ([#2525](#2525)) ([c980914](c980914))
* fix incorrect link to configuration overview on reference overview ([#2533](#2533)) ([32fe181](32fe181))
* loki storage configuration reference ([#2529](#2529)) ([25bd0e7](25bd0e7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
Adds default UDS Core probe alert rules for endpoint downtime and TLS certificate expiry, with Helm-configurable thresholds, durations, and severities.
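To illustrate how a day-based TLS threshold relates to an expiry timestamp, here is a small sketch of the comparison such rules typically encode. It is not taken from the chart: the `classifyTlsExpiry` helper, the `TlsThresholds` shape, and the 30/14-day values in the usage note are hypothetical, and the sketch assumes the critical tier takes precedence over the warning tier. The expiry input is assumed to be a Unix timestamp in seconds, as blackbox exporter reports in `probe_ssl_earliest_cert_expiry`.

```typescript
// Hypothetical threshold shape; the chart's real value names and defaults are not shown in this PR text.
interface TlsThresholds {
  warningDays: number;
  criticalDays: number;
}

type TlsSeverity = "ok" | "warning" | "critical";

const SECONDS_PER_DAY = 86400;

// Converts the remaining certificate lifetime to days and compares it with the thresholds.
// Assumption: critical wins when both thresholds are crossed.
function classifyTlsExpiry(
  expiryEpochSeconds: number,
  nowEpochSeconds: number,
  thresholds: TlsThresholds
): TlsSeverity {
  const daysLeft = (expiryEpochSeconds - nowEpochSeconds) / SECONDS_PER_DAY;
  if (daysLeft < thresholds.criticalDays) return "critical";
  if (daysLeft < thresholds.warningDays) return "warning";
  return "ok";
}
```

With illustrative thresholds of 30 and 14 days, a certificate expiring in 20 days would classify as `warning` and one expiring in 10 days as `critical`.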
Included changes
`uds-prometheus-config` chart:
- `UDSProbeEndpointDown`
- `UDSProbeTLSExpiryWarning`
- `UDSProbeTLSExpiryCritical`

Related Issue
Fixes # CORE-72
Type of change
Steps to Validate
`uds run test-single-layer --set LAYER=monitoring` to deploy and run e2e tests
Checklist before merging