
Conversation

Collaborator

@IvanHunters IvanHunters commented Dec 10, 2025

[monitoring] Add SLACK_SEVERITY_FILTER field and VMAgent for tenant monitoring
What this PR does
This PR introduces the SLACK_SEVERITY_FILTER environment variable in the Alerta deployment to enable
filtering of alert severities for Slack notifications based on the disabledSeverity configuration.
Additionally, it adds a VMAgent resource template for scraping metrics within tenant namespaces, improving
monitoring granularity and control.

```release-note
[monitoring] Add SLACK_SEVERITY_FILTER for filtering Slack alert severities and VMAgent configuration for tenant-specific metrics scraping.
```
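
For context, here is a minimal sketch of what a tenant-scoped VMAgent template could look like. This is illustrative only and not the exact resource added by this PR: it assumes the VictoriaMetrics operator's VMAgent CRD, and the metadata, scrape interval, and remote-write endpoint are placeholders.

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent
  namespace: {{ .Release.Namespace }}   # tenant namespace (illustrative)
spec:
  # With no explicit selectors and selectAllByDefault disabled, the operator
  # typically restricts scrape-object discovery to the VMAgent's own namespace;
  # verify this against the operator version in use.
  selectAllByDefault: false
  scrapeInterval: 30s
  remoteWrite:
    # Illustrative target; the real write endpoint depends on the tenant's
    # VictoriaMetrics installation.
    - url: http://vminsert-shortterm.{{ .Release.Namespace }}.svc:8480/insert/0/prometheus/api/v1/write
```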

Summary by CodeRabbit

  • New Features

    • Added configurable severity filtering for Telegram alerts.
    • Extended Slack severity filtering to accept lists of severities.
  • Bug Fixes / Behavior

    • Severity settings now accept arrays (multiple severities) instead of single comma-separated strings.
  • Documentation

    • Updated configuration docs and examples to show list-style severity settings.


@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Dec 10, 2025
Contributor

coderabbitai bot commented Dec 10, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Introduce array-based disabledSeverity fields for Slack and Telegram and emit JSON-quoted severity filters as environment variables in the Alerta deployment template when corresponding integrations are enabled.

Changes

Cohort / File(s), with summaries:

  • Alerta deployment template (packages/extra/monitoring/templates/alerta/alerta.yaml):
    Conditionally emit two new env vars: TELEGRAM_DISABLE_NOTIFICATION_SEVERITY from .Values.alerta.alerts.telegram.disabledSeverity (JSON-quoted) when Telegram is configured, and SLACK_SEVERITY_FILTER from .Values.alerta.alerts.slack.disabledSeverity (JSON-quoted) when Slack is configured and a severity list is provided.
  • Schema & chart values (packages/extra/monitoring/values.schema.json, packages/extra/monitoring/values.yaml):
    Change disabledSeverity for Telegram from a string to an array of strings; add disabledSeverity as an array of strings for Slack. Defaults are set to empty arrays and descriptions are updated to show array examples.
  • Documentation (packages/extra/monitoring/README.md):
    Update the Alerta configuration table formatting and document the disabledSeverity fields as arrays with example values.
  • Cluster resource schema / keys order (packages/system/cozystack-resource-definitions/cozyrds/monitoring.yaml):
    Update the embedded OpenAPI schema and chart values JSON to use array-typed disabledSeverity for Slack and Telegram; add new keysOrder entries for the Slack/Grafana disabledSeverity paths.
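
A hedged sketch of how the template change described above could look, assuming Helm's toJson is used to produce the JSON-quoted values; the gating keys (telegram.token, slack.url) and indentation are assumptions and may differ from the actual alerta.yaml:

```yaml
{{- if .Values.alerta.alerts.telegram.token }}
- name: TELEGRAM_DISABLE_NOTIFICATION_SEVERITY
  # Renders e.g. ["informational","debug"] from the array in values.yaml
  value: {{ .Values.alerta.alerts.telegram.disabledSeverity | toJson | quote }}
{{- end }}
{{- if and .Values.alerta.alerts.slack.url .Values.alerta.alerts.slack.disabledSeverity }}
- name: SLACK_SEVERITY_FILTER
  value: {{ .Values.alerta.alerts.slack.disabledSeverity | toJson | quote }}
{{- end }}
```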

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review template conditionals and correct JSON quoting/escaping for env values in alerta.yaml.
  • Confirm schema change (string → array) is consistent across values.schema.json, values.yaml, and embedded OpenAPI in monitoring.yaml.
  • Validate README examples match the new types.
  • Check keysOrder modifications in monitoring.yaml for ordering correctness.

Poem

I’m a rabbit in the config glade, hopping light and merry,
I changed severities from lone strings to a boundless array,
Slack and Telegram now wear tidy filters like a crown,
I nibble bugs and stitch the YAML gown,
Hooray for clearer alerts — carrots for the deploy day! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Title check: ⚠️ Warning. The title mentions SLACK_SEVERITY_FILTER and VMAgent monitoring, but the primary changes are converting disabledSeverity from string to array types across the Telegram and Slack configurations, with template updates to use these new arrays. Resolution: revise the title to reflect the main change, e.g. '[monitoring] Convert disabledSeverity to array type for Slack and Telegram alerts'.
✅ Passed checks (2 passed)
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

@dosubot dosubot bot added the enhancement New feature or request label Dec 10, 2025
Contributor

Summary of Changes

Hello @IvanHunters, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the monitoring-agents package by integrating Alertmanager, a crucial component for handling and routing alerts. The changes establish a complete alert notification pipeline, primarily focusing on Slack integration, and provide the necessary Kubernetes resources and configuration defaults to deploy and manage Alertmanager effectively. This ensures that critical system alerts can be efficiently processed and dispatched to relevant communication channels.

Highlights

  • Alertmanager Integration: Introduced a new Alertmanager deployment, service, and configuration secret to enable robust alert routing and notification capabilities within the monitoring system.
  • Slack Notification Setup: Configured Alertmanager to send notifications to Slack, utilizing templated values for webhook URL, channel, username, and icon emoji, making it easily configurable.
  • VMAlert Integration: Added a VMAlert custom resource that is configured to forward alerts to the newly deployed Alertmanager instance.
  • Workload Monitoring: Included a WorkloadMonitor custom resource specifically for Alertmanager, defining its monitoring characteristics within the system.
  • Configuration Defaults: Added default configuration values for Alertmanager in values.yaml, allowing for easy enablement, replica scaling, and Slack integration setup.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces Alertmanager with Slack integration to the monitoring stack. While this is a valuable addition, the current implementation has several areas that need improvement to be considered production-ready. My review includes critical feedback on using a Deployment for a stateful service like Alertmanager, which should be a StatefulSet to handle state and high availability correctly. I've also pointed out issues with hardcoded values, conditional configuration for the Slack integration to prevent errors, and missing resource definitions. Addressing these points will significantly improve the robustness and maintainability of this new component.

@@ -0,0 +1,54 @@
{{- if .Values.alertmanager.enabled }}
apiVersion: apps/v1
kind: Deployment

critical

For a stateful application like Alertmanager, a StatefulSet is more appropriate than a Deployment. A StatefulSet provides stable network identifiers and persistent storage, which are crucial for features like alert deduplication, silences, and high availability.

Using a Deployment with an emptyDir volume (as configured on line 53) will cause all state to be lost on pod restarts. Furthermore, disabling clustering via --cluster.listen-address= (line 26) will lead to duplicate notifications if you scale to more than one replica.
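
For illustration, a minimal sketch of the StatefulSet shape the reviewer is describing, using the standard apps/v1 API; the headless Service name, labels, and storage size are assumptions, not part of this PR, and the config volume is omitted for brevity:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  serviceName: alertmanager            # assumes a matching headless Service
  replicas: {{ .Values.alertmanager.replicas }}
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: "prom/alertmanager:v0.27.0"
          args:
            - --storage.path=/alertmanager
          volumeMounts:
            - name: storage
              mountPath: /alertmanager   # state survives pod restarts
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```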

Comment on lines 18 to 38
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'slack'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
    receivers:
    - name: 'null'
    - name: 'slack'
      slack_configs:
      - api_url: {{ .Values.alertmanager.slack.webhookUrl | quote }}
        channel: {{ .Values.alertmanager.slack.channel | quote }}
        username: {{ .Values.alertmanager.slack.username | quote }}
        icon_emoji: {{ .Values.alertmanager.slack.iconEmoji | quote }}
        title: '{{`{{ .GroupLabels.alertname }}`}}'
        text: '{{`{{ range .Alerts }}{{ .Annotations.summary }} - {{ .Annotations.description }}{{ end }}`}}'

high

The configuration unconditionally sets 'slack' as the default receiver and defines the slack_configs. If alertmanager.slack.webhookUrl is not provided in values.yaml, Alertmanager will fail to send notifications because the api_url will be empty.

To prevent this, you should conditionally configure the receiver and the route based on the presence of the webhookUrl.

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: {{ if .Values.alertmanager.slack.webhookUrl }}'slack'{{ else }}'null'{{ end }}
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'

    receivers:
    - name: 'null'
    {{- if .Values.alertmanager.slack.webhookUrl }}
    - name: 'slack'
      slack_configs:
      - api_url: {{ .Values.alertmanager.slack.webhookUrl | quote }}
        channel: {{ .Values.alertmanager.slack.channel | quote }}
        username: {{ .Values.alertmanager.slack.username | quote }}
        icon_emoji: {{ .Values.alertmanager.slack.iconEmoji | quote }}
        title: '{{`{{ .GroupLabels.alertname }}`}}'
        text: '{{`{{ range .Alerts }}{{ .Annotations.summary }} - {{ .Annotations.description }}{{ end }}`}}'
    {{- end }}

Comment on lines 20 to 21
image: "prom/alertmanager:v0.27.0"
imagePullPolicy: IfNotPresent

medium

The Alertmanager image, tag, and pull policy are hardcoded. It's a best practice to make these configurable in values.yaml to simplify upgrades.

Please add an image object to the alertmanager section in values.yaml like this:

alertmanager:
  # ... other values
  image:
    repository: prom/alertmanager
    tag: v0.27.0
    pullPolicy: IfNotPresent

Then use these values in the template.

          image: "{{ .Values.alertmanager.image.repository }}:{{ .Values.alertmanager.image.tag }}"
          imagePullPolicy: {{ .Values.alertmanager.image.pullPolicy }}

  name: alertmanager
spec:
  replicas: {{ .Values.alertmanager.replicas }}
  minReplicas: 1

medium

minReplicas is hardcoded to 1. If a user sets alertmanager.replicas to 0 in values.yaml to disable the deployment, this WorkloadMonitor will have replicas: 0 and minReplicas: 1, which is an impossible condition to satisfy and will likely result in a persistent non-operational status for this workload. minReplicas should be 0 when the desired replica count is 0.

  minReplicas: {{ if gt (int .Values.alertmanager.replicas) 0 }}1{{ else }}0{{ end }}

    url: http://vmselect-shortterm.tenant-root.svc:8481/select/0/prometheus
  remoteWrite:
    url: http://vminsert-shortterm.tenant-root.svc:8480/insert/0/prometheus/api/v1/write
  resources: {}

medium

The resources for VMAlert are empty. It is a best practice to specify resource requests and limits for all workloads to ensure cluster stability and proper scheduling. These should be configurable via values.yaml.

Example for values.yaml:

vmalert:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
Then reference them in the VMAlert template:

  resources: {{- toYaml .Values.vmalert.resources | nindent 4 }}

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10d3574 and 8ca7e80.

📒 Files selected for processing (6)
  • packages/system/monitoring-agents/templates/alertmanager-deployment.yaml (1 hunks)
  • packages/system/monitoring-agents/templates/alertmanager-secret.yaml (1 hunks)
  • packages/system/monitoring-agents/templates/alertmanager-service.yaml (1 hunks)
  • packages/system/monitoring-agents/templates/alertmanager-workloadmonitor.yaml (1 hunks)
  • packages/system/monitoring-agents/templates/vmalert.yaml (1 hunks)
  • packages/system/monitoring-agents/values.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
packages/system/monitoring-agents/templates/alertmanager-secret.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

packages/system/monitoring-agents/templates/alertmanager-deployment.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

packages/system/monitoring-agents/templates/alertmanager-workloadmonitor.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

packages/system/monitoring-agents/templates/alertmanager-service.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

packages/system/monitoring-agents/templates/vmalert.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (8)
packages/system/monitoring-agents/values.yaml (1)

424-438: Alert manager configuration defaults look reasonable.

The feature is opt-in (disabled by default), resource limits are conservative, and Slack webhook is empty (requires user configuration to activate). Values are well-structured.

packages/system/monitoring-agents/templates/alertmanager-deployment.yaml (1)

23-26: Verify clustering behavior if replicas are scaled beyond 1.

Line 26 disables clustering with --cluster.listen-address= (empty value). This works correctly for the default replicas: 1, but if someone increases replicas, clustering will remain disabled, potentially causing alert deduplication and routing failures across multiple instances. Consider either:

  1. Making clustering conditional based on replica count, or
  2. Adding documentation warning users not to scale replicas without enabling clustering

Is single-replica-only the intended design, or should multi-replica deployments support clustering?
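
One possible shape for the first option, as a sketch: --cluster.listen-address and --config.file are standard Alertmanager flags, while the gossip port and the values key are illustrative.

```yaml
args:
  - --config.file=/etc/alertmanager/alertmanager.yml
  {{- if gt (int .Values.alertmanager.replicas) 1 }}
  # Enable gossip clustering only when more than one replica is requested,
  # so notifications are deduplicated across instances.
  - --cluster.listen-address=0.0.0.0:9094
  {{- else }}
  # Single replica: keep clustering disabled.
  - --cluster.listen-address=
  {{- end }}
```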

packages/system/monitoring-agents/templates/alertmanager-service.yaml (1)

1-17: Service definition is correct. YAMLlint error is a false positive.

The Service correctly exposes Alertmanager on port 9093 with proper selector. The YAMLlint error about "syntax error: expected the node content, but found '-'" is a false positive—Helm's {{- syntax is valid and removes whitespace. This can be safely ignored.

packages/system/monitoring-agents/templates/alertmanager-secret.yaml (2)

11-16: SMTP defaults are placeholder values and will not function.

Lines 12-16 configure SMTP to localhost:587 with empty credentials and no TLS. These are placeholder defaults that won't work for actual email notifications. This is likely intentional for an MVP, but users enabling Alertmanager should understand that SMTP requires explicit configuration outside of these values.

Verify that documentation exists explaining how to configure SMTP credentials if users want email notifications (or confirm SMTP is out of scope for this PR).
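
If SMTP stays in scope, one way to avoid shipping non-functional localhost defaults is to emit the SMTP block only when a smarthost is actually configured. A sketch, assuming the smtp_* keys from upstream Alertmanager's global configuration and hypothetical chart values:

```yaml
global:
  {{- if .Values.alertmanager.smtp.host }}
  smtp_smarthost: "{{ .Values.alertmanager.smtp.host }}:{{ .Values.alertmanager.smtp.port }}"
  smtp_from: {{ .Values.alertmanager.smtp.from | quote }}
  smtp_auth_username: {{ .Values.alertmanager.smtp.username | quote }}
  smtp_auth_password: {{ .Values.alertmanager.smtp.password | quote }}
  {{- end }}
```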


29-38: Slack configuration is properly templated and safe.

Webhook URL defaults to empty, preventing accidental misconfiguration. Channel, username, and iconEmoji have sensible defaults and are configurable via values. Alert templates correctly use Helm's {{...}} escaping to render Alertmanager template variables (alertname, summary, description).

packages/system/monitoring-agents/templates/alertmanager-workloadmonitor.yaml (1)

1-14: WorkloadMonitor definition is correct.

The selector matches the Alertmanager deployment labels, replicas are sourced from values for consistency, and minReplicas constraint is reasonable. No issues.

packages/system/monitoring-agents/templates/vmalert.yaml (2)

7-8: Hardcoded service URLs limit flexibility and assume specific deployment topology.

Lines 7-8 and 12-15 hardcode VMAlert datasource and write URLs to vmselect-shortterm and vminsert-shortterm services. These URLs are not configurable from values and assume a specific cozystack deployment topology. If the monitoring stack architecture changes, these URLs will break silently.

Consider moving these URLs to values.yaml as configurable parameters so deployments can adapt to different topologies.

Verify that these hardcoded service names and paths are correct and stable for cozystack deployments. If topology is expected to vary, these should be configurable.

Also applies to: 12-15
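
A sketch of the values-driven variant, where the vmalert keys in values.yaml are hypothetical names introduced here for illustration:

```yaml
# values.yaml (hypothetical keys)
vmalert:
  datasourceUrl: http://vmselect-shortterm.tenant-root.svc:8481/select/0/prometheus
  remoteWriteUrl: http://vminsert-shortterm.tenant-root.svc:8480/insert/0/prometheus/api/v1/write

# templates/vmalert.yaml (spec excerpt)
  datasource:
    url: {{ .Values.vmalert.datasourceUrl | quote }}
  remoteWrite:
    url: {{ .Values.vmalert.remoteWriteUrl | quote }}
```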


17-17: Verify that selectAllByDefault: true is intentional.

This setting causes VMAlert to evaluate ALL PrometheusRules in the cluster. While this may be desired, it's very broad and could lead to unexpected rule evaluation or high resource load. Confirm this is the intended behavior, or consider scoping rules by namespace/labels for better control.

Is selecting all PrometheusRules cluster-wide the expected behavior, or should rule evaluation be scoped (e.g., to specific namespaces or with label selectors)?
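
If scoping is preferred, a hedged sketch of what that could look like; it assumes the VMAlert CRD's ruleSelector and ruleNamespaceSelector fields, and the label used here is a hypothetical convention:

```yaml
  selectAllByDefault: false
  ruleSelector:
    matchLabels:
      # Hypothetical label; only PrometheusRules carrying it would be evaluated.
      cozystack.io/vmalert: "enabled"
  ruleNamespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: tenant-root
```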

    url: http://vmselect-shortterm.tenant-root.svc:8481/select/0/prometheus
  remoteWrite:
    url: http://vminsert-shortterm.tenant-root.svc:8480/insert/0/prometheus/api/v1/write
  resources: {}

⚠️ Potential issue | 🟠 Major

VMAlert has no resource requests or limits.

Line 16 sets resources: {}, which means VMAlert pod has no guaranteed memory/CPU and no upper limits. This could lead to resource contention or out-of-memory kills. Provide sensible defaults (similar to Alertmanager's 100m/128Mi request) or make it configurable from values.

Apply this diff to provide reasonable resource defaults:

-  resources: {}
+  resources:
+    requests:
+      cpu: 100m
+      memory: 128Mi
+    limits:
+      cpu: 500m
+      memory: 512Mi

Adjust these values based on your evaluation interval and rule complexity, or expose them in values.yaml for flexibility.

🤖 Prompt for AI Agents
In packages/system/monitoring-agents/templates/vmalert.yaml around line 16, the
VMAlert manifest currently has resources: {} which provides no CPU/memory
requests or limits; update the template to set sensible default resource
requests and limits (e.g., requests: cpu 100m, memory 128Mi and limits: cpu
250m, memory 256Mi) and make these values configurable via values.yaml (use Helm
template placeholders for values.alertmanager.resources.* with the defaults), so
operators can override them without changing the chart.

Member

@kvaps kvaps left a comment


This will route all alerts bypassing Alerta, which is the IRM, and we already have a Slack configuration for it.

All you need is to add a SLACK_SEVERITY_FILTER environment variable for Slack that works in a similar way to the TELEGRAM_DISABLE_NOTIFICATION_SEVERITY environment variable.

Please RTFS

https://github.com/alerta/alerta-contrib/blob/39d64e1b255689ca386b5ecd6be75642543693ec/plugins/telegram/alerta_telegram.py#L117-L119

https://github.com/alerta/alerta-contrib/blob/39d64e1b255689ca386b5ecd6be75642543693ec/plugins/slack/alerta_slack.py#L208-L211

@IvanHunters IvanHunters force-pushed the feature/slack-notification-from-alerta branch from 8ca7e80 to 3973af2 Compare December 11, 2025 20:50
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 11, 2025
@IvanHunters IvanHunters force-pushed the feature/slack-notification-from-alerta branch from 3973af2 to a632194 Compare December 11, 2025 20:52
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 11, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca7e80 and a632194.

📒 Files selected for processing (3)
  • packages/extra/monitoring/templates/alerta/alerta.yaml (1 hunks)
  • packages/extra/monitoring/values.schema.json (1 hunks)
  • packages/extra/monitoring/values.yaml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (3)
packages/extra/monitoring/values.schema.json (1)

28-32: Schema definition looks good.

The severityFilter property is properly defined with appropriate type, default, and documentation. It's correctly marked as optional (not in the required array), and the example format is clear.

packages/extra/monitoring/values.yaml (2)

104-106: Documentation is clear and consistent.

The JSDoc typedef properly documents the new severityFilter optional field with a helpful example. This aligns well with the schema definition and provides good guidance to users configuring Slack alerts.


134-136: Default configuration value is sensible.

Setting severityFilter to an empty string provides a good default that allows opt-in filtering without breaking existing configurations. The placement within the slack alert configuration block is correct.

Comment on lines 136 to 145
{{- if .Values.alerta.alerts.slack.url }}
- name: "SLACK_WEBHOOK_URL"
  value: "{{ .Values.alerta.alerts.slack.url }}"
- name: SLACK_SEVERITY_FILTER
  value: "{{ .Values.alerta.alerts.slack.severityFilter }}"
{{- end }}

⚠️ Potential issue | 🔴 Critical


Fix SLACK_SEVERITY_FILTER environment variable format and default handling.

The SLACK_SEVERITY_FILTER environment variable is supported in the Alerta Slack plugin, but there are two issues with the current implementation:

  1. Format incompatibility: The plugin uses ast.literal_eval() to parse the environment variable, expecting Python literal syntax (e.g., ['critical', 'major']). An empty string default will fail to parse and silently fall back to an empty list [], which disables filtering entirely.

  2. Missing documentation: The variable is not documented in the Slack plugin README, creating ambiguity about expected format and behavior.

Update .Values.alerta.alerts.slack.severityFilter to either:

  • Use a valid Python list literal format (e.g., ['critical', 'major']) instead of an empty string, or
  • Remove the environment variable configuration if severity filtering is not needed
- name: SLACK_SEVERITY_FILTER
  value: "{{ .Values.alerta.alerts.slack.severityFilter }}"
🤖 Prompt for AI Agents
In packages/extra/monitoring/templates/alerta/alerta.yaml around lines 136 to
141, the SLACK_SEVERITY_FILTER env var is being set to whatever
.Values.alerta.alerts.slack.severityFilter provides, but the Alerta Slack plugin
expects a Python literal list (e.g. ['critical','major']) and will fail or
silently fallback if given an empty string; update the template to either (a)
only emit the SLACK_SEVERITY_FILTER env var when
.Values.alerta.alerts.slack.severityFilter is a non-empty, properly formatted
Python list literal, or (b) remove the env var output when no severity filter is
desired, and update the chart README to document the expected Python list
literal format and that omitting the variable disables filtering.
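
The merged revision reportedly resolves this by typing the value as an array and JSON-quoting it (see the walkthrough above). For illustration, a values.yaml sketch and the environment value it would yield when rendered with toJson, which is also a form ast.literal_eval accepts; the webhook URL and severities are placeholders:

```yaml
alerta:
  alerts:
    slack:
      url: https://hooks.slack.com/services/EXAMPLE   # placeholder
      disabledSeverity:
        - informational
        - debug
# Rendered with toJson, SLACK_SEVERITY_FILTER becomes: ["informational","debug"]
```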

…onitoring

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
@IvanHunters IvanHunters force-pushed the feature/slack-notification-from-alerta branch from a632194 to df3a409 Compare December 17, 2025 09:42
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Dec 17, 2025
@IvanHunters IvanHunters changed the title from "add alertmanager for slack configutation" to "[monitoring] Add SLACK_SEVERITY_FILTER field and VMAgent for tenant monitoring" Dec 17, 2025
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 17, 2025
Member

@kvaps kvaps left a comment


LGTM

@kvaps kvaps added the backport Should change be backported on previus release label Dec 18, 2025
@kvaps kvaps merged commit 1f0f14c into main Dec 18, 2025
28 checks passed
@kvaps kvaps deleted the feature/slack-notification-from-alerta branch December 18, 2025 08:13
@github-actions

Backport failed for release-0.38, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-0.38
git worktree add -d .worktree/backport-1712-to-release-0.38 origin/release-0.38
cd .worktree/backport-1712-to-release-0.38
git switch --create backport-1712-to-release-0.38
git cherry-pick -x df3a4091420e9ee056beedea7e9f2892dee0e054

kvaps added a commit that referenced this pull request Dec 23, 2025
…onitoring (#1712)
kvaps added a commit that referenced this pull request Jan 8, 2026
…onitoring (#1712)
kvaps added a commit that referenced this pull request Jan 8, 2026
…onitoring (#1712)
nbykov0 pushed a commit that referenced this pull request Jan 9, 2026
…onitoring (#1712)
kvaps added a commit that referenced this pull request Jan 9, 2026
…onitoring (#1712)