Is there an existing issue for this?
What happened?
Description
Similar, but not the same as #7051
I'm using the new operator version 0.80, which supports msteamsv2_configs for the updated Workflows in Teams.
You can use the webhook_url variant, but not the webhook_url_file variant (which lets you keep your secrets out of the config). According to the Alertmanager config docs, these two fields are mutually exclusive (https://prometheus.io/docs/alerting/latest/configuration/#msteamsv2_config).
Steps to Reproduce
Using this config:
prometheus:
  alertmanager:
    receivers:
    - name: "teams"
      msteamsv2_configs:
      - webhook_url_file: /etc/alertmanager/secrets/monitoring-secrets/teams-webhook-url
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
Operator is happy, alertmanager is not.
Error in alertmanager log:
time=2025-02-17T09:54:18.801Z level=INFO source=coordinator.go:112 msg="Loading configuration file" component=configuration file=/etc/alertmanager/config_out/alertmanager.env.yaml
time=2025-02-17T09:54:18.802Z level=ERROR source=coordinator.go:117 msg="Loading configuration file failed" component=configuration file=/etc/alertmanager/config_out/alertmanager.env.yaml err="unsupported scheme \"\" for URL"
When I shell into the Alertmanager container to view alertmanager.env.yaml, I can see the config being written as:
- name: teams
  msteamsv2_configs:
  - webhook_url: ""
    webhook_url_file: /etc/alertmanager/secrets/monitoring-secrets/teams-webhook-url
    title: '{{ template "slack.title" . }}'
    text: '{{ template "slack.text" . }}'
I can confirm that the problem is the line webhook_url: "", which I assume is being generated by the operator.
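For what it's worth, the error text lines up with that: an empty URL string parses fine in Go but has an empty scheme, which an "http or https only" check then rejects. A minimal standalone sketch of that kind of validation (my simplification, not the actual Alertmanager source):

package main

import (
	"fmt"
	"net/url"
)

// validateWebhookURL mimics, in simplified form, the scheme check that
// rejects an empty webhook_url with `unsupported scheme "" for URL`.
func validateWebhookURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q for URL", u.Scheme)
	}
	return nil
}

func main() {
	fmt.Println(validateWebhookURL(""))                                // unsupported scheme "" for URL
	fmt.Println(validateWebhookURL("https://example.office.com/hook")) // <nil>
}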
Expected Result
Config loaded and teams workflow invocation working.
Actual Result
Fail to load config.
Confirmed workaround
If I shell into the Alertmanager container, open the YAML file in vi, and edit the config to what I intended:
- name: "teams"
msteamsv2_configs:
- webhook_url_file: /etc/alertmanager/secrets/monitoring-secrets/teams-webhook-url
title: '{{ template "slack.title" . }}'
text: '{{ template "slack.text" . }}'
Save, and THEN IT WORKS! (You have to do it in all replicas, of course.)
Alertmanager happily loads the new config, and my Teams workflow runs.
So I'm very confident this is a Prometheus Operator issue and not an Alertmanager issue.
This is not a great workaround, as it will be overwritten on the next deploy, but it clearly points at how to fix it 😊
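If it helps narrow it down, my guess (an assumption on my side, I haven't read the operator source, and the struct and field names below are made up) is that the generated receiver struct renders webhook_url unconditionally. In Go YAML marshalling, the usual fix is an omitempty tag (or a pointer field) so an unset URL is simply not emitted:

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Hypothetical shape of the generated msteamsv2 receiver config,
// for illustration only.
type msTeamsV2Config struct {
	// Without omitempty the marshaller always writes `webhook_url: ""`,
	// which Alertmanager then rejects.
	WebhookURL     string `yaml:"webhook_url,omitempty"`
	WebhookURLFile string `yaml:"webhook_url_file,omitempty"`
	Title          string `yaml:"title,omitempty"`
	Text           string `yaml:"text,omitempty"`
}

func main() {
	cfg := msTeamsV2Config{
		WebhookURLFile: "/etc/alertmanager/secrets/monitoring-secrets/teams-webhook-url",
		Title:          `{{ template "slack.title" . }}`,
		Text:           `{{ template "slack.text" . }}`,
	}
	out, _ := yaml.Marshal(cfg)
	fmt.Print(string(out)) // webhook_url is omitted from the output
}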
My pod/container versions
k get pods -n monitoring -o json | ConvertFrom-Json | Select -ExpandProperty items |% { $_.metadata.name + ":" + ($_.spec.containers |% {"`n - " + $_.name + ":" + $_.image})}
alertmanager-monitoring-prometheus-alertmanager-0:
- alertmanager:quay.io/prometheus/alertmanager:v0.28.0
- config-reloader:quay.io/prometheus-operator/prometheus-config-reloader:v0.80.0
alertmanager-monitoring-prometheus-alertmanager-1:
- alertmanager:quay.io/prometheus/alertmanager:v0.28.0
- config-reloader:quay.io/prometheus-operator/prometheus-config-reloader:v0.80.0
monitoring-grafana-8595fd78b6-5xxs4:
- grafana-sc-dashboard:quay.io/kiwigrid/k8s-sidecar:1.30.0
- grafana-sc-datasources:quay.io/kiwigrid/k8s-sidecar:1.30.0
- grafana:docker.io/grafana/grafana:11.5.1
monitoring-kube-state-metrics-76f6f58dd5-wkngx:
- kube-state-metrics:registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0
monitoring-prometheus-node-exporter-f25s8:
- node-exporter:quay.io/prometheus/node-exporter:v1.8.2
monitoring-prometheus-node-exporter-tqp29:
- node-exporter:quay.io/prometheus/node-exporter:v1.8.2
monitoring-prometheus-operator-75fd665cdc-4wnzb:
- prometheus:quay.io/prometheus-operator/prometheus-operator:v0.80.0
prometheus-monitoring-prometheus-prometheus-0:
- prometheus:quay.io/prometheus/prometheus:v3.1.0
- config-reloader:quay.io/prometheus-operator/prometheus-config-reloader:v0.80.0
prometheus-monitoring-prometheus-prometheus-1:
- prometheus:quay.io/prometheus/prometheus:v3.1.0
- config-reloader:quay.io/prometheus-operator/prometheus-config-reloader:v0.80.0
Prometheus Operator Version
Name: monitoring-prometheus-operator-75fd665cdc-4wnzb
Namespace: monitoring
Priority: 0
Service Account: monitoring-prometheus-operator
Node: aks-agentpool-10287619-vmss00008q/10.1.16.10
Start Time: Sun, 16 Feb 2025 09:11:55 +0100
Labels: app=prometheus-operator
app.kubernetes.io/component=prometheus-operator
app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=prometheus-prometheus-operator
app.kubernetes.io/part-of=prometheus
app.kubernetes.io/version=69.3.1
chart=prometheus-69.3.1
heritage=Helm
pod-template-hash=75fd665cdc
release=monitoring
Annotations: <none>
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.244.1.147
IPs:
IP: 10.244.1.147
Controlled By: ReplicaSet/monitoring-prometheus-operator-75fd665cdc
Containers:
prometheus:
Container ID: containerd://c93c7d37a1772685264f0ad2dd23aa27737fd665c9a5fb6e6deb9a5f6cbb0009
Image: quay.io/prometheus-operator/prometheus-operator:v0.80.0
Image ID: quay.io/prometheus-operator/prometheus-operator@sha256:83b3705f139e7799c8fefef81ce96161bcd0a328187d829cf26836339d8802d7
Port: 10250/TCP
Host Port: 0/TCP
Args:
--kubelet-service=kube-system/monitoring-prometheus-kubelet
--kubelet-endpoints=true
--kubelet-endpointslice=false
--localhost=127.0.0.1
--prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.80.0
--config-reloader-cpu-request=0
--config-reloader-cpu-limit=0
--config-reloader-memory-request=0
--config-reloader-memory-limit=0
--thanos-default-base-image=quay.io/thanos/thanos:v0.37.2
--secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
--cluster-domain=cluster.local
--web.enable-tls=true
--web.cert-file=/cert/tls.crt
--web.key-file=/cert/tls.key
--web.listen-address=:10250
--web.tls-min-version=VersionTLS13
State: Running
Started: Sun, 16 Feb 2025 09:12:01 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 800m
memory: 400Mi
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
GOGC: 30
Mounts:
/cert from tls-secret (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rs76m (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
tls-secret:
Type: Secret (a volume populated by a Secret)
SecretName: monitoring-prometheus-admission
Optional: false
kube-api-access-rs76m:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Kubernetes Version
kubectl version -o yaml
clientVersion:
buildDate: "2024-06-11T20:29:44Z"
compiler: gc
gitCommit: 39683505b630ff2121012f3c5b16215a1449d5ed
gitTreeState: clean
gitVersion: v1.30.2
goVersion: go1.22.4
major: "1"
minor: "30"
platform: windows/arm64
kustomizeVersion: v5.0.4-0.20230601165947-6ce0bf390ce3
serverVersion:
buildDate: "2025-01-16T18:50:20Z"
compiler: gc
gitCommit: af64d838aacd9173317b39cf273741816bd82377
gitTreeState: clean
gitVersion: v1.31.5
goVersion: go1.22.10
major: "1"
minor: "31"
platform: linux/amd64
Kubernetes Cluster Type
AKS
How did you deploy Prometheus-Operator?
helm chart:prometheus-community/kube-prometheus-stack
Manifests
prometheus:
  alertmanager:
    receivers:
    - name: "teams"
      msteamsv2_configs:
      - webhook_url_file: /etc/alertmanager/secrets/monitoring-secrets/teams-webhook-url
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
prometheus-operator log output
ts=2025-02-17T09:50:10.26945454Z level=info caller=/workspace/pkg/prometheus/server/operator.go:766 msg="sync prometheus" component=prometheus-controller key=monitoring/monitoring-prometheus-prometheus
ts=2025-02-17T09:50:10.67056353Z level=info caller=/workspace/pkg/alertmanager/operator.go:544 msg="sync alertmanager" component=alertmanager-controller key=monitoring/monitoring-prometheus-alertmanager
ts=2025-02-17T09:50:10.866029562Z level=info caller=/workspace/pkg/prometheus/server/operator.go:766 msg="sync prometheus" component=prometheus-controller key=monitoring/monitoring-prometheus-prometheus
ts=2025-02-17T10:03:46.826701462Z level=info caller=/workspace/pkg/alertmanager/operator.go:544 msg="sync alertmanager" component=alertmanager-controller key=monitoring/monitoring-prometheus-alertmanager
Anything else?
Thanks for an amazing product!