fix(kubernetes): record cluster_ip services in dns_programming_duration metric#7951

Merged
yongtang merged 1 commit into coredns:master from syedazeez337:fix/kubernetes-dns-programming-metric-cluster-ip
Mar 24, 2026

Conversation

@syedazeez337

Contributor

The coredns_kubernetes_dns_programming_duration_seconds metric documents three service_kind values (cluster_ip, headless_with_selector, headless_without_selector), but only headless_with_selector was ever emitted. The original PR (#3171) deferred the other two and the work was never finished; the README even had a Bugs entry for it.

The guard in record() was `if !isHeadless || l.TT.IsZero()`, which skipped every ClusterIP service unconditionally. The fix drops the `!isHeadless` condition and branches on Headless() to pick the right label. The endpoints controller sets the EndpointsLastChangeTriggerTime annotation for any service with a selector (headless or ClusterIP), so the existing TT check already captures the right population.
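For readers who want to see the change in guard logic in isolation, here is a minimal, self-contained sketch. It is not the CoreDNS source: the `service` type, the `labelBefore`/`labelAfter` helpers, and the fixture variables are hypothetical names made up for illustration. Only the guard expression and the label strings come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// service is a hypothetical stand-in for the object that record() inspects.
type service struct {
	headless          bool
	lastChangeTrigger time.Time // zero when the trigger-time annotation is absent
}

// Illustrative fixtures (assumed values, not taken from the PR).
var (
	sampleTrigger = time.Unix(1700000000, 0)
	clusterIPSvc  = service{headless: false, lastChangeTrigger: sampleTrigger}
	headlessSvc   = service{headless: true, lastChangeTrigger: sampleTrigger}
	noTriggerSvc  = service{headless: false}
)

// labelBefore mirrors the old guard: anything that is not headless is skipped.
func labelBefore(s service) (string, bool) {
	if !s.headless || s.lastChangeTrigger.IsZero() {
		return "", false
	}
	return "headless_with_selector", true
}

// labelAfter mirrors the described fix: only the trigger time gates recording,
// and the headless flag selects the label.
func labelAfter(s service) (string, bool) {
	if s.lastChangeTrigger.IsZero() {
		return "", false
	}
	if s.headless {
		return "headless_with_selector", true
	}
	return "cluster_ip", true
}

func main() {
	for _, s := range []service{clusterIPSvc, headlessSvc, noTriggerSvc} {
		oldLabel, oldOK := labelBefore(s)
		newLabel, newOK := labelAfter(s)
		fmt.Printf("headless=%v before=(%q,%v) after=(%q,%v)\n",
			s.headless, oldLabel, oldOK, newLabel, newOK)
	}
}
```

In this sketch the ClusterIP service is dropped by the old guard and labelled cluster_ip by the new one, while a service without a trigger time is skipped in both versions.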

headless_without_selector is still not covered. That would need a new field on the Service struct, which the struct comment explicitly gates on maintainer sign-off, so I am leaving it for a follow-up.

Also added the first tests for the object package (five table-driven cases) and updated the README Bugs section.

Fixes #7644

@syedazeez337 syedazeez337 force-pushed the fix/kubernetes-dns-programming-metric-cluster-ip branch from ffe8edc to 7f3ea4c Compare March 22, 2026 17:47
…on metric

Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
@syedazeez337 syedazeez337 force-pushed the fix/kubernetes-dns-programming-metric-cluster-ip branch from 7f3ea4c to 98c3738 Compare March 22, 2026 17:50
@syedazeez337

Contributor Author

The ci/circleci: kubernetes-tests failure is a known consequence of this fix, not a regression in CoreDNS itself.

What happened: TestDNSProgrammingLatencyEndpoints/EndpointSlice in coredns/ci selects the metric to check via Metric[0] (the first series in the histogram family). Before this fix, only headless_with_selector was ever recorded, so Metric[0] was always that series. After this fix, cluster_ip is also recorded and, being alphabetically first, becomes Metric[0]. The test's expected delta (+2 for headless observations) is therefore compared against the wrong series and no longer matches.

The fix is in coredns/ci: select the metric by label value instead of by index position. I'll open a companion PR there.
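As a rough illustration of label-based selection versus index-based selection, here is a sketch that uses hypothetical local types (`labelPair`, `series`, `findSeries`) in place of the real Prometheus client types and the actual coredns/ci test helpers. The sample counts are invented for the example.

```go
package main

import "fmt"

// labelPair and series are hypothetical stand-ins for the label and metric
// entries of a gathered histogram family; they are not the real client types.
type labelPair struct{ name, value string }

type series struct {
	labels      []labelPair
	sampleCount uint64
}

// sampleFamily is an assumed family, ordered as described above:
// cluster_ip sorts before headless_with_selector.
var sampleFamily = []series{
	{labels: []labelPair{{"service_kind", "cluster_ip"}}, sampleCount: 5},
	{labels: []labelPair{{"service_kind", "headless_with_selector"}}, sampleCount: 2},
}

// findSeries returns the first series carrying the label name=value.
func findSeries(family []series, name, value string) (series, bool) {
	for _, s := range family {
		for _, l := range s.labels {
			if l.name == name && l.value == value {
				return s, true
			}
		}
	}
	return series{}, false
}

func main() {
	// Index-based lookup silently picks whichever series sorts first.
	fmt.Println("by index:", sampleFamily[0].sampleCount)

	// Label-based lookup is stable when new series are added to the family.
	s, ok := findSeries(sampleFamily, "service_kind", "headless_with_selector")
	fmt.Println("by label:", s.sampleCount, ok)
}
```

Returning a found flag lets a test fail loudly when the expected series is missing, rather than asserting against an unrelated one.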

syedazeez337 added a commit to syedazeez337/ci that referenced this pull request Mar 22, 2026
…e test

cluster_ip is now also recorded by the dns_programming_duration metric
(coredns/coredns#7951), so Metric[0] is no longer reliably the
headless_with_selector series. Use a label-based lookup instead.

Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
@syedazeez337

Contributor Author

Companion fix for the CircleCI failure opened at coredns/ci#174.

@yongtang yongtang merged commit f582a01 into coredns:master Mar 24, 2026
11 checks passed
Development

Successfully merging this pull request may close these issues.

plugin/kubernetes: DNS programming duration metric is only recording for headless services

2 participants