Skip to content

Add optional TLS support to /metrics endpoint#7255

Merged
johnbelamaric merged 2 commits into
coredns:masterfrom
Nordix:peppi-lotta/metrics-tls-support
Mar 12, 2026
Merged

Add optional TLS support to /metrics endpoint#7255
johnbelamaric merged 2 commits into
coredns:masterfrom
Nordix:peppi-lotta/metrics-tls-support

Conversation

@peppi-lotta

@peppi-lotta peppi-lotta commented Apr 15, 2025

Copy link
Copy Markdown
Contributor

1. Why is this pull request needed?

It adds optional TLS support to the CoreDNS metrics endpoint, allowing metrics to be served over HTTPS. This improves security by encrypting metrics traffic. TLS is disabled by default, so existing configurations continue to work.

2. Related issues

#7109

3. Documentation changes

  • Update metrics plugin docs to include TLS settings and examples
  • Add instructions for providing certificates and configuring Prometheus to scrape over HTTPS
  • Add troubleshooting notes

4. Backward compatibility

No breaking changes. TLS is optional and off by default.

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch 8 times, most recently from 89c8dbf to 7ac7560 Compare April 15, 2025 10:51
@peppi-lotta

Copy link
Copy Markdown
Contributor Author

@chrisohaver @SuperQ Would you have time to review? :)

@kashifest

Copy link
Copy Markdown

@chrisohaver @SuperQ can you please take a look on this PR? It would be nice to have this PR moving forward

Comment thread plugin/metrics/setup.go Fixed
@codecov

codecov Bot commented Apr 25, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 60.46512% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.19%. Comparing base (93c57b6) to head (8c5515c).
⚠️ Report is 1706 commits behind head on master.

Files with missing lines Patch % Lines
plugin/metrics/setup.go 0.00% 11 Missing and 1 partial ⚠️
plugin/metrics/metrics.go 83.87% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7255      +/-   ##
==========================================
+ Coverage   55.70%   63.19%   +7.49%     
==========================================
  Files         224      278      +54     
  Lines       10016    15129    +5113     
==========================================
+ Hits         5579     9561    +3982     
- Misses       3978     4879     +901     
- Partials      459      689     +230     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kashifest

Copy link
Copy Markdown

It would be really nice to have someone review this and start the discussion rolling

@amshankaran

Copy link
Copy Markdown

It would be helpful if someone could take a look at this.

Comment thread plugin/metrics/metrics.go Outdated
@peppi-lotta

Copy link
Copy Markdown
Contributor Author

@jameshartig @miekg @SuperQ @greenpau @Tantalor93 Can I get a review on this? What are the next steps needed to get this merged?

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from f12b5dd to ce041ac Compare June 2, 2025 09:06
@miekg

miekg commented Jun 19, 2025

Copy link
Copy Markdown
Member

I consider TLS on metric endpoints to be an anti feature

@JanMkl

JanMkl commented Jun 19, 2025

Copy link
Copy Markdown

I consider TLS on metric endpoints to be an anti feature

I understand the desire to minimise complexity—especially on internal metrics endpoints—but in a zero-trust architecture, all network traffic, regardless of source or function, must be both authenticated and encrypted.

Tampering with metrics may seem low-impact, but it can lead to unpredictable or harmful behavior. For example:

  • It can reveal service topology and workload characteristics.
  • It can trigger alerts or automated remediation based on falsified data.

In multi-tenant or regulated environments where the network is not implicitly trusted, TLS is not an anti-feature—it’s a baseline security control. Even when the content isn’t sensitive, preserving integrity and controlling access are essential.

To be clear, the PR does not propose mandatory TLS—it introduces optional support, as indicated in the title: “Add optional TLS support to /metrics endpoint”. Rejecting TLS outright disregards the realities of deployment models built on zero-trust principles.

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from ce041ac to 8f81c13 Compare September 22, 2025 09:37
@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch 3 times, most recently from 9239ecc to 2b84a94 Compare October 21, 2025 07:23
@peppi-lotta

Copy link
Copy Markdown
Contributor Author

I've simplified this TLS implementation quite a bit. It's still using the Prometheus exporter-toolkit but now in the intended way. There is no more creating any temp files (this was the concern brought up in this comment: #7255 (comment)).

@jameshartig @miekg @SuperQ @greenpau @Tantalor93 Can I get a review on this? What are the next steps needed to get this merged?

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from 2b84a94 to 8c5515c Compare October 27, 2025 11:15

@kashifest kashifest left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, it looks ok. It would nice to have someone with review rights have a look into this.

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch 3 times, most recently from 1fbe753 to cd49281 Compare November 11, 2025 08:09
@kashifest

Copy link
Copy Markdown

@johnbelamaric would you please take a look at this one?

@yongtang

yongtang commented Dec 4, 2025

Copy link
Copy Markdown
Member

@SuperQ any additional feedback on the updated PR?

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from cd49281 to 5c93f42 Compare December 8, 2025 12:10
@kashifest

kashifest commented Jan 12, 2026

Copy link
Copy Markdown

@johnbelamaric @SuperQ Can we please decide how to proceed with this PR?

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from 5c93f42 to 839f096 Compare January 13, 2026 14:39
@johnbelamaric

Copy link
Copy Markdown
Member

@peppi-lotta can you rebase?

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from 839f096 to ef076fe Compare March 9, 2026 06:24
@peppi-lotta

Copy link
Copy Markdown
Contributor Author

@johnbelamaric I've rebased now :)

Comment thread plugin/metrics/metrics.go Outdated
Comment thread plugin/metrics/metrics.go Outdated
select {
case err := <-startResult:
return err
case <-time.After(200 * time.Millisecond):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this. Is there instead a status or something we can check on the control structure?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @johnbelamaric, joining the discussion.

I have been discussing this with @peppi-lotta, and I think there is a way to avoid waiting a predefined time for the service to be ready. I also agree that a timer-based approach can be problematic for various reasons. The ServeTLS method in net/http, which web.Serve ultimately calls, unfortunately does not provide a usable hook for checking service status. I don't think building an additional health-check mechanism that sends probe messages is a feasible solution either.

However, we know that ServeTLS essentially follows two paths: it either exits with an error or it calls Accept. ServeTLS creates a new listener but uses the listener argument to accept connections. Therefore, we can wrap the listener passed to web.Serve like this, e.g.:

// startupListener wraps a net.Listener to detect when Accept() is first called
type startupListener struct {
	readyOnce sync.Once
	ready     chan struct{}
	net.Listener
}

func newStartupListener(l net.Listener) *startupListener {
	return &startupListener{
		Listener: l,
		ready:    make(chan struct{}),
	}
}

func (sl *startupListener) Accept() (net.Conn, error) {
	// Signal ready on first Accept() call (server is running)
	sl.readyOnce.Do(func() {
		close(sl.ready)
	})
	return sl.Listener.Accept()
}

func (sl *startupListener) Ready() <-chan struct{} {
	return sl.ready
}

...
	ln, err := reuseport.Listen("tcp", m.Addr)
	m.ln := newStartupListener(ln)
...
	err := web.Serve(m.ln, server, webConfig, logger)
	// Wait for server to be ready (first Accept() called) or error
	select {
	case <-startupListener.Ready():
		log.Println("Server is ready and accepting connections")
	case err := <-serverErr:
		log.Fatalf("Server failed to start: %v", err)
	}

What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit implementing the suggested approach. Seems to work well :)

Comment thread plugin/metrics/metrics.go Outdated
WebConfigFile: &m.tlsConfigPath,
}

logger := slog.New(slog.NewTextHandler(os.Stderr, nil))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we need this particular object for the prometheus exporter? How does this interplay with our existing logging?

@peppi-lotta peppi-lotta Mar 11, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the prometheus web toolkit requires a slog.Logger type object to be passed to it. I think it interplays nicely. There is not a lot of logging when the the TLS setup is successful as can be seen from bellow.

$ kubectl logs -n kube-system          coredns-67ddfdcbd-6lhfq 
maxprocs: Leaving GOMAXPROCS=22: CPU quota undefined
time=2026-03-11T06:43:05.148Z level=INFO msg="Listening on" address=[::]:9153
time=2026-03-11T06:43:05.148Z level=INFO msg="TLS is enabled." http2=true address=[::]:9153
.:53
[INFO] plugin/reload: Running configuration SHA512 = 815c08e687bdc067019d64a774daf517a0fd3bb19531a7596741f57e78e3abb3dade4ae3bbe4898839f7bab7622c2bb91b27ce2f2c92d52b53e992dcd26ecbad
CoreDNS-1.14.2
linux/amd64, go1.25.5, 

@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from ef076fe to b10bd56 Compare March 11, 2026 06:45
…dpoint

Signed-off-by: peppi-lotta <peppi-lotta.saari@est.tech>
@peppi-lotta peppi-lotta force-pushed the peppi-lotta/metrics-tls-support branch from b10bd56 to 8ee9ace Compare March 11, 2026 06:50
Signed-off-by: peppi-lotta <peppi-lotta.saari@est.tech>
@johnbelamaric johnbelamaric merged commit 7ff001d into coredns:master Mar 12, 2026
11 checks passed
@johnbelamaric

Copy link
Copy Markdown
Member

thanks!

yongtang pushed a commit to yongtang/coredns that referenced this pull request Mar 18, 2026
* Use exporter-toolkit to enable optional TLS encryption on /metrics endpoint

Signed-off-by: peppi-lotta <peppi-lotta.saari@est.tech>

* Implement startup listener to signal server readiness

Signed-off-by: peppi-lotta <peppi-lotta.saari@est.tech>

---------

Signed-off-by: peppi-lotta <peppi-lotta.saari@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants