plugin/forward: add max_age option to enforce an absolute connection lifetime#7903
Merged
Conversation
50fcafd to
7c5aaf4
Compare
thevilledev
approved these changes
Mar 6, 2026
thevilledev
left a comment
Collaborator
There was a problem hiding this comment.
Good catch @cangming! I think this is the good approach. Just one minor comment, other than that LGTM.
Introduce a max_age setting on Transport that closes connections based on creation time, independent of idle-timeout (expire). Background: PR coredns#7790 changed the connection pool from LIFO to FIFO for source-port diversity. Under FIFO, every connection is cycled through the pool and its used timestamp is refreshed continuously. When request rate is high enough that pool_size / request_rate < expire, no connection ever becomes idle and expire never fires. This prevents CoreDNS from opening new connections to upstreams that scale out (e.g. new Kubernetes pods behind a ClusterIP service with conntrack pinning). max_age addresses this by enforcing an absolute upper bound on connection lifetime regardless of activity: - persistConn gains a created field set at dial time. - Transport gains maxAge (default 0 = unlimited, preserving existing behaviour). - Dial(): rejects cached connections whose creation age exceeds max_age. - cleanup(): when maxAge > 0, uses a linear scan over both idle-timeout and max-age predicates; when maxAge == 0, preserves the original binary-search path on used time (sorted by FIFO insertion order). - Both hot paths pre-compute the deadline outside any inner loop to avoid repeated time.Now() calls. Tests added: - TestMaxAgeExpireByCreation: connection with old created but fresh used must be rejected even though idle-timeout would pass. - TestMaxAgeFIFORotation: three FIFO-rotated connections (old created, fresh used) must all be rejected, confirming that continuous rotation cannot prevent max-age expiry. Signed-off-by: cangming <cangming@cangming.app>
Expose Transport.SetMaxAge through the forward plugin so operators can
set an absolute upper bound on connection lifetime via the Corefile.
Usage:
forward . 1.2.3.4 {
max_age 30s
}
Default is 0 (unlimited), which preserves existing behaviour.
A positive value causes connections older than max_age to be closed and
re-dialled on the next request, ensuring CoreDNS reconnects to newly
scaled-out upstream pods even under sustained load where the idle
timeout (expire) would never fire.
If max_age is set, it must not be less than expire; the parser rejects
this combination at startup with a clear error message.
Signed-off-by: cangming <cangming@cangming.app>
7c5aaf4 to
ddc66b4
Compare
yongtang
approved these changes
Mar 9, 2026
yongtang
pushed a commit
to yongtang/coredns
that referenced
this pull request
Mar 18, 2026
…lifetime (coredns#7903) * plugin/pkg/proxy: add max_age for per-connection lifetime cap Introduce a max_age setting on Transport that closes connections based on creation time, independent of idle-timeout (expire). Background: PR coredns#7790 changed the connection pool from LIFO to FIFO for source-port diversity. Under FIFO, every connection is cycled through the pool and its used timestamp is refreshed continuously. When request rate is high enough that pool_size / request_rate < expire, no connection ever becomes idle and expire never fires. This prevents CoreDNS from opening new connections to upstreams that scale out (e.g. new Kubernetes pods behind a ClusterIP service with conntrack pinning). max_age addresses this by enforcing an absolute upper bound on connection lifetime regardless of activity: - persistConn gains a created field set at dial time. - Transport gains maxAge (default 0 = unlimited, preserving existing behaviour). - Dial(): rejects cached connections whose creation age exceeds max_age. - cleanup(): when maxAge > 0, uses a linear scan over both idle-timeout and max-age predicates; when maxAge == 0, preserves the original binary-search path on used time (sorted by FIFO insertion order). - Both hot paths pre-compute the deadline outside any inner loop to avoid repeated time.Now() calls. Tests added: - TestMaxAgeExpireByCreation: connection with old created but fresh used must be rejected even though idle-timeout would pass. - TestMaxAgeFIFORotation: three FIFO-rotated connections (old created, fresh used) must all be rejected, confirming that continuous rotation cannot prevent max-age expiry. Signed-off-by: cangming <cangming@cangming.app> * plugin/forward: add max_age option Expose Transport.SetMaxAge through the forward plugin so operators can set an absolute upper bound on connection lifetime via the Corefile. Usage: forward . 1.2.3.4 { max_age 30s } Default is 0 (unlimited), which preserves existing behaviour. A positive value causes connections older than max_age to be closed and re-dialled on the next request, ensuring CoreDNS reconnects to newly scaled-out upstream pods even under sustained load where the idle timeout (expire) would never fire. If max_age is set, it must not be less than expire; the parser rejects this combination at startup with a clear error message. Signed-off-by: cangming <cangming@cangming.app> --------- Signed-off-by: cangming <cangming@cangming.app>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Why is this pull request needed and what does it do?
Background
PR #7790 changed the connection pool from LIFO to FIFO for source-port diversity. Under FIFO every connection is cycled through the pool on each request, continuously refreshing its
usedtimestamp. When request rate is high enough thatpool_size / request_rate < expire, no connection ever becomes idle andexpire(which is based on last-used time) never fires.This creates a regression in Kubernetes environments using ClusterIP services: because of conntrack pinning, each existing connection is pinned to one specific backend pod. After a transient network disruption or a scale-out event, affected pods receive no new traffic from CoreDNS until old connections are closed and re-dialled. Before #7790 (LIFO), the
expireidle-timeout would eventually clear old connections during any quiet period. Under FIFO at sustained load, no such recovery occurs.The following graph captures this regression in production. CoreDNS 1.14.1 forwards to a Kubernetes ClusterIP service backed by multiple bind9 pods. Around 16:50, one upstream pod experienced a brief network disruption. The blue line (that pod's query count) drops sharply to near zero. Despite the pod recovering quickly, the traffic imbalance persisted for over 55 minutes — CoreDNS never re-opened connections to that pod because the FIFO-rotated connection pool kept refreshing
usedtimestamps, preventingexpirefrom ever firing.Bind Query Count — Each line represents query traffic to one upstream bind9 pod behind a ClusterIP service. At 16:50, the blue pod suffered a brief network disruption. While the pod recovered within seconds, CoreDNS continued directing negligible traffic to it for the remainder of the observation window (~55 min), demonstrating that
expirealone cannot restore balance under sustained FIFO rotation.What this PR does
Adds a
max_ageoption to theforwardplugin (default0= unlimited, preserving existing behaviour) that closes connections based on creation time rather than idle time. When configured, connections older thanmax_ageare rejected from the cache and re-dialled, ensuring traffic rebalances to recovered or newly scaled-out upstreams regardless of request rate.Implementation details:
persistConngains acreatedfield, set once at dial time.TransportgainsmaxAge(zero value = disabled) andSetMaxAge().Dial(): checkspc.createdagainst the max-age deadline before returning a cached connection. Whenmax_agefires,cleanup()(which runs everydefaultExpire) typically clears stale connections beforeDial()encounters them, keeping the hot path O(1); the O(n) scan inDial()is an existing characteristic of the FIFO pool and is not worsened by this change.cleanup(): whenmaxAge > 0, a linear scan evaluates both the idle-timeout (used) and max-age (created) predicates; whenmaxAge == 0the original O(log n) binary-search path is fully preserved.time.Now().Add(-maxAge)) outside inner loops to avoid repeatedtime.Now()calls.max_age < expireat startup with a clear error message, since a max-age shorter than the idle-timeout would makeexpireunreachable under active load.TestMaxAgeExpireByCreation(freshused, oldcreatedmust be rejected) andTestMaxAgeFIFORotation(FIFO-rotated pool must not bypass max-age).Example Corefile
2. Which issues (if any) are related?
Related to #6804, which proposes a similar
max_connection_ageoption. Key differences from that PR:max_connection_agemax_age(consistent withexpirestyle)expirehandlingexpire, introducesidle_timeoutas replacementexpirefully preserved, zero changes to its semanticsmax_agecleanup()scanmax_ageis not setmax_age < expirerejected at parse timeThe main motivation for a separate PR is that #6804's deprecation of
expireis a backward-incompatible argument change, whereas this PR is purely additive.3. Which documentation changes (if any) need to be made?
plugin/forward/README.mdshould document the new option. Suggested additions:In the syntax block:
In the description list:
4. Does this introduce a backward incompatible change or deprecation?
No. The default value is
0(unlimited), identical to the current behaviour. All existing configurations continue to work without any change.expiresemantics are completely untouched.