Skip to content

envoy: Fix xds server npds listeners accounting#44830

Merged
jrajahalme merged 3 commits intocilium:mainfrom
fristonio:pr/fristonio/fix/npds-listeners-accounting
Mar 26, 2026
Merged

envoy: Fix xds server npds listeners accounting#44830
jrajahalme merged 3 commits intocilium:mainfrom
fristonio:pr/fristonio/fix/npds-listeners-accounting

Conversation

@fristonio
Copy link
Copy Markdown
Member

@fristonio fristonio commented Mar 16, 2026

See commit message for details.

This affects v1.17 where if a toFQDNs rule is compounded with a CEC listener(and no other proxy listeners), the DNS response to client would skip waiting for Envoy network policy configuration. This leads to a race, where envoy
network policy update happens after client resolves the DNS and makes the request causing connection failure. For cilium version >= v1.18 we preallocate identities for FQDN selectors, so envoy doesn't need network policy update on DNS lookup.

@fristonio fristonio added area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 16, 2026
@github-actions github-actions Bot added the cilium-cli This PR contains changes related with cilium-cli label Mar 16, 2026
@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch 2 times, most recently from cb8a6a6 to bc0513c Compare March 17, 2026 00:43
@fristonio
Copy link
Copy Markdown
Member Author

/ci-l7

@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch from bc0513c to 26c0b07 Compare March 17, 2026 03:10
@fristonio
Copy link
Copy Markdown
Member Author

/ci-l7

3 similar comments
@cilium-ariane
Copy link
Copy Markdown

cilium-ariane Bot commented Mar 17, 2026

/ci-l7

@fristonio
Copy link
Copy Markdown
Member Author

/ci-l7

@cilium-ariane
Copy link
Copy Markdown

cilium-ariane Bot commented Mar 17, 2026

/ci-l7

@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch from 80441f1 to 37d9247 Compare March 17, 2026 05:34
@fristonio
Copy link
Copy Markdown
Member Author

/ci-l7

1 similar comment
@cilium-ariane
Copy link
Copy Markdown

cilium-ariane Bot commented Mar 17, 2026

/ci-l7

@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch from 37d9247 to 44c8201 Compare March 17, 2026 06:09
@fristonio
Copy link
Copy Markdown
Member Author

/ci-l7

1 similar comment
@cilium-ariane
Copy link
Copy Markdown

cilium-ariane Bot commented Mar 17, 2026

/ci-l7

@fristonio
Copy link
Copy Markdown
Member Author

fristonio commented Mar 17, 2026

/test
Edit: EKS Cluster creation failures for multiple workflows, retriggered.

@fristonio fristonio marked this pull request as ready for review March 17, 2026 16:51
@fristonio fristonio requested review from a team as code owners March 17, 2026 16:51
@fristonio fristonio requested review from christarazi, derailed and nezdolik and removed request for christarazi March 17, 2026 16:51
@fristonio fristonio requested a review from jrajahalme March 17, 2026 16:51
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small fix needed to the CancelCompletions call

Comment thread pkg/envoy/xds_server.go Outdated
Comment thread pkg/envoy/xds_server.go
Comment thread pkg/envoy/xds_server.go Outdated
@nezdolik nezdolik removed their request for review March 20, 2026 15:11
@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch from 44c8201 to 922c3e2 Compare March 22, 2026 22:26
When a network policy is updated, the decision to wait on ACK
from envoy depends on weather or not there is any listener filter that
requires NPDS(presence of `cilium.bpf_metadata` filter). Before this
change, only internal proxy listeners were accounted. This caused
NetworkPolicy update wait to be skipped if **only** the listeners configured
through CiliumEnvoyConfig were present.

Given that all CEC listeners get injected with `cilium.bpf_metadata`
filter for L3 policy enforcement, incorrect accounting leads to
a race condition where BPF policy map redirects can be configured before
envoy receives the up-to-date policy information. This can cause traffic
interruption due to policy denials in envoy.

This commit fixes this issue by keeping track of `cilium.bpf_metadata`
filter across all envoy listeners to determine the need of NPDS
subscription.

Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
This commit adds a new connectivity test to validate toFQDN rules
compounded with a CEC listener for L7 processing.

The new test creates a CNP that redirects http traffic destined for
external target to envoy, where a listener configured through CCEC
proxies the request to the requested origin.

Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
This commit downgrades the severity of "Missing proxy redirect" message
in incremental policy update path to INFO for redirects with listener
references.

Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
@fristonio fristonio force-pushed the pr/fristonio/fix/npds-listeners-accounting branch from 922c3e2 to 6ae9e3d Compare March 22, 2026 23:16
@fristonio
Copy link
Copy Markdown
Member Author

/test

1 similar comment
@cilium-ariane
Copy link
Copy Markdown

cilium-ariane Bot commented Mar 22, 2026

/test

@fristonio fristonio requested a review from jrajahalme March 23, 2026 00:43
Copy link
Copy Markdown
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fristonio Nice work!

Comment thread pkg/envoy/xds_server.go
@jrajahalme jrajahalme requested a review from christarazi March 25, 2026 17:11
@jrajahalme
Copy link
Copy Markdown
Member

@christarazi This needs a ci-structure review, could you help out?

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 25, 2026
@jrajahalme jrajahalme enabled auto-merge March 26, 2026 09:13
@jrajahalme jrajahalme added this pull request to the merge queue Mar 26, 2026
Merged via the queue into cilium:main with commit b5bdf10 Mar 26, 2026
83 checks passed
@viktor-kurchenko viktor-kurchenko added the backport/author The backport will be carried out by the author of the PR. label Mar 30, 2026
@viktor-kurchenko
Copy link
Copy Markdown
Contributor

@fristonio I've labeled the PR with backport/author, because it depends on #44597 which is marked as backport/author.

Is it ok?

@github-actions github-actions Bot added backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. labels Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. backport/author The backport will be carried out by the author of the PR. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. cilium-cli This PR contains changes related with cilium-cli needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants