gateway-api: remove backends with missing or invalid TLS policies #44743

Merged
julianwiedmann merged 1 commit into cilium:main from lconnery:pr/gateway-api-backendtlspolicy on Mar 16, 2026
Contributor

@lconnery lconnery commented Mar 12, 2026

This PR changes how the Cilium Operator's Gateway API reconciler handles a BackendTLSPolicy that has a missing or invalid TLS certificate. In these cases, the affected backends are immediately removed from the Cilium Envoy Configuration.

Previously, if a backend required HTTPS but its policy wasn't in the cache yet (due to a resource creation race condition), the reconciler would generate a plain-text cluster. Envoy would then hold live traffic in a "warming" state for 30 seconds (`initial_fetch_timeout`) while waiting for an SDS secret, routing plain-text traffic to the secure backend and causing unexpected HTTP 400 errors.
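
The behavior described above can be pictured with a minimal Go sketch. It is not the code from this PR: the `Backend` and `TLSPolicy` types, the `RequiresTLS` and `CACertValid` fields, and the `usableBackends` function are all hypothetical names chosen for illustration, and the real reconciler works on Gateway API and Cilium Envoy Configuration objects rather than these simplified structs. The idea shown is only that a backend expecting TLS is dropped when its policy is missing or invalid, instead of being emitted as a plain-text cluster.

```go
package main

import "fmt"

// Backend is a hypothetical, simplified stand-in for a route backend.
type Backend struct {
	Name        string
	RequiresTLS bool // assumption: the backend is expected to be reached over HTTPS
}

// TLSPolicy is a hypothetical, simplified stand-in for a resolved BackendTLSPolicy.
type TLSPolicy struct {
	CACertValid bool // assumption: the referenced CA certificate exists and is valid
}

// usableBackends keeps plain-text backends and TLS backends with a valid policy.
// TLS backends whose policy is missing from the map or marked invalid are
// dropped rather than falling back to plain text.
func usableBackends(backends []Backend, policies map[string]TLSPolicy) []Backend {
	kept := make([]Backend, 0, len(backends))
	for _, b := range backends {
		if b.RequiresTLS {
			p, found := policies[b.Name]
			if !found || !p.CACertValid {
				continue // missing or invalid policy: remove the backend
			}
		}
		kept = append(kept, b)
	}
	return kept
}

func main() {
	backends := []Backend{
		{Name: "plain"},
		{Name: "secure-ok", RequiresTLS: true},
		{Name: "secure-missing", RequiresTLS: true},
		{Name: "secure-invalid", RequiresTLS: true},
	}
	policies := map[string]TLSPolicy{
		"secure-ok":      {CACertValid: true},
		"secure-invalid": {CACertValid: false},
	}
	for _, b := range usableBackends(backends, policies) {
		fmt.Println(b.Name)
	}
}
```

In this sketch a lookup miss in `policies` plays the role of the cache race: the policy may exist in the cluster but is not yet visible to the reconciler, and the backend is left out until a valid policy is seen.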

I have run TestConformance/BackendTLSPolicyInvalidCACertificateRef several hundred times locally to confirm this does address the flakiness seen in that test.

Fixes: #44043

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 12, 2026
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 12, 2026
@lconnery lconnery force-pushed the pr/gateway-api-backendtlspolicy branch 2 times, most recently from 827a96d to 8ea6cba March 12, 2026 06:14
@lconnery lconnery marked this pull request as ready for review March 12, 2026 19:18
@lconnery lconnery requested a review from a team as a code owner March 12, 2026 19:18
@lconnery lconnery requested a review from youngnick March 12, 2026 19:18
@lconnery lconnery changed the title from "[WIP] gateway-api: remove backends with missing or invalid TLS policies" to "gateway-api: remove backends with missing or invalid TLS policies" Mar 12, 2026
Contributor

@youngnick youngnick left a comment

Thanks for this @lconnery, LGTM.

@youngnick youngnick added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 13, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 13, 2026
@youngnick
Contributor

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 13, 2026
@maintainer-s-little-helper

Commit b93aa23 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
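
As a rough illustration of the check quoted above, the following Go sketch tests a commit message against the same pattern. It uses Go's standard `regexp` package; whether the bot itself uses this engine is an assumption, and `hasSignOff` plus the sample messages are hypothetical. The `(?m)` flag makes `^` match at the start of every line, so the trailer is found anywhere in the message as long as it begins a line.

```go
package main

import (
	"fmt"
	"regexp"
)

// signOffRe is the pattern quoted in the bot message.
var signOffRe = regexp.MustCompile(`(?m)^Signed-off-by:`)

// hasSignOff reports whether any line of the commit message
// starts with "Signed-off-by:".
func hasSignOff(msg string) bool {
	return signOffRe.MatchString(msg)
}

func main() {
	// Hypothetical commit messages, used only for illustration.
	unsigned := "gateway-api: example subject\n\nBody without a trailer.\n"
	signed := unsigned + "\nSigned-off-by: Jane Doe <jane@example.com>\n"

	fmt.Println("unsigned:", hasSignOff(unsigned))
	fmt.Println("signed:", hasSignOff(signed))
}
```

With git, the trailer is typically added by passing `-s` (`--signoff`) to `git commit`, or to `git commit --amend` for an existing commit.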

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 14, 2026
This change introduces a small change to how the Cilium Operator's
Gateway API reconciler handles a BackendTLSPolicy that has a missing or
invalid TLS certificate. In these cases, the backends are immediately
removed from the Cilium Envoy Configuration.

Previously, if a backend required HTTPS but the policy wasn't in the
cache yet (due to a resource creation race condition), the reconciler
would generate a plain-text cluster. Envoy would then hold live
traffic in a "warming" state waiting for an SDS secret for 30 seconds
(`initial_fetch_timeout`), routing plain-text to the secure backend
and causing unexpected HTTP 400 errors.

Signed-off-by: Liam Connery <lconnery@google.com>
@lconnery lconnery force-pushed the pr/gateway-api-backendtlspolicy branch from b93aa23 to 36ebfd5 March 16, 2026 00:50
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 16, 2026
@youngnick
Contributor

/test

@julianwiedmann
Member

Thank you! Please consider whether this also needs backports.

Merged via the queue into cilium:main with commit 9c4761b Mar 16, 2026
79 checks passed
@lconnery
Contributor Author

> Thank you! Please consider whether this also needs backports.

@youngnick it looked like some of the Gateway reconciler logic is new as of the last couple of months. Do you know if this needs to be backported to v1.19?

Thanks again for all the help!

@youngnick
Contributor

BackendTLSPolicy support didn't make it to v1.19, so we're all good.

@julianwiedmann julianwiedmann removed the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 26, 2026
@julianwiedmann julianwiedmann added the release-note/misc This PR makes changes that have no direct user impact. label Mar 26, 2026
Labels

feature/k8s-gateway-api
kind/community-contribution: This was a contribution made by a community member.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/misc: This PR makes changes that have no direct user impact.

Development

Successfully merging this pull request may close these issues.

CI: ci-gateway-api: TestConformance/BackendTLSPolicyInvalidCACertificateRef/BackendTLSPolicy_nonexistent-ca-certificate-ref

3 participants