cilium-cli: Skip external IP NodePort validation on GKE#44014
cilium-cli: Skip external IP NodePort validation on GKE#44014aanm merged 1 commit intocilium:mainfrom
Conversation
|
(note that this needs a rebase) |
ac34763 to
f92de4d
Compare
|
Rebased on latest main (e382a33). Resolved merge conflict by keeping both the new IPv4/IPv6 family check and the GKE external IP skip logic. |
PR cilium#40812 added validation for NodePorts on all node IP addresses (internal and external) to fix dual-stack clusters. However, on GKE, external IPs are not reachable from inside the cluster due to firewall restrictions. The connectivity tests already skip external IPs on GKE, but the upfront validation added by cilium#40812 did not, causing timeouts during test setup. This commit adds the same GKE-aware external IP skip logic to the NodePort validation phase, making it consistent with the actual test behavior. Fixes: cilium#43749 See-also: cilium#39793 Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
f92de4d to
f1a5a64
Compare
|
/ci-gke |
|
/test |
|
The linked issue says the problem does not occur if KPR is enabled. Do we know why? This PR feels more like a workaround without providing a root cause analysis. |
just a hunch - but with KPR enabled, we most likely also have SocketLB attached to pods. Which then wildcard-matches all accesses to NodePort services, and applies service translation at the source (without the request ever actually hitting the remote node). And thus the remote node's external IP doesn't need to be accessible from within the cluster. |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [aqua:cilium/cilium-cli](https://redirect.github.com/cilium/cilium-cli) | patch | `0.19.0` → `0.19.2` | --- ### Release Notes <details> <summary>cilium/cilium-cli (aqua:cilium/cilium-cli)</summary> ### [`v0.19.2`](https://redirect.github.com/cilium/cilium-cli/compare/v0.19.1...v0.19.2) [Compare Source](https://redirect.github.com/cilium/cilium-cli/compare/v0.19.1...v0.19.2) ### [`v0.19.1`](https://redirect.github.com/cilium/cilium-cli/releases/tag/v0.19.1) [Compare Source](https://redirect.github.com/cilium/cilium-cli/compare/v0.19.0...v0.19.1) ## Summary of Changes **Minor Changes:** - cli: clustermesh: use ca bundle to connect clusters ([cilium/cilium#42833](https://redirect.github.com/cilium/cilium/issues/42833), [@​MrFreezeex](https://redirect.github.com/MrFreezeex)) **Bugfixes:** - cilium-cli: Fix NodePort deployment check in dual-stack clusters ([cilium/cilium#43888](https://redirect.github.com/cilium/cilium/issues/43888), [@​gandro](https://redirect.github.com/gandro)) - Fix GKE conformance test NodePort timeouts by skipping unreachable external IP validation on GKE ([cilium/cilium#44014](https://redirect.github.com/cilium/cilium/issues/44014), [@​pillai-ashwin](https://redirect.github.com/pillai-ashwin)) **CI Changes:** - cli: Relax warning exclusion for "unable to find key in local cache" ([cilium/cilium#44149](https://redirect.github.com/cilium/cilium/issues/44149), [@​brb](https://redirect.github.com/brb)) **Misc Changes:** - chore(deps): update all-dependencies (main) ([cilium/cilium#43700](https://redirect.github.com/cilium/cilium/issues/43700), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update all-dependencies (main) ([cilium/cilium#43824](https://redirect.github.com/cilium/cilium/issues/43824), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update all-dependencies (main) ([cilium/cilium#43965](https://redirect.github.com/cilium/cilium/issues/43965), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update all-dependencies (main) ([cilium/cilium#44090](https://redirect.github.com/cilium/cilium/issues/44090), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update all-dependencies (main) ([cilium/cilium#44235](https://redirect.github.com/cilium/cilium/issues/44235), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update base-images (main) ([cilium/cilium#43827](https://redirect.github.com/cilium/cilium/issues/43827), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update base-images (main) ([cilium/cilium#43969](https://redirect.github.com/cilium/cilium/issues/43969), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - chore(deps): update base-images (main) ([cilium/cilium#44239](https://redirect.github.com/cilium/cilium/issues/44239), [@​cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot]) - cilium-cli: Fix container name in connectivity test logs ([cilium/cilium#44076](https://redirect.github.com/cilium/cilium/issues/44076), [@​HadrienPatte](https://redirect.github.com/HadrienPatte)) - docs: fix typos in comments ([cilium/cilium#43821](https://redirect.github.com/cilium/cilium/issues/43821), [@​NAM-MAN](https://redirect.github.com/NAM-MAN)) - feat(cilium-cli): Add -r(estart) parameter to cilium upgrade ([cilium/cilium#43722](https://redirect.github.com/cilium/cilium/issues/43722), [@​alagoutte](https://redirect.github.com/alagoutte)) - Introduce end-to-end tests for Cilium's ZTunnel integration. ([cilium/cilium#43166](https://redirect.github.com/cilium/cilium/issues/43166), [@​ldelossa](https://redirect.github.com/ldelossa)) - Replace Index{,Byte} with Cut,Contains ([cilium/cilium#43708](https://redirect.github.com/cilium/cilium/issues/43708), [@​joestringer](https://redirect.github.com/joestringer)) - sysdump: Use label selectors for Hubble UI/Relay deployment collection ([cilium/cilium#44227](https://redirect.github.com/cilium/cilium/issues/44227), [@​darox](https://redirect.github.com/darox)) - chore(deps): update dependency cilium/cilium to v1.18.6 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3168](https://redirect.github.com/cilium/cilium-cli/pull/3168) - Update stable release to v0.19.0 by [@​tklauser](https://redirect.github.com/tklauser) in [#​3169](https://redirect.github.com/cilium/cilium-cli/pull/3169) - chore(deps): update go to v1.25.6 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3170](https://redirect.github.com/cilium/cilium-cli/pull/3170) - chore(deps): update docker.io/library/golang:1.25.6 docker digest to [`ce63a16`](https://redirect.github.com/cilium/cilium-cli/commit/ce63a16) by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3172](https://redirect.github.com/cilium/cilium-cli/pull/3172) - chore(deps): update actions/checkout action to v6.0.2 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3171](https://redirect.github.com/cilium/cilium-cli/pull/3171) - ci: Harden the image build process by [@​ferozsalam](https://redirect.github.com/ferozsalam) in [#​3174](https://redirect.github.com/cilium/cilium-cli/pull/3174) - chore(deps): update gcr.io/distroless/static:latest docker digest to [`972618c`](https://redirect.github.com/cilium/cilium-cli/commit/972618c) by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3176](https://redirect.github.com/cilium/cilium-cli/pull/3176) - chore(deps): update all github action dependencies by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3175](https://redirect.github.com/cilium/cilium-cli/pull/3175) - chore(deps): update go to v1.25.7 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3178](https://redirect.github.com/cilium/cilium-cli/pull/3178) - chore(deps): update golangci/golangci-lint docker tag to v2.9.0 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3179](https://redirect.github.com/cilium/cilium-cli/pull/3179) - chore(deps): update go to v1.26.0 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3181](https://redirect.github.com/cilium/cilium-cli/pull/3181) - chore(deps): update docker/build-push-action action to v6.19.0 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3180](https://redirect.github.com/cilium/cilium-cli/pull/3180) - chore(deps): update docker/build-push-action action to v6.19.2 by [@​renovate](https://redirect.github.com/renovate)\[bot] in [#​3183](https://redirect.github.com/cilium/cilium-cli/pull/3183) - Prepare for v0.19.1 release by [@​tklauser](https://redirect.github.com/tklauser) in [#​3184](https://redirect.github.com/cilium/cilium-cli/pull/3184) #### New Contributors - [@​ferozsalam](https://redirect.github.com/ferozsalam) made their first contribution in [#​3174](https://redirect.github.com/cilium/cilium-cli/pull/3174) **Full Changelog**: <cilium/cilium-cli@v0.19.0...v0.19.1> </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/lambchop4prez/network). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My44LjUiLCJ1cGRhdGVkSW5WZXIiOiI0My4zNi4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9naXRodWItcmVsZWFzZXMiLCJ0eXBlL3BhdGNoIl19-->
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXXline if the commit addresses a particularGitHub issue.
Fixes: <commit-id>tag, thenplease add the commit author[s] as reviewer[s] to this issue.
Summary
Fixes #43749 by skipping external IP NodePort validation on GKE, where external IPs are not reachable from inside the cluster.
Details
PR #40812 enhanced NodePort validation to check all node IP addresses (including external IPs) to fix dual-stack clusters. However, on GKE, external IPs are not reachable from within the cluster due to firewall restrictions.
The actual connectivity tests already handle this correctly by skipping external IPs on GKE (
service.go:252-257), but the upfront validation code did not have this check, causing test failures.Changes
deployment.go(lines 2742-2749)service.go:252-257,deployment.go:1781-1786)Root Cause Analysis
Current Code Structure (After Rebase)
The NodePort validation now has three sequential checks to filter addresses:
What Broke
In PR #40812, the upfront NodePort validation was enhanced from:
To:
This correctly fixed dual-stack IPv6 validation but inadvertently started validating external IPs that are unreachable on GKE due to firewall restrictions.
The Fix
The actual connectivity tests already skip external IPs on GKE:
This same pattern exists in another part of
deployment.go:This PR adds the same check to the NodePort validation phase for consistency:
Why External IPs Are Unreachable on GKE
On GKE:
10.128.0.2): ✅ Reachable from inside the cluster34.106.17.216): ❌ Blocked by GKE firewall, not reachable from inside the clusterThe error logs from issue #43749 show timeouts on external IPs:
Testing
This fix should resolve failures in:
no-tunnelconfigtunnelconfigWithout affecting:
tunnel-ingress-controllerconfig (KPR)References
Fixes: #43749
See-also: #39793 (original dual-stack issue fixed by #40812)
See-also: #40812 (PR that introduced the validation enhancement)