Skip to content

cilium-cli: Validate NodePort readiness on all node IP addresses#40812

Merged
joestringer merged 1 commit intocilium:mainfrom
pillai-ashwin:fix/ci-flake-nodeport-ipv6-timeout-39793
Jan 5, 2026
Merged

cilium-cli: Validate NodePort readiness on all node IP addresses#40812
joestringer merged 1 commit intocilium:mainfrom
pillai-ashwin:fix/ci-flake-nodeport-ipv6-timeout-39793

Conversation

@pillai-ashwin
Copy link
Copy Markdown
Member

@pillai-ashwin pillai-ashwin commented Jul 30, 2025

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer's Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Description

Fixes intermittent pod-to-local-nodeport connectivity test failures (exit code 28 - connection timeout) in dual-stack clusters.

Root Cause

The upfront NodePort readiness check in deployment.go only validated HostIP (one IP per node, typically IPv4), but tests iterate through all addresses in node.Status.Addresses[] which includes both IPv4 and IPv6 in dual-stack clusters.

This meant IPv6 NodePorts were never validated for readiness, causing timeouts when tests tried them.

Fix

Enhance the upfront NodePort validation to check all node IP addresses (NodeInternalIP and NodeExternalIP types) instead of just HostIP. This ensures all addresses that will be tested are validated during setup.

Fixes: #39793

Fix intermittent NodePort connectivity test timeouts in dual-stack clusters by validating NodePort readiness on all node IP addresses during test setup.

@pillai-ashwin pillai-ashwin requested review from a team as code owners July 30, 2025 08:19
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 30, 2025
@pillai-ashwin pillai-ashwin requested a review from derailed July 30, 2025 08:19
@github-actions github-actions bot added cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary labels Jul 30, 2025
Copy link
Copy Markdown
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Insufficient timeouts: 2-second connect timeout was too aggressive for IPv6 NodePort connections
  2. IPv6 networking delays: IPv6 connections can take longer to establish than IPv4

I am a bit skeptical about these findings. Establishing a connection should not take 2 seconds, especially when all of it is effective on the same node (no internet involved). If they do, then it seems to me that something is going wrong (packets being dropped, causing TCP level retires, or something related) and increasing the timeout may just mask that.

I also have never heard of IPv6 being slower than IPv4 to establish connections. I don't think we should accept that statement without something to back that up.

Missing retry logic: Single-attempt connections failed on transient network issues

This is essentially saying that we expect Cilium, the thing we are testing to fail. I don't think that is right. These sorts of tests are exactly here to catch issues in Cilium. If the first connection we make always fails or frequently fails then we should investigate why and not plaster over it with retries. We want all connections to work, not just at least one out of three.

Premature testing: Tests started before NodePorts were fully ready

This sounds more like the right direction. I think this is more likely to be the real fix.

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 4538a18 to 4527a2f Compare August 8, 2025 00:32
@pillai-ashwin pillai-ashwin changed the title cilium-cli: Fix IPv6 NodePort connectivity test timeouts cilium-cli: NodePort tests wait for readiness; drop IPv6-specific timeouts Aug 8, 2025
@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch 2 times, most recently from a2d1e38 to 083d5d1 Compare August 8, 2025 00:55
@pchaigno pchaigno added the release-note/ci This PR makes changes to the CI. label Aug 15, 2025
@pchaigno pchaigno removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 15, 2025
@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 083d5d1 to d60f535 Compare August 26, 2025 21:01
@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from d60f535 to ab2dc2a Compare September 3, 2025 21:34
@pillai-ashwin
Copy link
Copy Markdown
Member Author

/test

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from ab2dc2a to 4ee5bec Compare September 4, 2025 16:59
Copy link
Copy Markdown
Contributor

@viktor-kurchenko viktor-kurchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 4ee5bec to c0ea472 Compare September 6, 2025 19:49
@pillai-ashwin
Copy link
Copy Markdown
Member Author

@viktor-kurchenko are we good to go ahead with merge?

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from c0ea472 to 707c054 Compare September 12, 2025 19:55
@viktor-kurchenko
Copy link
Copy Markdown
Contributor

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 13, 2025
@viktor-kurchenko
Copy link
Copy Markdown
Contributor

@viktor-kurchenko are we good to go ahead with merge?

Yes, it's marked as ready-to-merge now.
Thanks for the contribution!

@joestringer
Copy link
Copy Markdown
Member

I see there's been some updates, but the PR title and description (release note) both talk about dropping IPv6-specific timeouts and I don't see that in the code change.

Related, in the commit message:

This removes the need for per-test readiness checks.

If the need is removed, does that mean there should be another change to remove them? Does it make sense to track that with an issue and follow up to remove those checks or were you planning to look into that as part of this PR?

@pillai-ashwin pillai-ashwin changed the title cilium-cli: NodePort tests wait for readiness; drop IPv6-specific timeouts cilium-cli: Validate NodePort readiness on all node IP addresses Dec 9, 2025
@pillai-ashwin
Copy link
Copy Markdown
Member Author

pillai-ashwin commented Dec 9, 2025

@joestringer

I see there's been some updates, but the PR title and description (release note) both talk about dropping IPv6-specific timeouts and I don't see that in the code change.

Updated the pr title, description, commit message and release note. The current approach is simpler:

The fix enhances the existing upfront NodePort validation to check all node IP addresses (NodeInternalIP and NodeExternalIP) instead of just HostIP. This covers IPv6 addresses in dual-stack clusters that were previously unchecked and were causing the intermittent timeouts.

If the need is removed, does that mean there should be another change to remove them? Does it make sense to track that with an issue and follow up to remove those checks or were you planning to look into that as part of this PR?

In an earlier version of this PR I added per-test WaitForNodePorts() calls before each curl, but I dropped that approach to favor fixing the upfront check directly. There are no per-test checks in main to remove, so no follow-up needed.

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 2b00fb3 to 2c0b93c Compare December 9, 2025 04:57
@viktor-kurchenko
Copy link
Copy Markdown
Contributor

/test

@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 2c0b93c to 6f7772e Compare December 12, 2025 05:05
The upfront NodePort readiness check only validated HostIP (typically one
IP per node), but tests target all addresses in node.Status.Addresses which
can include multiple IPs per node (e.g., IPv4 and IPv6 in dual-stack clusters).

This caused intermittent failures (exit code 28) when tests tried IPv6 NodePorts
that were never validated for readiness.

Fix by enhancing the upfront validation in deployment.go to check NodePort
readiness on all node IP addresses (InternalIP and ExternalIP types) during
the setup phase, ensuring all addresses that will be tested are ready before
tests execute.

Fixes: cilium#39793
Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
@pillai-ashwin pillai-ashwin force-pushed the fix/ci-flake-nodeport-ipv6-timeout-39793 branch from 6f7772e to 9607dac Compare December 13, 2025 02:59
@pillai-ashwin
Copy link
Copy Markdown
Member Author

/test

@joestringer joestringer removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Jan 5, 2026
@joestringer joestringer enabled auto-merge January 5, 2026 01:18
@joestringer joestringer added this pull request to the merge queue Jan 5, 2026
Merged via the queue into cilium:main with commit 0b64c92 Jan 5, 2026
76 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jan 5, 2026
@julianwiedmann
Copy link
Copy Markdown
Member

Looks like this resulted in #43749.

pillai-ashwin added a commit to pillai-ashwin/cilium that referenced this pull request Jan 26, 2026
PR cilium#40812 added validation for NodePorts on all node IP addresses
(internal and external) to fix dual-stack clusters. However, on GKE,
external IPs are not reachable from inside the cluster due to firewall
restrictions.

The connectivity tests already skip external IPs on GKE (service.go),
but the upfront validation added by cilium#40812 did not, causing timeouts
during test setup.

This commit adds the same GKE-aware external IP skip logic to the
NodePort validation phase, making it consistent with the actual test
behavior.

Fixes: cilium#43749
See-also: cilium#39793

Signed-off-by: Ash <pillaiashwin96@gmail.com>
pillai-ashwin added a commit to pillai-ashwin/cilium that referenced this pull request Jan 26, 2026
PR cilium#40812 added validation for NodePorts on all node IP addresses
(internal and external) to fix dual-stack clusters. However, on GKE,
external IPs are not reachable from inside the cluster due to firewall
restrictions.

The connectivity tests already skip external IPs on GKE,
but the upfront validation added by cilium#40812 did not, causing timeouts
during test setup.

This commit adds the same GKE-aware external IP skip logic to the
NodePort validation phase, making it consistent with the actual test
behavior.

Fixes: cilium#43749
See-also: cilium#39793

Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
pillai-ashwin added a commit to pillai-ashwin/cilium that referenced this pull request Jan 26, 2026
PR cilium#40812 added validation for NodePorts on all node IP addresses
(internal and external) to fix dual-stack clusters. However, on GKE,
external IPs are not reachable from inside the cluster due to firewall
restrictions.

The connectivity tests already skip external IPs on GKE,
but the upfront validation added by cilium#40812 did not, causing timeouts
during test setup.

This commit adds the same GKE-aware external IP skip logic to the
NodePort validation phase, making it consistent with the actual test
behavior.

Fixes: cilium#43749
See-also: cilium#39793

Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Jan 27, 2026
PR #40812 added validation for NodePorts on all node IP addresses
(internal and external) to fix dual-stack clusters. However, on GKE,
external IPs are not reachable from inside the cluster due to firewall
restrictions.

The connectivity tests already skip external IPs on GKE,
but the upfront validation added by #40812 did not, causing timeouts
during test setup.

This commit adds the same GKE-aware external IP skip logic to the
NodePort validation phase, making it consistent with the actual test
behavior.

Fixes: #43749
See-also: #39793

Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
zocimek pushed a commit to zocimek/home-ops that referenced this pull request Feb 1, 2026
…0 ) (#584)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
|
[aqua:cilium/cilium-cli](https://redirect.github.com/cilium/cilium-cli)
| minor | `0.18.9` → `0.19.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>cilium/cilium-cli (aqua:cilium/cilium-cli)</summary>

###
[`v0.19.0`](https://redirect.github.com/cilium/cilium-cli/releases/tag/v0.19.0)

[Compare
Source](https://redirect.github.com/cilium/cilium-cli/compare/v0.18.9...v0.19.0)

## Summary of Changes

**CI Changes:**

- Add L7 policy traffic disruption tests
([cilium/cilium#42150](https://redirect.github.com/cilium/cilium/issues/42150),
[@&#8203;fristonio](https://redirect.github.com/fristonio))
- Cilium-cli SNI connectivity tests now retry expected successful
operations to recover from failures due to external upstream issues.
([cilium/cilium#42980](https://redirect.github.com/cilium/cilium/issues/42980),
[@&#8203;jrajahalme](https://redirect.github.com/jrajahalme))
- cli: connectivity: fix typo in L7 LB tests
([cilium/cilium#43610](https://redirect.github.com/cilium/cilium/issues/43610),
[@&#8203;julianwiedmann](https://redirect.github.com/julianwiedmann))
- Fix intermittent NodePort connectivity test timeouts in dual-stack
clusters by validating NodePort readiness on all node IP addresses
during test setup.
([cilium/cilium#40812](https://redirect.github.com/cilium/cilium/issues/40812),
[@&#8203;pillai-ashwin](https://redirect.github.com/pillai-ashwin))
- tests: remove identity manager from ignored error messages
([cilium/cilium#42982](https://redirect.github.com/cilium/cilium/issues/42982),
[@&#8203;odinuge](https://redirect.github.com/odinuge))

**Misc Changes:**

- chore(deps): update all-dependencies (main)
([cilium/cilium#43169](https://redirect.github.com/cilium/cilium/issues/43169),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main)
([cilium/cilium#43456](https://redirect.github.com/cilium/cilium/issues/43456),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update all-dependencies (main)
([cilium/cilium#43508](https://redirect.github.com/cilium/cilium/issues/43508),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main)
([cilium/cilium#43457](https://redirect.github.com/cilium/cilium/issues/43457),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update base-images (main)
([cilium/cilium#43538](https://redirect.github.com/cilium/cilium/issues/43538),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6)
(main)
([cilium/cilium#43303](https://redirect.github.com/cilium/cilium/issues/43303),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- chore(deps): update go to v1.25.5 (main)
([cilium/cilium#43173](https://redirect.github.com/cilium/cilium/issues/43173),
[@&#8203;cilium-renovate](https://redirect.github.com/cilium-renovate)\[bot])
- cilium-cli/connectivity: remove matcher for bpf/init.sh errors
([cilium/cilium#43109](https://redirect.github.com/cilium/cilium/issues/43109),
[@&#8203;tklauser](https://redirect.github.com/tklauser))
- cilium-cli: convert net.IP to netip.Addr
([cilium/cilium#42371](https://redirect.github.com/cilium/cilium/issues/42371),
[@&#8203;phuhung273](https://redirect.github.com/phuhung273))
- cli: Update `network-perf` image ref
([cilium/cilium#43297](https://redirect.github.com/cilium/cilium/issues/43297),
[@&#8203;HadrienPatte](https://redirect.github.com/HadrienPatte))
- chore(deps): update golangci/golangci-lint-action action to v9.2.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3148](https://redirect.github.com/cilium/cilium-cli/pull/3148)
- Update stable release to v0.18.9 by
[@&#8203;michi-covalent](https://redirect.github.com/michi-covalent) in
[#&#8203;3149](https://redirect.github.com/cilium/cilium-cli/pull/3149)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3151](https://redirect.github.com/cilium/cilium-cli/pull/3151)
- chore(deps): update go to v1.25.5 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3153](https://redirect.github.com/cilium/cilium-cli/pull/3153)
- ci: clean up disk space in release workflow by
[@&#8203;tklauser](https://redirect.github.com/tklauser) in
[#&#8203;3154](https://redirect.github.com/cilium/cilium-cli/pull/3154)
- chore(deps): update actions/stale action to v10.1.1 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3150](https://redirect.github.com/cilium/cilium-cli/pull/3150)
- chore(deps): update gcr.io/distroless/static:latest docker digest to
[`4b2a093`](https://redirect.github.com/cilium/cilium-cli/commit/4b2a093)
by [@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3152](https://redirect.github.com/cilium/cilium-cli/pull/3152)
- chore(deps): update golangci/golangci-lint docker tag to v2.7.2 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3155](https://redirect.github.com/cilium/cilium-cli/pull/3155)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`a22b2e6`](https://redirect.github.com/cilium/cilium-cli/commit/a22b2e6)
by [@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3156](https://redirect.github.com/cilium/cilium-cli/pull/3156)
- chore(deps): update actions/upload-artifact action to v6 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3157](https://redirect.github.com/cilium/cilium-cli/pull/3157)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`36b4f45`](https://redirect.github.com/cilium/cilium-cli/commit/36b4f45)
by [@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3160](https://redirect.github.com/cilium/cilium-cli/pull/3160)
- chore(deps): update dependency cilium/cilium to v1.18.5 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3159](https://redirect.github.com/cilium/cilium-cli/pull/3159)
- chore(deps): update dependency kubernetes-sigs/kind to v0.31.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3158](https://redirect.github.com/cilium/cilium-cli/pull/3158)
- chore(deps): update docker/setup-buildx-action action to v3.12.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3162](https://redirect.github.com/cilium/cilium-cli/pull/3162)
- chore(deps): update golangci/golangci-lint docker tag to v2.8.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3163](https://redirect.github.com/cilium/cilium-cli/pull/3163)
- chore(deps): update docker.io/library/golang:1.25.5 docker digest to
[`6cc2338`](https://redirect.github.com/cilium/cilium-cli/commit/6cc2338)
by [@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3164](https://redirect.github.com/cilium/cilium-cli/pull/3164)
- chore(deps): update gcr.io/distroless/static:latest docker digest to
[`cd64bec`](https://redirect.github.com/cilium/cilium-cli/commit/cd64bec)
by [@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3165](https://redirect.github.com/cilium/cilium-cli/pull/3165)
- chore(deps): update actions/setup-go action to v6.2.0 by
[@&#8203;renovate](https://redirect.github.com/renovate)\[bot] in
[#&#8203;3166](https://redirect.github.com/cilium/cilium-cli/pull/3166)
- Prepare for v0.19.0 release by
[@&#8203;tklauser](https://redirect.github.com/tklauser) in
[#&#8203;3167](https://redirect.github.com/cilium/cilium-cli/pull/3167)

**Full Changelog**:
<cilium/cilium-cli@v0.18.9...v0.19.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://redirect.github.com/renovatebot/renovate).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0Mi44MS4yIiwidXBkYXRlZEluVmVyIjoiNDIuODEuMyIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9taW5vciJdfQ==-->

Co-authored-by: zocimek-renovate[bot] <134739422+zocimek-renovate[bot]@users.noreply.github.com>
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

CI: no-policies-extra/pod-to-local-nodeport: command terminated with exit code 28

7 participants