Skip to content

ipam: Fix revert logic in reuseIPNets causing v4 CIDR duplication#44832

Merged
christarazi merged 1 commit intocilium:mainfrom
christarazi:pr/christarazi/fix-reuse-ipnets-cross-family-revert
Mar 18, 2026
Merged

ipam: Fix revert logic in reuseIPNets causing v4 CIDR duplication#44832
christarazi merged 1 commit intocilium:mainfrom
christarazi:pr/christarazi/fix-reuse-ipnets-cross-family-revert

Conversation

@christarazi
Copy link
Copy Markdown
Member

@christarazi christarazi commented Mar 17, 2026

When reuseIPNets processes a node with a pre-existing duplicate v6 CIDR,
the v6 allocation fails with ErrCIDRAllocated. The revertStack.Revert()
then releases the successfully-occupied v4 CIDR as well. Because the
updated node object is never synced to the internal state but its
CiliumNode spec still references the v4 CIDR, allocateNext can hand the
same v4 CIDR to another node.

The fix is to handle ErrCIDRAllocated per IP family. That means:

  • Keeping the successful family's allocation
  • Adding the node to n.nodes with partial CIDRs
  • Return the error so allocateNode can report it in the CiliumNode
    status

Also make upsertLocked() detect spec changes so k8s can be updated
instead of only updating the status only.

Fixed a bug in dual-stack cluster-pool IPAM where an operator restart with a pre-existing duplicate IPv6 PodCIDR could cause the affected node's IPv4 PodCIDR to be incorrectly freed and reassigned to another node.

@christarazi christarazi requested a review from a team as a code owner March 17, 2026 00:09
@christarazi christarazi requested a review from tklauser March 17, 2026 00:09
@maintainer-s-little-helper

This comment was marked as outdated.

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 17, 2026
@christarazi christarazi changed the title ipam: fix cross-family revert in reuseIPNets causing IPv4 CIDR duplication ipam: Fix revert logic in reuseIPNets causing v4 CIDR duplication Mar 17, 2026
@christarazi christarazi force-pushed the pr/christarazi/fix-reuse-ipnets-cross-family-revert branch from 4a52a5e to 904baeb Compare March 17, 2026 00:11
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 17, 2026
@christarazi christarazi added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/ipam IP address management, including cloud IPAM needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 17, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 17, 2026
@christarazi christarazi force-pushed the pr/christarazi/fix-reuse-ipnets-cross-family-revert branch from 904baeb to 4ee3c37 Compare March 17, 2026 04:53
@christarazi
Copy link
Copy Markdown
Member Author

/test

@cilium-ariane
Copy link
Copy Markdown

cilium-ariane bot commented Mar 17, 2026

/test

Edit: #44407

Copy link
Copy Markdown
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. One non-blocking question/suggestion inline.

@christarazi christarazi force-pushed the pr/christarazi/fix-reuse-ipnets-cross-family-revert branch from 4ee3c37 to 0b4e11e Compare March 17, 2026 18:26
When reuseIPNets processes a node with a pre-existing duplicate v6 CIDR,
the v6 allocation fails with ErrCIDRAllocated. The revertStack.Revert()
then releases the successfully-occupied v4 CIDR as well. Because the
updated node object is never synced to the internal state but its
CiliumNode spec still references the v4 CIDR, allocateNext can hand the
same v4 CIDR to another node.

The fix is to handle ErrCIDRAllocated per IP family. That means:
* Keeping the successful family's allocation
* Adding the node to n.nodes with partial CIDRs
* Return the error so allocateNode can report it in the CiliumNode
  status

Also make upsertLocked() detect spec changes so k8s can be updated
instead of only updating the status only.

Co-authored-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/fix-reuse-ipnets-cross-family-revert branch from 0b4e11e to a76c3b5 Compare March 17, 2026 18:28
@christarazi christarazi enabled auto-merge March 17, 2026 18:29
@christarazi
Copy link
Copy Markdown
Member Author

/test

1 similar comment
@cilium-ariane
Copy link
Copy Markdown

cilium-ariane bot commented Mar 17, 2026

/test

@christarazi christarazi added this pull request to the merge queue Mar 18, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 18, 2026
@christarazi christarazi added this pull request to the merge queue Mar 18, 2026
Merged via the queue into cilium:main with commit 3b82257 Mar 18, 2026
75 of 79 checks passed
@christarazi christarazi deleted the pr/christarazi/fix-reuse-ipnets-cross-family-revert branch March 18, 2026 18:43
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 18, 2026
@github-actions github-actions bot added the backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. label Mar 23, 2026
@tklauser tklauser added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 24, 2026
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ipam IP address management, including cloud IPAM backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants