Skip to content

[ipam/multi-pool] Fix races in the manager#44183

Merged
pippolo84 merged 2 commits intocilium:mainfrom
pippolo84:pr/pippolo84/multi-pool-fix-tests-race
Feb 4, 2026
Merged

[ipam/multi-pool] Fix races in the manager#44183
pippolo84 merged 2 commits intocilium:mainfrom
pippolo84:pr/pippolo84/multi-pool-fix-tests-race

Conversation

@pippolo84
Copy link
Copy Markdown
Member

Fix some races in the manager and in the related unit tests. Please refer to individual commit for more details.

Trying to set a custom trigger to reconcile the CiliumNode in the
multi-pool manager unit tests is racy (see the error returned by the
race detector when running the tests below).  Moreover, the trigger is
not even used anymore after the latest changes in the tests. To avoid
races and tests flakyness, let's simply set a very short (1 ns) debounce
interval for CiliumNode reconciliation.

```
=== RUN   Test_MultiPoolManager_ReleaseUnusedCIDR
==================
WARNING: DATA RACE
Read at 0x00c0005ef8a8 by goroutine 53:
  github.com/cilium/cilium/pkg/ipam.(*multiPoolManager).ciliumNodeUpdated()
      github.com/cilium/cilium/pkg/ipam/multipool_manager.go:403 +0x20c
  github.com/cilium/cilium/pkg/ipam.newMultiPoolManager.func3()
      github.com/cilium/cilium/pkg/ipam/multipool_manager.go:305 +0x264
  github.com/cilium/hive/job.(*jobOneShot).start()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/oneshot.go:138 +0x808
  github.com/cilium/hive/job.(*group).add.func1.gowrap1()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/job.go:263 +0x131

Previous write at 0x00c0005ef8a8 by goroutine 48:
  github.com/cilium/cilium/pkg/ipam.Test_MultiPoolManager_ReleaseUnusedCIDR()
      github.com/cilium/cilium/pkg/ipam/multipool_test.go:541 +0xe58
  testing.tRunner()
      testing/testing.go:1934 +0x21c
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x44
```

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The CiliumNode pointer in the manager is accessed concurrently by
multiple goroutines:

- it is updated when handling an upsert event from the LocalNodeResource
- it is read to get the IPAM pools when an update of the multi-pool
  status is triggered by "multi-pool-cilium-node-updater" timer job

This can be seen below in the error returned by the race detector when
running the multi-pool manager unit test.

To fix this, add another mutex to specifically protect the access to
node in the manager, alongside two unexported methods to read and update
it.

```
==================
WARNING: DATA RACE
Write at 0x00c000848990 by goroutine 66:
  github.com/cilium/cilium/pkg/ipam.(*multiPoolManager).ciliumNodeUpdated()
      github.com/cilium/cilium/pkg/ipam/multipool_manager.go:406 +0x22f
  github.com/cilium/cilium/pkg/ipam.newMultiPoolManager.func3()
      github.com/cilium/cilium/pkg/ipam/multipool_manager.go:305 +0x264
  github.com/cilium/hive/job.(*jobOneShot).start()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/oneshot.go:138 +0x808
  github.com/cilium/hive/job.(*group).add.func1.gowrap1()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/job.go:263 +0x131

Previous read at 0x00c000848990 by goroutine 67:
  github.com/cilium/cilium/pkg/ipam.(*multiPoolManager).updateLocalNode()
      github.com/cilium/cilium/pkg/ipam/multipool_manager.go:587 +0xd64
  github.com/cilium/cilium/pkg/ipam.(*multiPoolManager).updateLocalNode-fm()
      <autogenerated>:1 +0x47
  github.com/cilium/hive/job.(*jobTimer).start()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/timer.go:191 +0x708
  github.com/cilium/hive/job.(*group).add.func1.gowrap1()
      github.com/cilium/hive@v0.0.0-20260108104938-97756f6ff54c/job/job.go:263 +0x131
```

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 requested a review from a team as a code owner February 4, 2026 11:49
@pippolo84 pippolo84 added area/ipam IP address management, including cloud IPAM area/multipool Affects Multi-Pool IPAM labels Feb 4, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 4, 2026
@pippolo84
Copy link
Copy Markdown
Member Author

/test

@pippolo84 pippolo84 added the release-note/misc This PR makes changes that have no direct user impact. label Feb 4, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 4, 2026
Copy link
Copy Markdown
Member

@HadrienPatte HadrienPatte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 4, 2026
@pippolo84 pippolo84 added this pull request to the merge queue Feb 4, 2026
Merged via the queue into cilium:main with commit dad4af7 Feb 4, 2026
82 checks passed
@pippolo84 pippolo84 deleted the pr/pippolo84/multi-pool-fix-tests-race branch February 4, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ipam IP address management, including cloud IPAM area/multipool Affects Multi-Pool IPAM ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants