Skip to content

Add Multi-pool remaining IPs metric#41397

Closed
rabelmervin wants to merge 2 commits intocilium:mainfrom
rabelmervin:multipool
Closed

Add Multi-pool remaining IPs metric#41397
rabelmervin wants to merge 2 commits intocilium:mainfrom
rabelmervin:multipool

Conversation

@rabelmervin
Copy link
Copy Markdown

@rabelmervin rabelmervin commented Aug 27, 2025

Fixes #35366

Approach: Multi-Pool IPAM Remaining IPs Metrics

Overview
This PR implements real-time metrics tracking for remaining IP addresses in the multi-pool IPAM allocator, Which tracks remaining ips per pool

Implementation Details
Metrics : cilium_ipam_remaining_ips
Metrics Schema : Gauge

  • Registered metrics in start function present in allocator.go if metrics enabled.
  • Created new functions like GetAvailableAddrsV4(), GetAvailableAddrsV6() to compute available ips and allocatedV4(), allocatedV6() to compute allocated ips.
  • Implemented getRemainingips() to compute remaining ips using the above helpers and updates the metircs. RemainingIPS = availableIPS - allocatedIPS
  • Implemented CountAllocatedIPsInPrefix() in cidr_set.go to compute individual allocated ips in general.

@rabelmervin rabelmervin requested a review from a team as a code owner August 27, 2025 08:55
@rabelmervin rabelmervin requested a review from pippolo84 August 27, 2025 08:55
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 27, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Aug 27, 2025
Copy link
Copy Markdown
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rabelmervin . Thanks for the PR!

Could you please clarify your approach and maybe show an example of the new metrics actually running (maybe in a local kind cluster)?

I'm having some difficulties in understanding the code structure (e.g: where getRemainingIPs is called?).

@rabelmervin
Copy link
Copy Markdown
Author

Hi @pippolo84 I have tested the changes several times with kind yet, the metric is not visible and also the existing trigger metrics like K8sSync, Resync

@ldelossa ldelossa added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/ipam IP address management, including cloud IPAM release-note/misc This PR makes changes that have no direct user impact. labels Sep 2, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 2, 2025
@pippolo84
Copy link
Copy Markdown
Member

Hi @pippolo84 I have tested the changes several times with kind yet, the metric is not visible and also the existing trigger metrics like K8sSync, Resync

Looking at the metrics for both cluster-pool and multi-pool IPAM mode, I realized that:

  • for cluster-pool, we have the k8s sync trigger metrics, but we do not register them, thus they are not exposed. I fixed that in ipam: Register k8s sync metrics for cluster-pool #41531
  • for multi-pool, we do not have any metrics (you can also see that from the fact that in *Allocator.Start the *metrics.Registry parameter is unused)

Therefore, to add metrics for multi-pool IPAM in the operator, you should:

  • create the needed metrics
  • register them to the registry passed in to the multi-pool allocator (if enable-metrics is true for the operator)

You can take a look at #41531 to see how you can look at the metrics exposed by a running operator.

Depending on the metrics you want to add, you might need to add them in the agent (e.g: how many IPs are used for a node). The steps needed to do that should be the same anyway.

@maintainer-s-little-helper
Copy link
Copy Markdown

Commit bbd7aa3 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Sep 16, 2025
@rabelmervin
Copy link
Copy Markdown
Author

rabelmervin commented Sep 16, 2025

PR has been closed due to deletion of root commit and local errors. made a new pr HERE

@tklauser
Copy link
Copy Markdown
Member

tklauser commented Sep 17, 2025

@rabelmervin you can rebase your PR branch on latest main. There should be no need to close the PR and create a new one. Let's keep the discussion in this PR.

@maintainer-s-little-helper
Copy link
Copy Markdown

Commit bbd7aa3 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Sep 17, 2025
@rabelmervin
Copy link
Copy Markdown
Author

@rabelmervin you can rebase your PR branch on latest main. There should be no need to close the PR and create a new one. Let's keep the discussion in this PR.

It worked thanks for your feedback @tklauser could you please review the changes?

@tklauser
Copy link
Copy Markdown
Member

@rabelmervin could you please add some description to the PR and the commit message explaining the approach (as requested by @pippolo84 in #41397 (review)).

Also, please describe how you tested this and ideally also add a unit test to verify that the metric is working as intended.

Copy link
Copy Markdown
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differently from cluster-pool, here we are not adding metrics tied to a trigger, we just want to update a metric that keeps track of the available IPs for each pool.
I suggest to rework the code without relying too much on the cluster-pool implementation. You can consider it as an inspiration to see how the metrics are registered, but the logic to update it will inevitably be different.

nodes map[string]poolToCIDRs // nodeName -> pool -> cidrs
ready bool
metricsAPI metricsAPI
CountAllocatedIPsInPrefix func(cidr *netip.Prefix) *big.Int
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we need a function as a field of this struct. What is the reason behind this?

Copy link
Copy Markdown
Author

@rabelmervin rabelmervin Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper functions using it (allocatedv4 and v6) to calculate the allocated ips

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why a field in a struct and not a simple function in the multipool package?

Also, CountAllocatedIPsInPrefix is initialized in NewPoolAllocator to always returns 0 and never overwritten, so it will always return 0. What's the point?

@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Oct 5, 2025
@rabelmervin rabelmervin marked this pull request as draft October 23, 2025 12:01
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Nov 1, 2025
rabelmervin added a commit to rabelmervin/cilium that referenced this pull request Nov 1, 2025
Signed-off-by: rabelmervin <rabelmervin@gmail.com>

Added Prometheus metrics for multipool IPAM remaining IPs

Add metric for remaining IPs in multi-pool setup

Introduce a metric to track the remaining IPs available in a multi-pool
environment. This enhancement enables better monitoring and resource
management in scenarios where multiple IP pools are utilized.

- Provides visibility into IP usage across all pools
- Ensures proactive alerting for potential IP exhaustion
- Aligns with the existing observability features

Fixes cilium#41397
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Nov 1, 2025
@rabelmervin rabelmervin marked this pull request as ready for review November 4, 2025 16:52
@rabelmervin rabelmervin marked this pull request as draft November 4, 2025 17:39
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
@rabelmervin rabelmervin marked this pull request as ready for review November 5, 2025 03:02
@pchaigno pchaigno requested a review from pippolo84 November 24, 2025 15:18
@pchaigno
Copy link
Copy Markdown
Member

@rabelmervin It seems there's at least one question from Fabio left answered above.

@pchaigno
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Pool IPAM allocates IP addresses in two steps. Consider that the pool 1.1.1.0/24 is available from a CiliumPodIPPool resource.

  1. The operator divide that CIDR into smaller CIDRs with the same mask size that will be allocated to each agent. As an example, say that mask size is equal to 28, then the operator will allocate 1.1.1.0/28, then 1.1.1.16/28 and so on.
  2. Each agent allocates single IP addresses to the scheduled pods on the node. Following the example, say that a node has been assigned the CIDR 1.1.1.0/28 from the operator: it will assign to the pods the IP addresses 1.1.1.1, then 1.1.1.2 and so on.

If we want to add metrics for remaining IPs, the first question we should ask is what are we going to expose in the metrics. One possible approach is:

  1. adding metrics in the operator to expose how many CIDRs of size maskSize we can still allocate to nodes from all the CIDRs in each pools.
  2. adding metrics in the agents to expose how many IP addresses we can still allocate from the CIDRs of size maskSize that have been assigned to the node.

I suggest to start from the operator metrics and leave 2) for a follow up PR.

Comment on lines +482 to +489
allocatedV4 := pool.allocatedV4()
assert.NotNil(t, allocatedV4)
assert.True(t, allocatedV4.Cmp(big.NewInt(0)) >= 0, "Allocated IPv4 should be non-negative")

// Test allocatedV6()
allocatedV6 := pool.allocatedV6()
assert.NotNil(t, allocatedV6)
assert.True(t, allocatedV6.Cmp(big.NewInt(0)) >= 0, "Allocated IPv6 should be non-negative")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the test meaningful we should try to allocate some CIDRs and verify that the numbers of allocated and avalaible IPs are the ones expected.

Comment on lines +491 to +495
remainingV4 := new(big.Int).Sub(availableV4, allocatedV4)
remainingV6 := new(big.Int).Sub(availableV6, allocatedV6)

assert.True(t, remainingV4.Cmp(big.NewInt(0)) >= 0, "Remaining IPv4 should be non-negative")
assert.True(t, remainingV6.Cmp(big.NewInt(0)) >= 0, "Remaining IPv6 should be non-negative")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of testing big.Int.Sub?

@maintainer-s-little-helper
Copy link
Copy Markdown

Commit 22ca075 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Dec 14, 2025
@joestringer joestringer added dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed release-note/misc This PR makes changes that have no direct user impact. labels Jan 9, 2026
@joestringer joestringer changed the title Implementation of Multi-pool remaining IPs metric Add Multi-pool remaining IPs metric Jan 9, 2026
@aanm aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 14, 2026
@pchaigno
Copy link
Copy Markdown
Member

Switching to draft while the review is being addressed.

@pchaigno pchaigno marked this pull request as draft February 11, 2026 13:04
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 14, 2026
@github-actions
Copy link
Copy Markdown

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ipam IP address management, including cloud IPAM area/metrics Impacts statistics / metrics gathering, eg via Prometheus. dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. kind/community-contribution This was a contribution made by a community member. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CFP: Multi-pool remaining IPs metric

7 participants