Add Multi-pool remaining IPs metric#41397
Conversation
pippolo84
left a comment
There was a problem hiding this comment.
Hi @rabelmervin . Thanks for the PR!
Could you please clarify your approach and maybe show an example of the new metrics actually running (maybe in a local kind cluster)?
I'm having some difficulties in understanding the code structure (e.g: where getRemainingIPs is called?).
|
Hi @pippolo84 I have tested the changes several times with kind yet, the metric is not visible and also the existing trigger metrics like |
Looking at the metrics for both cluster-pool and multi-pool IPAM mode, I realized that:
Therefore, to add metrics for multi-pool IPAM in the operator, you should:
You can take a look at #41531 to see how you can look at the metrics exposed by a running operator. Depending on the metrics you want to add, you might need to add them in the agent (e.g: how many IPs are used for a node). The steps needed to do that should be the same anyway. |
3f6e396 to
e2a099a
Compare
|
Commit bbd7aa3 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
PR has been closed due to deletion of root commit and local errors. made a new pr HERE |
|
@rabelmervin you can rebase your PR branch on latest |
|
Commit bbd7aa3 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
bbd7aa3 to
70d71de
Compare
It worked thanks for your feedback @tklauser could you please review the changes? |
|
@rabelmervin could you please add some description to the PR and the commit message explaining the approach (as requested by @pippolo84 in #41397 (review)). Also, please describe how you tested this and ideally also add a unit test to verify that the metric is working as intended. |
pippolo84
left a comment
There was a problem hiding this comment.
Differently from cluster-pool, here we are not adding metrics tied to a trigger, we just want to update a metric that keeps track of the available IPs for each pool.
I suggest to rework the code without relying too much on the cluster-pool implementation. You can consider it as an inspiration to see how the metrics are registered, but the logic to update it will inevitably be different.
| nodes map[string]poolToCIDRs // nodeName -> pool -> cidrs | ||
| ready bool | ||
| metricsAPI metricsAPI | ||
| CountAllocatedIPsInPrefix func(cidr *netip.Prefix) *big.Int |
There was a problem hiding this comment.
Not sure why we need a function as a field of this struct. What is the reason behind this?
There was a problem hiding this comment.
Helper functions using it (allocatedv4 and v6) to calculate the allocated ips
There was a problem hiding this comment.
But why a field in a struct and not a simple function in the multipool package?
Also, CountAllocatedIPsInPrefix is initialized in NewPoolAllocator to always returns 0 and never overwritten, so it will always return 0. What's the point?
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Signed-off-by: rabelmervin <rabelmervin@gmail.com> Added Prometheus metrics for multipool IPAM remaining IPs Add metric for remaining IPs in multi-pool setup Introduce a metric to track the remaining IPs available in a multi-pool environment. This enhancement enables better monitoring and resource management in scenarios where multiple IP pools are utilized. - Provides visibility into IP usage across all pools - Ensures proactive alerting for potential IP exhaustion - Aligns with the existing observability features Fixes cilium#41397
c375c5e to
90a0d69
Compare
90a0d69 to
50c4e47
Compare
Signed-off-by: rabelmervin <rabelmervin@gmail.com>
a2a2e8f to
dd601a7
Compare
|
@rabelmervin It seems there's at least one question from Fabio left answered above. |
|
/test |
pippolo84
left a comment
There was a problem hiding this comment.
Multi-Pool IPAM allocates IP addresses in two steps. Consider that the pool 1.1.1.0/24 is available from a CiliumPodIPPool resource.
- The operator divide that CIDR into smaller CIDRs with the same mask size that will be allocated to each agent. As an example, say that mask size is equal to 28, then the operator will allocate 1.1.1.0/28, then 1.1.1.16/28 and so on.
- Each agent allocates single IP addresses to the scheduled pods on the node. Following the example, say that a node has been assigned the CIDR 1.1.1.0/28 from the operator: it will assign to the pods the IP addresses 1.1.1.1, then 1.1.1.2 and so on.
If we want to add metrics for remaining IPs, the first question we should ask is what are we going to expose in the metrics. One possible approach is:
- adding metrics in the operator to expose how many CIDRs of size
maskSizewe can still allocate to nodes from all the CIDRs in each pools. - adding metrics in the agents to expose how many IP addresses we can still allocate from the CIDRs of size
maskSizethat have been assigned to the node.
I suggest to start from the operator metrics and leave 2) for a follow up PR.
| allocatedV4 := pool.allocatedV4() | ||
| assert.NotNil(t, allocatedV4) | ||
| assert.True(t, allocatedV4.Cmp(big.NewInt(0)) >= 0, "Allocated IPv4 should be non-negative") | ||
|
|
||
| // Test allocatedV6() | ||
| allocatedV6 := pool.allocatedV6() | ||
| assert.NotNil(t, allocatedV6) | ||
| assert.True(t, allocatedV6.Cmp(big.NewInt(0)) >= 0, "Allocated IPv6 should be non-negative") |
There was a problem hiding this comment.
To make the test meaningful we should try to allocate some CIDRs and verify that the numbers of allocated and avalaible IPs are the ones expected.
| remainingV4 := new(big.Int).Sub(availableV4, allocatedV4) | ||
| remainingV6 := new(big.Int).Sub(availableV6, allocatedV6) | ||
|
|
||
| assert.True(t, remainingV4.Cmp(big.NewInt(0)) >= 0, "Remaining IPv4 should be non-negative") | ||
| assert.True(t, remainingV6.Cmp(big.NewInt(0)) >= 0, "Remaining IPv6 should be non-negative") |
There was a problem hiding this comment.
What is the purpose of testing big.Int.Sub?
|
Commit 22ca075 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
Switching to draft while the review is being addressed. |
|
This pull request has been automatically marked as stale because it |
|
This pull request has not seen any activity since it was marked stale. |
Fixes #35366
Approach: Multi-Pool IPAM Remaining IPs Metrics
Overview
This PR implements real-time metrics tracking for remaining IP addresses in the multi-pool IPAM allocator, Which tracks remaining ips per pool
Implementation Details
Metrics :
cilium_ipam_remaining_ipsMetrics Schema :
Gaugestartfunction present inallocator.goif metrics enabled.GetAvailableAddrsV4(),GetAvailableAddrsV6()to compute available ips andallocatedV4(),allocatedV6()to compute allocated ips.getRemainingips()to compute remaining ips using the above helpers and updates the metircs.RemainingIPS = availableIPS - allocatedIPSCountAllocatedIPsInPrefix()incidr_set.goto compute individual allocated ips in general.