ipam: add metrics to track per node capacity#24776
Conversation
|
Commit b98e92d31f6aca94e40acbdf2d089a3d5e1759bc does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
b98e92d to
0f6ff5f
Compare
|
Commit b98e92d31f6aca94e40acbdf2d089a3d5e1759bc does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
/test |
0f6ff5f to
56ab89f
Compare
64d7c15 to
60c3c86
Compare
add59a5 to
7339b89
Compare
There was a problem hiding this comment.
Nice work, clean code and easy to read commits!
I don't have any other comment besides a nit to follow and what Sebastian already pointed out. When reading some of the commits, I was initially confused by what "CNI pods" meant and then realized it meant "managed (by Cilium) pods". The latter is typically how we convey the relationship as you probably already know, but thought I'd call it out.
7339b89 to
e7ed5e5
Compare
Updated metrics docs with new ipam metrics. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Mention the addition of ipam_{available,used,needed} metrics.
As well, notify in upgrade guide about intended deprecation of ipam_ips metric.
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Previously IsPrefixDelegationEnabled implementation on *ipam.Node would check if the receiver node pointer was nil. If so it defaulted to false. This is a bit dangerous as function on nil concrete types will still be invoked, whereas interface types will panic (i.e. because a nil interface doesn't actually have a function to lookup). This removes that code, and defers nil checking to the caller. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
97ac4c1 to
0013798
Compare
|
/test |
|
@zacharysarah no problem, thanks for the review! |
|
@gandro @christarazi Made some changes to fix failing tests and lint issues. |
zacharysarah
left a comment
There was a problem hiding this comment.
@tommyp1ckles Thanks for the updates! ✨ LGTM
Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in cilium#24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in cilium#24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
This PR adds metrics to help track IPAM Node capacity with more granularity.
Specifically, the goal is to help add alerting/dashboards for such questions such as:
This adds three new Operator metrics:
As well, this seeks to deprecate the :
cilium_operator_ipam_ipsmetric, aside from it being redundant it also is very confusing - thetype=availabledoes not track what you expect it to track.Which are all labelled by "target_node" (i.e. the name of the CiliumNode).
How does this differ from the existing IPAM metrics?
There's a bit of a blind spot in the current set of metrics: