Add cluster health metrics to NodeController#30686
Add cluster health metrics to NodeController#30686k8s-github-robot merged 1 commit intokubernetes:masterfrom
Conversation
| prometheus.GaugeOpts{ | ||
| Subsystem: NodeControllerSubsystem, | ||
| Name: ZoneHealthStatisticKey, | ||
| Help: "Gauge measuring percentage of healty nodes per zone.", |
There was a problem hiding this comment.
I'd prefer that this be a raw number (e.g. number of healthy nodes per zone) rather than a percentage. This allows for broader usage of this metric, including allowing a dashboard creator to combine with the other "ZoneSize" metric for percentage graphs.
There was a problem hiding this comment.
Only if you feel strongly about it. I used percentage because NodeController logic is percentage-based, so it'll be easier to reason about cluster behavior in the current version.
There was a problem hiding this comment.
Maybe just do both? (i.e. three metrics)
|
@daviopp @fabioy - I added "evictionsInLast" metric together with pretty complex logic needed for it. Can you please review this PR ASAP, so it might have a chance of going in 1.4? |
|
I must figure out what to do with the fact that metrics appear only after eviction takes place. I need to make sure that we'll be showing appropriate set of zones if they change during runtime. |
|
Unfortunately I'm not going to have time to review this before the deadline. Assigning to @wojtek-t to find a reviewer. |
| } | ||
|
|
||
| // Register prometheus metrics | ||
| Register() |
There was a problem hiding this comment.
I think we generally want to register metrics in init() function (instead of here) - that is the patern that we are using in the code.
|
@wojtek-t PTAL |
| newState := nc.computeZoneStateFunc(v) | ||
| ZoneSize.WithLabelValues(k).Set(float64(len(v))) | ||
| unhealthy, newState := nc.computeZoneStateFunc(v) | ||
| ZoneHealth.WithLabelValues(k).Set(float64(100*(len(v)-unhealthy)) / float64(len(v))) |
There was a problem hiding this comment.
Are we sure that len(v) > 0 here?
There was a problem hiding this comment.
Yes, if it were 0, then there would be no Nodes in the given zone, and hence the zone would not exist. This map is created from scratch each time monitorNodeStatus is called.
|
LGTM |
|
GCE e2e build/test passed for commit 5d8cb17. |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
GCE e2e build/test passed for commit 5d8cb17. |
|
Automatic merge from submit-queue |
|
Thanks a lot for reviewing this, @wojtek-t |
Follow up of #28832
This adds metrics to monitor cluster/zone status.
cc @alex-mohr @fabioy @wojtek-t @Q-Lee
This change is