In Horizontal Pod Autoscaler Controller when scaling on multiple metrics, handle invalid metrics. #61423
bskiba wants to merge 1 commit into kubernetes:master from
Conversation
/assign @MaciekPytel @DirectXMan12
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: bskiba. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Force-pushed from 32f11a5 to 5b9f005
/kind bug
In Horizontal Pod Autoscaler Controller, handle the case when scaling on multiple metrics and some of them are invalid (misconfigured or missing). If all metrics are missing, return an error and set the HPA ScalingActive condition to the reason coming from the first invalid metric. If some metrics are missing and some are valid, ignore the missing metrics on a scale-up; on a scale-down, return an error and set the HPA ScalingActive condition to the reason coming from the first invalid metric.
Force-pushed from 5b9f005 to a44e006
/test pull-kubernetes-integration
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@DirectXMan12 @MaciekPytel Would you have time to take a look at this?
@bskiba: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/unassign @DirectXMan12
/remove-lifecycle rotten
var metricNameProposal string
// If all metrics are invalid or some are invalid and we would scale down,
// return error and set condition on hpa based on first invalid metric.
| if invalidMetricsCount >= len(metricSpecs) || (invalidMetricsCount > 0 && replicas < currentReplicas) { |
It feels like the naming of replicas here is a bit misleading now (although the name still makes sense at return time), as at the point of this comparison it only holds the proposed new replica count so far.
It took me a few scans to figure out what it actually corresponded to at the point where this comparison is done.
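To illustrate the reviewer's point, here is a minimal, runnable sketch of the same guard with a clearer local name. `proposedReplicas` is a hypothetical rename (the PR's variable is `replicas`), and the concrete values below are made up for demonstration only:

```go
package main

import "fmt"

func main() {
	// Hypothetical state standing in for the controller's values at the
	// time of the comparison in the PR.
	currentReplicas := int32(4)
	proposedReplicas := int32(3) // the proposal so far, from the valid metrics
	invalidMetricsCount := 1
	numMetricSpecs := 2

	// Same condition as the PR's guard, with the renamed variable:
	// refuse to act if all metrics are invalid, or if any are invalid
	// and the valid ones would scale down.
	rejectAction := invalidMetricsCount >= numMetricSpecs ||
		(invalidMetricsCount > 0 && proposedReplicas < currentReplicas)
	fmt.Println(rejectAction) // true: one metric invalid and a scale-down proposed
}
```

Reading `proposedReplicas < currentReplicas` makes it immediately clear that the comparison is about a proposed scale-down, which is what took the reviewer a few scans to work out.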
I think this can now be closed, as it's been superseded by #78503.
/close
@yastij: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Add support for scaling to zero pods:
- minReplicas is allowed to be zero
- condition is set once
- Based on kubernetes/kubernetes#61423, set original valid condition
- add scale to/from zero and invalid metric tests
- Scaling up from zero pods ignores tolerance
- validate metrics when minReplicas is 0
- Document HPA behaviour when minReplicas is 0
- Documented minReplicas field in autoscaling APIs

Kubernetes-commit: d55f037b7d84e61c81a6b8c20606bb06b6eb20f2
What this PR does / why we need it:
In Horizontal Pod Autoscaler Controller, handle the case when scaling on multiple metrics and some of them are invalid (misconfigured or missing).
If all metrics are missing, return an error and set the HPA condition to the error from the first invalid metric.
If some metrics are missing and some are valid, ignore the missing metrics on a scale-up; on a scale-down, return an error and set the HPA ScalingActive condition to the reason coming from the first invalid metric.
Reasoning behind this solution: if one metric is unavailable but the other says we should scale up, it is safe to scale up, as the missing metric could only tell us to scale even more. If one metric is unavailable (the unavailability can be transient), it is not safe to scale down, as the missing metric could be the one telling us to keep the replica count at the current level (or even scale up).
When multiple metrics are unavailable, I set the HPA ScalingActive condition to the reason coming from the first invalid metric, with the reasoning that if you have multiple problems, you will start eliminating them one by one. I could also consider adding a new condition to describe the situation where multiple metrics are unavailable.
Metric statuses are correctly set on metrics that are available. Events for unavailable metrics are sent as before.
Tested this by
Which issue(s) this PR fixes:
Fixes #61007
Release note: