
In Horizontal Pod Autoscaler Controller when scaling on multiple metrics, handle invalid metrics.#61423

Closed
bskiba wants to merge 1 commit into kubernetes:master from bskiba:multiple-metrics

Conversation

@bskiba
Member

@bskiba bskiba commented Mar 20, 2018

What this PR does / why we need it:
In the Horizontal Pod Autoscaler controller, handle the case where scaling is based on multiple metrics and some of them are invalid (misconfigured or missing).
If all metrics are invalid, return an error and set the HPA ScalingActive condition to the reason from the first invalid metric.
If some metrics are invalid and some valid: on a scale up, ignore the invalid metrics; on a scale down, return an error and set the HPA ScalingActive condition to the reason from the first invalid metric.

Reasoning behind this solution: if one metric is unavailable but another says we should scale up, it is safe to scale up, as the missing metric could only have told us to scale even more. If one metric is unavailable (the unavailability may be transient), it is not safe to scale down, as the missing metric may be the one telling us to keep the replica count at the current level (or even scale up).

When multiple metrics are unavailable, I set the HPA ScalingActive condition to the reason from the first invalid metric, the reasoning being that if you have multiple problems, you will start eliminating them one by one. I could also consider adding a new condition to describe the situation where multiple metrics are unavailable.

Metric statuses are correctly set on metrics that are available. Events for unavailable metrics are sent as before.
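The rule described above can be sketched in a few lines of Go. This is a simplified illustration, not the actual controller code: the function name `decideScale` and its signature are hypothetical, and the real controller works with metric specs and statuses rather than a bare slice of proposals.

```go
package main

import "fmt"

// decideScale sketches the decision rule: given replica proposals from the
// valid metrics, the number of invalid metrics, and the current replica
// count, decide what replica count is safe to act on.
// (Hypothetical helper, not the actual HPA controller code.)
func decideScale(proposals []int32, invalidCount, currentReplicas int32) (int32, error) {
	if len(proposals) == 0 {
		// All metrics invalid: surface an error, keep the current count.
		return currentReplicas, fmt.Errorf("all metrics invalid (%d)", invalidCount)
	}
	// Take the maximum across valid metrics, as the HPA does.
	proposed := proposals[0]
	for _, p := range proposals[1:] {
		if p > proposed {
			proposed = p
		}
	}
	// Scaling down on partial information is unsafe: a missing metric might
	// have demanded the current (or an even higher) replica count.
	if invalidCount > 0 && proposed < currentReplicas {
		return currentReplicas, fmt.Errorf("%d invalid metrics, refusing to scale down", invalidCount)
	}
	return proposed, nil
}

func main() {
	// One metric invalid, the valid one wants a scale up: allowed.
	r, err := decideScale([]int32{5}, 1, 3)
	fmt.Println(r, err == nil) // 5 true

	// One metric invalid, the valid one wants a scale down: refused.
	r, err = decideScale([]int32{2}, 1, 3)
	fmt.Println(r, err != nil) // 3 true
}
```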

Tested this by:

  1. Creating an HPA with two metric sources, one metric unavailable and the second triggering a scale up: verified that a scale up was triggered based on the valid metric.
  2. Creating an HPA with two metric sources, one metric unavailable and the second triggering a scale down: verified that no scale down occurred.

Which issue(s) this PR fixes:
Fixes #61007

Release note:

In the Horizontal Pod Autoscaler controller, handle the case where scaling is based on multiple metrics and some of them are invalid (misconfigured or missing).

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 20, 2018
@bskiba
Member Author

bskiba commented Mar 20, 2018

/assign @MaciekPytel @DirectXMan12

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bskiba
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: directxman12

Assign the PR to them by writing /assign @directxman12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bskiba bskiba force-pushed the multiple-metrics branch from 32f11a5 to 5b9f005 Compare March 20, 2018 18:39
@DirectXMan12
Contributor

/kind bug
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 23, 2018
In Horizontal Pod Autoscaler Controller handle case when scaling on multiple metrics and some of them are invalid (misconfigured or missing).
If all metrics are missing, return an error and set the HPA ScalingActive condition to the reason coming from the first invalid metric.
If some metrics are missing and some valid, on a scale up ignore missing metrics, on a scale down return an error and set HPA ScalingActive condition to reason coming from first invalid metric.
@bskiba bskiba force-pushed the multiple-metrics branch from 5b9f005 to a44e006 Compare March 29, 2018 08:52
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2018
@bskiba
Member Author

bskiba commented Mar 29, 2018

/test pull-kubernetes-integration

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 27, 2018
@bskiba
Member Author

bskiba commented Jun 28, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2018
@bskiba
Member Author

bskiba commented Jun 28, 2018

@DirectXMan12 @MaciekPytel Would you have time to have a look at this?

@k8s-ci-robot
Contributor

@bskiba: PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2018
@fgrzadkowski fgrzadkowski removed their request for review September 19, 2018 20:04
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2018
@thejasbabu

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 18, 2019
@DirectXMan12
Contributor

/unassign @DirectXMan12

@yastij
Member

yastij commented May 2, 2019

/remove-lifecycle rotten
@kubernetes/sig-autoscaling-misc

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 2, 2019
var metricNameProposal string
// If all metrics are invalid or some are invalid and we would scale down,
// return error and set condition on hpa based on first invalid metric.
if invalidMetricsCount >= len(metricSpecs) || (invalidMetricsCount > 0 && replicas < currentReplicas) {
Member


It feels like the naming of replicas here is a bit misleading now (although the name still makes sense at return time) as it's only the current proposed new replica count at the time of this comparison.

It took me a few scans to figure out what it actually corresponded to at the point where this comparison is done.
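A rename along the lines the reviewer is hinting at might look like the sketch below. This is hypothetical, not the PR's actual code: `proposedReplicas`, `shouldReportInvalid`, and the bare-int parameters are illustrative names standing in for the controller's real variables.

```go
package main

import "fmt"

// shouldReportInvalid captures the condition from the snippet above with a
// clearer name for the running value: proposedReplicas is only the maximum
// so far across valid metrics at the time of this comparison, and becomes
// the final desired replica count only at return time.
// (Hypothetical sketch, not the actual controller identifiers.)
func shouldReportInvalid(invalidMetricsCount, totalMetrics int, proposedReplicas, currentReplicas int32) bool {
	// If all metrics are invalid, or some are invalid and we would scale
	// down, surface an error based on the first invalid metric.
	return invalidMetricsCount >= totalMetrics ||
		(invalidMetricsCount > 0 && proposedReplicas < currentReplicas)
}

func main() {
	fmt.Println(shouldReportInvalid(2, 2, 5, 3)) // all invalid: true
	fmt.Println(shouldReportInvalid(1, 2, 2, 3)) // partial + scale down: true
	fmt.Println(shouldReportInvalid(1, 2, 5, 3)) // partial + scale up: false
}
```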



Must work so much better.

@gjtempleton
Member

gjtempleton commented Jun 4, 2019

Think this can now be closed as it's been superseded by #78503

@yastij
Member

yastij commented Jun 4, 2019

/close

@k8s-ci-robot
Contributor

@yastij: Closed this PR.


In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

DXist added a commit to DXist/kubernetes that referenced this pull request Jul 16, 2019
Add support for scaling to zero pods

minReplicas is allowed to be zero

condition is set once

Based on kubernetes#61423

set original valid condition

add scale to/from zero and invalid metric tests

Scaling up from zero pods ignores tolerance

validate metrics when minReplicas is 0

Document HPA behaviour when minReplicas is 0

Documented minReplicas field in autoscaling APIs
k8s-publishing-bot pushed a commit to kubernetes/api that referenced this pull request Jul 16, 2019
Add support for scaling to zero pods

minReplicas is allowed to be zero

condition is set once

Based on kubernetes/kubernetes#61423

set original valid condition

add scale to/from zero and invalid metric tests

Scaling up from zero pods ignores tolerance

validate metrics when minReplicas is 0

Document HPA behaviour when minReplicas is 0

Documented minReplicas field in autoscaling APIs

Kubernetes-commit: d55f037b7d84e61c81a6b8c20606bb06b6eb20f2

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HPA refuses to scale if any custom metric is missing

9 participants