
In Horizontal Pod Autoscaler Controller when scaling on multiple metrics, handle invalid metrics.#61423

Closed
bskiba wants to merge 1 commit into kubernetes:master from bskiba:multiple-metrics

Conversation

@bskiba
Member

@bskiba bskiba commented Mar 20, 2018

What this PR does / why we need it:
In the Horizontal Pod Autoscaler controller, handle the case where scaling is based on multiple metrics and some of them are invalid (misconfigured or missing).
If all metrics are invalid, return an error and set the HPA ScalingActive condition to the reason from the first invalid metric.
If some metrics are invalid and some valid: on a scale up, ignore the invalid metrics; on a scale down, return an error and set the HPA ScalingActive condition to the reason from the first invalid metric.

Reasoning behind this solution: if one metric is unavailable but another says we should scale up, it is safe to scale up, as the missing metric could only have told us to scale even more. If one metric is unavailable (the unavailability may be transient), it is not safe to scale down, as the missing metric may be the one telling us to keep the replica count at the current level (or even scale up).

When multiple metrics are unavailable, I set the HPA ScalingActive condition to the reason from the first invalid metric, the reasoning being that if you have multiple problems, you will start eliminating them one by one. I could also consider adding a new condition to describe the situation where multiple metrics are unavailable.

Metric statuses are correctly set on metrics that are available. Events for unavailable metrics are sent as before.
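The rule described above can be sketched in a few lines of Go. This is a simplified illustration, not the actual controller code: the function name `decideScale` and its signature are hypothetical, and the real controller works with metric specs and statuses rather than a bare slice of proposals.

```go
package main

import "fmt"

// decideScale sketches the decision rule: given replica proposals from the
// valid metrics, the number of invalid metrics, and the current replica
// count, decide what replica count is safe to act on.
// (Hypothetical helper, not the actual HPA controller code.)
func decideScale(proposals []int32, invalidCount, currentReplicas int32) (int32, error) {
	if len(proposals) == 0 {
		// All metrics invalid: surface an error, keep the current count.
		return currentReplicas, fmt.Errorf("all metrics invalid (%d)", invalidCount)
	}
	// Take the maximum across valid metrics, as the HPA does.
	proposed := proposals[0]
	for _, p := range proposals[1:] {
		if p > proposed {
			proposed = p
		}
	}
	// Scaling down on partial information is unsafe: a missing metric might
	// have demanded the current (or an even higher) replica count.
	if invalidCount > 0 && proposed < currentReplicas {
		return currentReplicas, fmt.Errorf("%d invalid metrics, refusing to scale down", invalidCount)
	}
	return proposed, nil
}

func main() {
	// One metric invalid, the valid one wants a scale up: allowed.
	r, err := decideScale([]int32{5}, 1, 3)
	fmt.Println(r, err == nil) // 5 true

	// One metric invalid, the valid one wants a scale down: refused.
	r, err = decideScale([]int32{2}, 1, 3)
	fmt.Println(r, err != nil) // 3 true
}
```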

Tested this by:

  1. Creating an HPA with two metric sources, one metric unavailable and the second triggering a scale up: verified that a scale up was triggered based on the valid metric.
  2. Creating an HPA with two metric sources, one metric unavailable and the second triggering a scale down: verified that no scale down occurred.

Which issue(s) this PR fixes:
Fixes #61007

Release note:

In the Horizontal Pod Autoscaler controller, handle the case where scaling is based on multiple metrics and some of them are invalid (misconfigured or missing).

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 20, 2018
@bskiba
Member Author

bskiba commented Mar 20, 2018

/assign @MaciekPytel @DirectXMan12

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bskiba
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: directxman12

Assign the PR to them by writing /assign @directxman12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bskiba bskiba force-pushed the multiple-metrics branch from 32f11a5 to 5b9f005 Compare March 20, 2018 18:39
@DirectXMan12
Contributor

/kind bug
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 23, 2018
In Horizontal Pod Autoscaler Controller handle case when scaling on multiple metrics and some of them are invalid (misconfigured or missing).
If all metrics are missing, return an error and set the HPA ScalingActive condition to the reason coming from the first invalid metric.
If some metrics are missing and some valid, on a scale up ignore missing metrics, on a scale down return an error and set HPA ScalingActive condition to reason coming from first invalid metric.
@bskiba bskiba force-pushed the multiple-metrics branch from 5b9f005 to a44e006 Compare March 29, 2018 08:52
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2018
@bskiba
Member Author

bskiba commented Mar 29, 2018

/test pull-kubernetes-integration

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 27, 2018
@bskiba
Member Author

bskiba commented Jun 28, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 28, 2018
@bskiba
Member Author

bskiba commented Jun 28, 2018

@DirectXMan12 @MaciekPytel Would you have time to have a look at this?

@k8s-ci-robot
Contributor

@bskiba: PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2018
@fgrzadkowski fgrzadkowski removed their request for review September 19, 2018 20:04
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2018
@thejasbabu

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 18, 2019
@DirectXMan12
Contributor

/unassign @DirectXMan12

@yastij
Member

yastij commented May 2, 2019

/remove-lifecycle rotten
@kubernetes/sig-autoscaling-misc

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels May 2, 2019
var metricNameProposal string
// If all metrics are invalid or some are invalid and we would scale down,
// return error and set condition on hpa based on first invalid metric.
if invalidMetricsCount >= len(metricSpecs) || (invalidMetricsCount > 0 && replicas < currentReplicas) {
Member


It feels like the naming of replicas here is a bit misleading now (although the name still makes sense at return time) as it's only the current proposed new replica count at the time of this comparison.

It took me a few scans to figure out what it actually corresponded to at the point where this comparison is done.
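A rename along the lines the reviewer is hinting at might look like the sketch below. This is hypothetical, not the PR's actual code: `proposedReplicas`, `shouldReportInvalid`, and the bare-int parameters are illustrative names standing in for the controller's real variables.

```go
package main

import "fmt"

// shouldReportInvalid captures the condition from the snippet above with a
// clearer name for the running value: proposedReplicas is only the maximum
// so far across valid metrics at the time of this comparison, and becomes
// the final desired replica count only at return time.
// (Hypothetical sketch, not the actual controller identifiers.)
func shouldReportInvalid(invalidMetricsCount, totalMetrics int, proposedReplicas, currentReplicas int32) bool {
	// If all metrics are invalid, or some are invalid and we would scale
	// down, surface an error based on the first invalid metric.
	return invalidMetricsCount >= totalMetrics ||
		(invalidMetricsCount > 0 && proposedReplicas < currentReplicas)
}

func main() {
	fmt.Println(shouldReportInvalid(2, 2, 5, 3)) // all invalid: true
	fmt.Println(shouldReportInvalid(1, 2, 2, 3)) // partial + scale down: true
	fmt.Println(shouldReportInvalid(1, 2, 5, 3)) // partial + scale up: false
}
```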



Must work so much better.

@gjtempleton
Member

gjtempleton commented Jun 4, 2019

Think this can now be closed as it's been superseded by #78503

@yastij
Member

yastij commented Jun 4, 2019

/close

@k8s-ci-robot
Contributor

@yastij: Closed this PR.


In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

DXist added a commit to DXist/kubernetes that referenced this pull request Jul 16, 2019
Add support for scaling to zero pods

minReplicas is allowed to be zero

condition is set once

Based on kubernetes#61423

set original valid condition

add scale to/from zero and invalid metric tests

Scaling up from zero pods ignores tolerance

validate metrics when minReplicas is 0

Document HPA behaviour when minReplicas is 0

Documented minReplicas field in autoscaling APIs
k8s-publishing-bot pushed a commit to kubernetes/api that referenced this pull request Jul 16, 2019
Add support for scaling to zero pods

minReplicas is allowed to be zero

condition is set once

Based on kubernetes/kubernetes#61423

set original valid condition

add scale to/from zero and invalid metric tests

Scaling up from zero pods ignores tolerance

validate metrics when minReplicas is 0

Document HPA behaviour when minReplicas is 0

Documented minReplicas field in autoscaling APIs

Kubernetes-commit: d55f037b7d84e61c81a6b8c20606bb06b6eb20f2

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/bug Categorizes issue or PR as related to a bug.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HPA refuses to scale if any custom metric is missing

9 participants