Skip to content

Multiple metrics hpa#78503

Merged
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
gjtempleton:Multiple-Metrics-HPA
Jun 4, 2019
Merged

Multiple metrics hpa#78503
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
gjtempleton:Multiple-Metrics-HPA

Conversation

@gjtempleton
Copy link
Copy Markdown
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:
Currently the HPA controller will fail to scale up targets if they are scaling on multiple metrics and one or more of these metrics is invalid/unavailable, even if one of the metrics is available and indicates a scale up should happen.

This PR allows a scale up to happen in this case, whilst still ensuring that it fails safe and refuses to scale down if one or more metrics are unavailable.

Which issue(s) this PR fixes:

Fixes #61007

Special notes for your reviewer:
I wasn't 100% sure whether to go for the first or last invalid metric to be reported as the error in the case of refusing to scale up, however going with the last simplified the changes required massively, meaning no need for passing around conditions.

Credit to @bskiba as a large amount of the changes come from her previous PR #61423

Does this PR introduce a user-facing change?:

Horizontal Pod Autoscaling can now scale targets up even when one or more metrics are invalid/unavailable as long as one metric indicates a scale up should occur.

/sig autoscaling
/priority important-longterm

Add three tests for handling invalid metrics when scaling on
multiple metrics - one for scaling up successfully (new behaviour)
and two for ensuring we don't scale down (existing behaviour).
Handle a case in the Horizontal Pod Autoscaler Controller when scaling
on multiple metrics and one or more is missing or invalid.

If all metrics are missing - return an error and leave the isScalingActive
condition as that for the last invalid metric.

If some metrics are missing/invalid and some are valid and found -
if a scale up would be triggered by the valid metrics ignore the missing
metrics and scale up, if a scale down would be triggered, return an error
and leave the isScalingActive condition as that for the last invalid metric.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 29, 2019
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @gjtempleton. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels May 29, 2019
@k8s-ci-robot k8s-ci-robot requested review from mwielgus and piosz May 29, 2019 22:33
@gjtempleton
Copy link
Copy Markdown
Member Author

cc: @josephburnett

@josephburnett
Copy link
Copy Markdown
Contributor

/lgtm

I'm not yet part of the Kubernetes org, so I can't add /ok-to-test. @mwielgus can you please.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@josephburnett: changing LGTM is restricted to assignees, and only kubernetes/kubernetes repo collaborators may be assigned issues.

Details

In response to this:

/lgtm

I'm not yet part of the Kubernetes org, so I can't add /ok-to-test. @mwielgus can you please.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 30, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 30, 2019
@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 31, 2019

Thanks!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2019
@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 31, 2019

/test pull-kubernetes-bazel-build

@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 31, 2019

@MaciekPytel @mwielgus Can one of you approve?

@gjtempleton
Copy link
Copy Markdown
Member Author

/test pull-kubernetes-bazel-build

@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 31, 2019

/assign @mwielgus
/assgin @MaciekPytel

@bskiba
Copy link
Copy Markdown
Member

bskiba commented May 31, 2019

/assign @MaciekPytel

Copy link
Copy Markdown
Contributor

@mwielgus mwielgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjtempleton, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aguckenber-chwy
Copy link
Copy Markdown

Maybe this is worthy of a new issue but regarding this merged feature. Its great that the hpa can scale up with an unknown metric, but is there a flag that is set-able to allow the HPA to scale down for unknown metrics? We have issues where the cluster scales up then a custom metric becomes unknown and the cluster stays way scaled up costing us a fortune.

All because of this:

If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of these metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs) and a scale down is suggested by the metrics which can be fetched, scaling is skipped. This means that the HPA is still capable of scaling up if one or more metrics give a desiredReplicas greater than the current value.

@gjtempleton
Copy link
Copy Markdown
Member Author

@aguckenber-chwy that's definitely best discussed as a new issue.

Can definitely understand it as a use case users may want, but there's probably a number of different ways we could expose it as a feature, so likely to be some back on forth on that if nothing else.

@aguckenber-chwy
Copy link
Copy Markdown

@aguckenber-chwy that's definitely best discussed as a new issue.

Can definitely understand it as a use case users may want, but there's probably a number of different ways we could expose it as a feature, so likely to be some back on forth on that if nothing else.

Quick question, where is the best spot to put feature requests? The issues in this repository don't have it. It this the correct spot https://github.com/kubernetes/kubernetes/issues ? Or should it go on the community forums under General Discussion?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HPA refuses to scale if any custom metric is missing

7 participants