Configurable HorizontalPodAutoscaler by gliush · Pull Request #74525 · kubernetes/kubernetes

gliush · 2019-02-25T13:57:57Z

I introduce an algorithm-agnostic HPA object configuration that will configure each particular HPA scaling behavior.

KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-autoscaling/20190307-configurable-scale-velocity-for-hpa.md

This PR contains only API changes for now. The business logic will be introduced in a separate PR. For now I keep all the changes in my local repo until the current PR is approved:
gliush#2

What type of PR is this?
/kind feature

What this PR does / why we need it:
Different applications may have different business values, different logic and may require different scaling behaviors.
At the moment, there's only one cluster-level configuration parameter that influences how fast the cluster is scaled down:
--horizontal-pod-autoscaler-downscale-stabilization-window (defaults to 5 min)

And a couple of hard-coded constants that specify how fast the cluster can scale up:

scaleUpLimitFactor = 2.0
scaleUpLimitMinimum = 4.0
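
To make the effect of these two constants concrete, here is a minimal Go sketch. It assumes the limit for one scale-up step is the larger of `scaleUpLimitFactor` times the current replica count and `scaleUpLimitMinimum`; the function name `scaleUpLimit` is illustrative and not quoted from this PR.

```go
package main

import (
	"fmt"
	"math"
)

const (
	scaleUpLimitFactor  = 2.0
	scaleUpLimitMinimum = 4.0
)

// scaleUpLimit returns the highest replica count a single scale-up step may
// reach, assuming the limit is max(factor * current, minimum).
func scaleUpLimit(currentReplicas int32) int32 {
	return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

func main() {
	for _, n := range []int32{1, 3, 10} {
		fmt.Printf("current=%d limit=%d\n", n, scaleUpLimit(n))
	}
}
```

Under this assumption a small workload can always grow to at least 4 replicas, while larger ones can at most double per step, and neither bound can be tuned per HPA object.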

This PR introduces an algorithm-agnostic HPA object configuration that will configure each particular HPA scaling behavior.
For both directions (scale up and scale down) it will be possible to specify one or more behaviors that control how the HPA controller scales the resources. Effectively, each behavior is a specification of "what percentage or how many pods can be added/removed per some period of time".
Additionally, each direction might have a StabilizationWindowSeconds parameter set to gather recommendations for some time and pick the safest change.
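
The stabilization idea can be sketched in Go as follows. This is an illustration under stated assumptions, not the controller code: the `recommendation` type and the `stabilize` function are hypothetical, and "safest" is assumed to mean the highest recent recommendation when scaling down and the lowest when scaling up.

```go
package main

import "fmt"

// recommendation is a hypothetical record of one past replica suggestion.
type recommendation struct {
	replicas   int32
	ageSeconds int32 // how long ago the suggestion was made
}

// stabilize picks the safest value among the current desired replica count
// and all recommendations no older than windowSeconds.
// Assumption: safest = highest when scaling down, lowest when scaling up.
func stabilize(desired int32, history []recommendation, windowSeconds int32, scalingDown bool) int32 {
	result := desired
	for _, r := range history {
		if r.ageSeconds > windowSeconds {
			continue // outside the stabilization window
		}
		if scalingDown && r.replicas > result {
			result = r.replicas
		}
		if !scalingDown && r.replicas < result {
			result = r.replicas
		}
	}
	return result
}

func main() {
	history := []recommendation{
		{replicas: 10, ageSeconds: 20},
		{replicas: 8, ageSeconds: 100},
		{replicas: 12, ageSeconds: 400},
	}
	// Scaling down to 5 with a 300s window keeps 10 replicas for now.
	fmt.Println(stabilize(5, history, 300, true))
	// With a 10s window nothing recent blocks the change.
	fmt.Println(stabilize(5, history, 10, true))
}
```

A longer window makes the autoscaler slower to act on a drop (or spike) in the metric, which trades responsiveness for fewer replica-count flaps.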

For more information and motivation, read the KEP (linked above).

Which issue(s) this PR fixes:

Fixes #39090
Fixes #65097
Fixes #69428

It partly covers #56335

Special notes for your reviewer:
The API changes are backward compatible. All current defaults are kept the same.

Does this PR introduce a user-facing change?:

Added the HPA API, which allows scale behavior to be configured through the HPA
`behavior` field. Behaviors are specified separately for scaling up and down. In
each direction a stabilization window can be specified as well as a list of
policies and how to select amongst them. Policies can limit the absolute number
of pods added or removed, or the percentage of pods added or removed.
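
As an illustration of how such policies could bound a scale-up, here is a hedged Go sketch. The `policy` type, the `"Pods"`/`"Percent"` strings, and the `selectMax` flag are hypothetical stand-ins, not the API added by this PR; the sketch also ignores period tracking and simply takes the replica count at the start of the period as input.

```go
package main

import (
	"fmt"
	"math"
)

// policy is a hypothetical, simplified scaling policy.
type policy struct {
	kind  string // "Pods" (absolute) or "Percent" (relative)
	value int32
}

// ceilingFor returns the replica count one policy allows, given the replica
// count at the start of the policy's period.
func ceilingFor(p policy, start int32) int32 {
	if p.kind == "Percent" {
		return int32(math.Ceil(float64(start) * (1 + float64(p.value)/100)))
	}
	return start + p.value
}

// scaleUpCeiling combines several policies. With selectMax it takes the most
// permissive ceiling, otherwise the most restrictive one. Returning start
// when no policies are given is a choice made for this sketch only.
func scaleUpCeiling(start int32, policies []policy, selectMax bool) int32 {
	if len(policies) == 0 {
		return start
	}
	best := ceilingFor(policies[0], start)
	for _, p := range policies[1:] {
		c := ceilingFor(p, start)
		if (selectMax && c > best) || (!selectMax && c < best) {
			best = c
		}
	}
	return best
}

func main() {
	policies := []policy{{kind: "Pods", value: 4}, {kind: "Percent", value: 100}}
	fmt.Println(scaleUpCeiling(10, policies, true))  // most permissive
	fmt.Println(scaleUpCeiling(10, policies, false)) // most restrictive
}
```

With 10 replicas, the absolute policy allows 14 and the percentage policy allows 20, so the selection rule decides which of the two bounds applies.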

k8s-ci-robot · 2019-02-25T13:58:00Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: helpdesk@rt.linuxfoundation.org

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2019-02-25T13:58:05Z

Hi @gliush. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


gliush · 2019-02-25T13:58:50Z

/sig autoscaling

gliush · 2019-02-25T14:00:24Z

@thockin: Could you have a look? @mwielgus says that you're the most familiar with the matter.
I can't do it by myself, sorry.

thockin · 2019-03-01T21:44:54Z

Is there a KEP for this? Generally we want to see the design work independent of the code.

I am happy to take up the API review, but can't really do that until I know the domain experts (e.g. @mwielgus) are happy with the design.

thockin · 2019-03-01T21:45:46Z

I see there's a doc, but that's not a KEP.

staging/src/k8s.io/api/autoscaling/v2beta2/types.go

thockin · 2019-03-01T22:22:01Z

Why not allow the user to specify the denominator? "1 pod per 2 minutes" is more expressive than "0.5 pods per minute".

…

On Fri, Mar 1, 2019 at 2:09 PM Marcin Wielgus wrote:

> In staging/src/k8s.io/api/autoscaling/v2beta2/types.go:
>
>     @@ -117,6 +120,26 @@ type MetricSpec struct {
>      // MetricSourceType indicates the type of metric.
>      type MetricSourceType string
>     +// HorizontalPodAutoscalerScaleConstraints configures the scaling velocity
>     +// by specifying the "absolute" value (in number of pods) and "relative" values (in percents)
>     +// All the parameters of the struct are "per minute".
>     +// For each scale direction (Up or Down) if both parameters are specified
>     +// the largest constraint is used.
>     +type HorizontalPodAutoscalerScaleConstraints struct {
>     +	// scaleUpPercent specifies the scale up relative speed, in percentages
>     +	// i.e. if scaleUpPercent = 150 , then we can add 150% more pods (10 -> 25 pods)
>     +	ScaleUpPercent *resource.Quantity `json:"scaleUpPercent,omitempty" protobuf:"bytes,1,opt,name=scaleUpPercent"`
>
> The idea behind quantity is that all values are per minute. So if you want to add/remove at most 1 pod every 2 minutes you need to put 0.5 here. Quantity is used for consistency with pods. We can however use ints in percent-like values if you prefer.

mwielgus · 2019-03-01T22:22:11Z

@thockin - I have worked with @gliush on this for a while. We agreed on the rough idea and the configuration parameters, so we are proceeding with the API.

This functionality has been requested multiple times already so I'm glad that we finally have someone who is working on it :).

josephburnett · 2019-11-14T20:37:40Z

/assign @thockin

josephburnett · 2019-11-14T22:27:33Z

/hold

@gliush I dug into the failed pull-kubernetes-kubemark-e2e-gce-big test and saw this:

TestMetrics error: [restart counts violation: RestartCount(kube-controller-manager-e2e-74525-ac87c-kubemark-master, kube-controller-manager)=1, want <= 0]

I deployed these changes with kube-up.sh. When I add behavior.scaleDown.stabilizationWindowSeconds: 30 and apply load, controller-manager crash-loops. There is nothing in the logs, so I don't know why, but this must be fixed and requires an e2e test.

josephburnett · 2019-11-14T22:30:46Z

Maybe related: #84990

thockin

I'll approve this as-is, but you really need that one extra case in defaulting - please follow up.

I see there's a hold for another reason, so this may be in vain, but good luck. :)

/lgtm
/approve

thockin · 2019-11-15T00:26:40Z

pkg/apis/autoscaling/v2beta2/defaults.go

The API spec says that Behavior == nil will also set the defaults. I see the test cases don't cover that, but I think you want to create a value for behavior and then merge the defaults in, right?
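
The suggestion above (create a value for a nil behavior, then merge the defaults in) could look roughly like the following Go sketch. The types are simplified, hypothetical stand-ins for the API structs, and the default window values (0 and 300 seconds) are placeholders chosen for illustration, not the defaults set by this PR.

```go
package main

import "fmt"

// scalingRules and behavior are hypothetical, simplified stand-ins.
type scalingRules struct {
	StabilizationWindowSeconds *int32
}

type behavior struct {
	ScaleUp   *scalingRules
	ScaleDown *scalingRules
}

func int32Ptr(v int32) *int32 { return &v }

// withDefaults starts from the default and overlays any user-set field.
func withDefaults(in *scalingRules, defaultWindow int32) *scalingRules {
	out := &scalingRules{StabilizationWindowSeconds: int32Ptr(defaultWindow)}
	if in != nil && in.StabilizationWindowSeconds != nil {
		out.StabilizationWindowSeconds = int32Ptr(*in.StabilizationWindowSeconds)
	}
	return out
}

// setBehaviorDefaults handles b == nil the same way as a partially filled b.
func setBehaviorDefaults(b *behavior) *behavior {
	if b == nil {
		b = &behavior{}
	}
	return &behavior{
		ScaleUp:   withDefaults(b.ScaleUp, 0),     // placeholder default
		ScaleDown: withDefaults(b.ScaleDown, 300), // placeholder default
	}
}

func main() {
	d := setBehaviorDefaults(nil)
	fmt.Println(*d.ScaleUp.StabilizationWindowSeconds, *d.ScaleDown.StabilizationWindowSeconds)

	user := &behavior{ScaleDown: &scalingRules{StabilizationWindowSeconds: int32Ptr(30)}}
	d = setBehaviorDefaults(user)
	fmt.Println(*d.ScaleUp.StabilizationWindowSeconds, *d.ScaleDown.StabilizationWindowSeconds)
}
```

Treating the nil case as an empty value means there is a single merge path, so a test for nil input exercises the same code as a test for a partially specified behavior.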

josephburnett · 2019-11-21T13:52:42Z

pkg/apis/autoscaling/v2beta2/defaults.go

This should really be 15 to keep parity with existing defaults, which resync HPAs every 15 seconds.

gliush · 2019-11-26T04:11:38Z

I've changed the default period seconds for HPA from 60s to 15s to match the previous behavior.
All the tests pass.

k8s-ci-robot · 2019-12-08T14:08:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gliush, josephburnett, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~api/OWNERS~~ [thockin]
~~pkg/api/OWNERS~~ [thockin]
~~pkg/apis/OWNERS~~ [thockin]
~~pkg/controller/podautoscaler/OWNERS~~ [josephburnett,thockin]
~~staging/src/k8s.io/api/OWNERS~~ [thockin]
~~staging/src/k8s.io/kubectl/OWNERS~~ [thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

josephburnett · 2019-12-09T12:17:59Z

@arjunrn thanks for fixing the controller-manager crash! I've done some manual testing in a cluster and it works nicely. I added and removed behaviors, and they are respected.

Just two requests:

1. Please squash this PR before I give an LGTM. There are 21 commits and this repo doesn't squash for you!
2. Please provide a pointer to the E2E test for this. @arjunrn I think you have one already. But I want to be sure we have something that will guard against a bug like the controller-manager crashing.

Signed-off-by: Arjun Naik <arjun@arjunnaik.in>

josephburnett · 2019-12-11T08:38:53Z

/lgtm

I checked and there is no difference between the latest squash and the unsquashed changes I tested. Nice work!

josephburnett · 2019-12-11T09:00:56Z

/hold cancel

arjunrn · 2019-12-11T10:09:11Z

/test pull-kubernetes-kubemark-e2e-gce-big

saschagrunert · 2020-01-06T19:17:14Z

Hey @gliush 👋, I'm currently looking at the latest generated release notes and saw that this note needs to either state the user-facing change or be set to NONE. Do you think you can change the PR description for that? The next re-generation of the notes will fix that issue on our side later on. :)

gliush · 2020-01-07T08:05:43Z

@saschagrunert : I've updated the release notes, could you check, please? I'm a little bit concerned about the style and the amount of details.

saschagrunert · 2020-01-07T08:10:28Z

Thank you @gliush 🙏 I assume the API is new and we should state that to the user as well, like:

Added the HPA API, which allows scale ...

So in general it is looking good, but we have to include information as mentioned here: https://github.com/saschagrunert/community/blob/master/contributors/guide/release-notes.md#contents-of-a-release-note

gliush · 2020-01-07T08:31:45Z

@saschagrunert : thank you! Done!

saschagrunert · 2020-01-07T08:32:50Z

Wonderful, thank you again! :)

k8s-ci-robot added the cncf-cla: no, needs-sig, and needs-priority labels Feb 25, 2019

k8s-ci-robot added the needs-ok-to-test label Feb 25, 2019

k8s-ci-robot added the sig/autoscaling label Feb 25, 2019

k8s-ci-robot requested review from erictune and sttts February 25, 2019 13:58

k8s-ci-robot added the kind/api-change label and removed the needs-sig label Feb 25, 2019

k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label Feb 25, 2019

gliush mentioned this pull request Feb 25, 2019

Configurable scale velocity for HPA kubernetes/enhancements#853

Closed

mwielgus mentioned this pull request Feb 28, 2019

[WIP] Add scale up/down limits for HPA #71549

Closed

mwielgus assigned thockin and mwielgus Feb 28, 2019

mwielgus mentioned this pull request Feb 28, 2019

Remove ScaleUpLimitFactor for faster scaling #73890

Closed

thockin reviewed Mar 1, 2019

staging/src/k8s.io/api/autoscaling/v2beta2/types.go (outdated)

staging/src/k8s.io/api/autoscaling/v2beta2/types.go (outdated)

mwielgus reviewed Mar 1, 2019

staging/src/k8s.io/api/autoscaling/v2beta2/types.go (outdated)

k8s-ci-robot added the lgtm label Nov 14, 2019

thockin reviewed Nov 15, 2019


josephburnett reviewed Nov 21, 2019


josephburnett mentioned this pull request Nov 21, 2019

KEP on Configurable tolerance for Autoscalers kubernetes/enhancements#1372

Closed

gliush and others added 5 commits December 10, 2019 20:37

Introduces all API changes needed for Configurable HPA PR

141eaf7

Adds validation rules and proper defaults

5c70cda

Adds the algorithm implementation for the Configurable HPA

27ffe43

Generates boilerplate code

ac23d55

Adds tests

8ab2262

Signed-off-by: Arjun Naik <arjun@arjunnaik.in>

arjunrn mentioned this pull request Dec 15, 2019

Configurable Scaling for the HPA kubernetes/website#18157

Merged

saschagrunert mentioned this pull request Jan 6, 2020

First release_notes_draft.md of k/k 1.18.0-alpha.1 kubernetes/sig-release#937

Merged

arjunrn mentioned this pull request Feb 3, 2020

Update version for Configuration HPA scaling enhancement kubernetes/website#18965

Merged

DirectXMan12 mentioned this pull request Apr 20, 2020

HPA conversion serializes internal struct to annotation #89964

Open

silenceper mentioned this pull request Sep 16, 2020

Initial readiness delay with metrics api and prometheus metrics kedacore/keda#1163

Closed

gjtempleton mentioned this pull request Nov 20, 2020

Updated Scaling Behavior KEP to reflect implementation kubernetes/enhancements#2159

Merged

Ritikaa96 mentioned this pull request Jul 20, 2023

Missing docs in the HPA Scaling policies kubernetes/website#42111

Closed


16 participants
