Skip to content

Configurable scale velocity for HPA#883

Merged
k8s-ci-robot merged 12 commits intokubernetes:masterfrom
gliush:configurable-scale-speed-for-hpa
May 1, 2019
Merged

Configurable scale velocity for HPA#883
k8s-ci-robot merged 12 commits intokubernetes:masterfrom
gliush:configurable-scale-speed-for-hpa

Conversation

@gliush
Copy link
Copy Markdown
Contributor

@gliush gliush commented Mar 7, 2019

No description provided.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 7, 2019
@k8s-ci-robot k8s-ci-robot requested review from jdumars and mwielgus March 7, 2019 17:20
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @gliush. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. labels Mar 7, 2019
@jdumars
Copy link
Copy Markdown
Contributor

jdumars commented Mar 7, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 7, 2019
@mwielgus
Copy link
Copy Markdown
Contributor

@thockin Could you review the api change here?

* Adds "delaySeconds" field into the HPA constraints
* Makes sure that it could be configured by the "stabilization window"
command line argument
* Adds user story
* Fixes typos
Fixes typos
Use another default values
Some additional explanations added
Also adds new section for the delay option
@josephburnett
Copy link
Copy Markdown
Contributor

/lgtm
@thockin @mwielgus I think this is good-to-go. Could you take a look?

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@josephburnett: changing LGTM is restricted to assignees, and only kubernetes/enhancements repo collaborators may be assigned issues.

Details

In response to this:

/lgtm
@thockin @mwielgus I think this is good-to-go. Could you take a look?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gliush
Copy link
Copy Markdown
Contributor Author

gliush commented Apr 29, 2019

@josephburnett:
We have a KEP freeze tomorrow, Apr 30 for k8s-1.15 (#853 (comment))
Is it possible to have this KEP approved so that I can merge it and start working on the implementation to make it part of 1.15?

@josephburnett
Copy link
Copy Markdown
Contributor

@gliush it's not up to me whether it's approved or not. I've already pinged @thockin and @mwielgus that it's ready for review. 🤞

@mwielgus
Copy link
Copy Markdown
Contributor

mwielgus commented May 1, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 1, 2019
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gliush, josephburnett, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit f74637d into kubernetes:master May 1, 2019
@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 1, 2019

a couple notes for clarity:

  • this merged still in provisional state, and the expectation is that things will be in implementable by feature freeze
  • I didn't see an ack from @thockin on the updates after his review (that doesn't necessarily have to happen by feature freeze, but an API review ack is still needed before an implementation merges)

@thockin
Copy link
Copy Markdown
Member

thockin commented May 1, 2019

A KEP should not be a full API review, so it seems appropriate that this merge without that final approval. It would be bad if the ultimate API was radically different than what is in the KEP, but the reality is that APIs almost always evolve as they are implemented :)

Re-reading now.

Copy link
Copy Markdown
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly API comments to sort through as you finalize the UX. This is a complicated API, so I'll encourage you to seek out simplifications, even at the cost of some flexibility. We can always add stuff, but removing is very hard.


- `constraints`:
- `scaleUp`:
- `percent = 900` (i.e., to increase the number of pods 10 times per minute is ok).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For final API we may want to try a few variants to see what UX works best and is least confusing. E.g. if I want to allow 10x growth, does it make sense to say "grow by 900%" or "grow to 1000%" ?

For examples like "grow by 100%" it seems pretty obvious, just less so at larger numbers :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer "grow to 1000%". It makes the math more obvious: maxAllowed = current * (percent / 100)

type HPAScaleConstraintRateValue struct {
Pods *int32
Percent *int32
PeriodSeconds *int32
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this smoothed over the period or done at edges? E.g. 3 pods per minute could be 1 pod per 20 seconds or 3 pods all at once after 60 seconds. Just specify the behavior.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok for pods and percent to share a period? e.g. do I need to be able to specify "3 pods per minute or 100% per 20 seconds" ?

Create an HPA with the following constraints:

- `constraints`:
- `scaleDown`:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little clunky -- if there are ever new modes here (I don't have even a hypothetical example, but pretend :), then any YAML that carries this payload will not have that new field.

There are other API conventions being strengthened around one-of sets, which want a discriminator field, so maybe this looks like:

constraints:
  scaleDown:  # 3 pods per 10 minutes
    policy: Pods
    value: 3
    periodSeconds: 600
constraints:
  scaleDown:  # no scale-down allowed
    policy: Disabled

Or if pods and percent are not mutually exclusive (?) then something like:

constraints:
  scaleDown:  {} # no scale-down allowed

This changes the defaults from being on the leaf fields to being on the struct. If scaleDown is not specified, the default value is { percent: 100, pods: 4 }, but if it is specified, the fields inside default to 0.

Something like that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gliush this is worth considering for kubernetes/kubernetes#74525. Just wanted to make sure you didn't miss this feedback.

- `constraints`:
- `scaleDown`:
- `pods = 5`
- `delaySeconds = 600`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is delay specific to down-scaling or also for up? The YAML listed here is confusing


```golang
type HPAScaleConstraintValue struct {
Rate *HPAScaleConstraintRateValue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean if this is not specified? Make sure to document and think through all optionality

MinReplicas *int32
MaxReplicas int32
Metrics []MetricSpec
Constraints *HPAScaleConstraints
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I still dislike "constraints" as a term here. "policy" or somethng makes more sense to me, but I probably won't fight very hard on this

```golang
type HPAScaleConstraintValue struct {
Rate *HPAScaleConstraintRateValue
DelaySeconds *int32
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand, this is not a "delay" as much as a "window" ? Please think about the name that best captures the semantics.

Also document bounds. Can I set a delay of 14 days? How much buffer are we willing to allocate (and lose if the controller crashes) ?


Example for `CurReplicas = 10` and HPA controller cycle once per a minute:

- First 9 minutes the algorithm will do nothing except gathering recommendations.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these stored? If the controller goes down, is it all lost? That should feedback into our API limits so we don't promise to store too much, and then cause a huge problem when the controller dies for whatever reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants