proposal for live and in-place vertical scaling #1719
YoungjaeLee wants to merge 3 commits into kubernetes:master from
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Sorry, please disregard my previous comment. I thought this was another VPA design.
> Also, the 'API server' restores the resource requirements of each container of the PodSpec to the original and writes the revised PodSpec to etcd to communicate with the Scheduler.
> This is because at this moment the PodSpec on etcd shouldn't be updated with the new resource requirements.

> For a pod with ResizeRequested, the 'Scheduler' checks if the node on which the pod currently runs has enough resources to resize the pod.
This is slightly more complex. With the addition of priorities, a resource request change for a high priority pod should preempt low priority pods if needed.
See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/ for more details.
Thanks for pointing that out. Currently, pod priority isn't covered in this proposal because it was not available when we started this work, but we're planning to support it. Basically, we think that in vertical scaling the scheduler should also follow the current preemption policy, as you mentioned. I'm now looking at the source code to understand how the scheduler works with pod priority, and once we figure out how to support pod priority in vertical scaling, we'll update the proposal accordingly.

> type ResizeRequest struct {
>     RequestStatus ResizeStatus
>     NewResources  []ResourceRequirements // indexed by containers' index
This should not rely on container index. Use the container name.
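For illustration only, a sketch of the name-keyed alternative suggested here. The type and field names below are simplified stand-ins, not the proposal's actual API: keyed by container name, a resize request lookup cannot be misdirected by container reordering.

```go
package main

import "fmt"

// resourceRequirements is a simplified stand-in for the real
// core/v1 ResourceRequirements type.
type resourceRequirements struct {
	CPUMillis int64
	MemoryMiB int64
}

// resizeRequest keyed by container name rather than by index, so that
// reordering or injecting containers in the pod spec cannot misdirect
// a resize to the wrong container.
type resizeRequest struct {
	NewResources map[string]resourceRequirements
}

func main() {
	req := resizeRequest{NewResources: map[string]resourceRequirements{
		"app": {CPUMillis: 500, MemoryMiB: 256},
	}}
	// Lookup by name is order-independent.
	r, ok := req.NewResources["app"]
	fmt.Println(ok, r.CPUMillis)
}
```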

> If used, this would be included as part of the patch or appropriate update command providing the spec update for the resize.
> It would indicate the preference of the user at the time of resize.
> Specifically, Restart for `resizeAction` would indicate that the pod be restarted for the corresponding resizing of resource(s), LiveResize would indicate that the pod not be restarted and the resize be realized live, and LiveResizePreferred would indicate that the resize be realized preferably live, but if that fails for any reason, accomplished with a restart.
The preferred method should go in ResizeRequest.
The preferred method is specified by clients, like a user or a controller. But 'ResizeRequest' is not a field that is intended to be changed by clients. It holds the new values of ResourceRequirements and the ResizeStatus, which are managed by the apiserver. Could you tell me why you think the preferred method should go in ResizeRequest?
> ResizedAccepted and ResizedRejected mean that the requested resource resizing was accepted or rejected, respectively, by the Scheduler.
> NewResources is an array indexed by container index, and each of its entries holds the new resource requirements of a container that needs to be resized.

> Given a new PodSpec with new resource requirements from a client, first the 'API server' validates it.
Who removes/cleans the ResizeRequest in a pod, and when?
The same doubt here: will the kubelet remove it, or will the custom controller?
In the current implementation, after resizing is accepted by the scheduler, the apiserver clears NewResources in the ResizeRequest. But the ResizeStatus remains "ResizedAccepted".
> type Resizing struct {
>     metav1.TypeMeta
>     metav1.ObjectMeta
>     Request ResizeRequest
I guess we should follow the spec+status approach in all API objects.
For me, I would regard Resizing as a kind of action rather than a resource; it's a little bit tricky to define this struct.
And we may build a command like kubectl resize?
As @adohe mentioned, "Resizing" is a kind of action, like Binding, which has the same fields.
We think that resizing is just another pod spec update, so we prefer to utilize the current pod spec update methods like kubectl apply, patch, etc. instead of introducing a new kubectl command.
> We think that resizing is just another pod spec update, so prefer to utilize the current podspec update methods like kubectl apply, patch, etc. instead of introducing a new kubectl command.

I agree with this.
currently, the fact that a node is running a pod implies it has accepted the resource requests the pod made.
this is proposing introducing a state in which a node is running a pod with one set of resources, but may or may not have accepted the pod's currently requested resources. that seems to imply:
- resources should become part of the pod/container status
- when spec....resources do not match status...resources, then the node has not seen/accepted/adjusted to the newly requested resources
- the node should have some way in status to indicate it has seen, but has refused to accept, the newly requested resources, and why
@liggitt I would think that container status should include current resources; I still don't understand why the CRI ContainerStatus definition doesn't include container resources.
> }
> ```

> Resizing has the metadata of a pod to resize and a value of ResizeRequest that holds the status of a resizing request, which indicates whether the resizing is feasible or not, and the new resource requirements of the pod.
Why is there a ResizeRequest in PodSpec and also as a separate object?
I think I don't get it. Are you asking why we need ResizeRequest in Resizing when it is already there in PodSpec?
Yes, why is it in both places.
> ConditionRequested ConditionStatus = "Requested"
> ConditionAccepted ConditionStatus = "Accepted"
> ConditionRejected ConditionStatus = "Rejected"
> ConditionDone ConditionStatus = "Done"
How long is the pod status kept?
ConditionRequested and ConditionAccepted are intermediate states. So, if the scheduler and kubelet are operational, the pod status will change to ConditionRejected or ConditionDone. Once the pod status becomes ConditionRejected or ConditionDone, it won't change unless a new resizing request comes in.
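The lifecycle described in this exchange can be sketched as a small transition table. Illustrative only: the condition names follow the quoted constants, and the table itself is an assumption drawn from the comment above, not code from the proposal.

```go
package main

import "fmt"

// validNext encodes the lifecycle described above: Requested is resolved by
// the scheduler to Accepted or Rejected; Accepted is resolved by the kubelet
// to Done; Rejected and Done are terminal until a new resize request arrives.
var validNext = map[string][]string{
	"Requested": {"Accepted", "Rejected"},
	"Accepted":  {"Done"},
	"Rejected":  {},
	"Done":      {},
}

// canTransition reports whether moving from one condition to another is
// allowed under the sketch above.
func canTransition(from, to string) bool {
	for _, n := range validNext[from] {
		if n == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("Requested", "Accepted")) // scheduler accepts
	fmt.Println(canTransition("Accepted", "Done"))      // kubelet realizes
	fmt.Println(canTransition("Done", "Accepted"))      // terminal until a new request
}
```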
derekwaynecarr left a comment:
I want this feature, but I prefer we tackle this in 1.11+ once hugepages, device plugins, and exclusive cores are finished. I am also not sure I like that the original resource requirement values are lost before the kubelet acknowledges that a resize is complete. I wonder if it would work better to add a new resource requirement section that allows edits, plus a new subresource that only the kubelet is allowed to use to change the original resource requirement once realized.
> * LiveResizeable.

> This attribute will be available per resource (such as cpu, memory), and so is adequate to indicate whether the workload can handle, and prefers, a change in each resource's allocation without restarting.
> With potentially multiple containers and multiple resizeable resources for each in a Pod, the response to an update of the pod spec will be determined by a precedence order among the attribute values, with RestartOnly dominating LiveResizeable; i.e., if two resources have been resized in the update to the spec and one of them has a policy of RestartOnly, then the pod would be restarted to realize both updates.
For clarity, can you distinguish Container restart versus pod restart? From the yaml below, I think you are referring to Container restart since that is all that has resource requirements. Pod restart terminology to me brings extra baggage for things like init containers.
I am also wondering this: a Pod is a collection of containers. If we just adjust resources for container A, shall we restart the Pod or just restart container A?
Actually, our prototype implementation supports container restart (at pod-level resizing). But we thought that in this proposal it is better to focus on adding a live-and-in-place resizing feature at the StatefulSet level, because going with container restart as well seems to introduce too many things at one time. But we definitely see the value of container restart, so if most of you agree/want to add a container restart option, we're happy to do that.
I could see more value if we could support container-level restart.
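The precedence rule quoted above (RestartOnly dominates LiveResizeable across the resources changed in one update) boils down to a single scan over the changed resources. A minimal sketch, with the policy names taken from the quoted text and everything else assumed:

```go
package main

import "fmt"

// ResizePolicy mirrors the per-resource attribute from the proposal.
type ResizePolicy string

const (
	LiveResizeable ResizePolicy = "LiveResizeable"
	RestartOnly    ResizePolicy = "RestartOnly"
)

// effectiveAction applies the precedence rule: if any resized resource
// carries RestartOnly, the whole resize is realized by restart; only when
// every changed resource is LiveResizeable can the resize happen live.
// (Sketch only; the real kubelet logic would differ.)
func effectiveAction(changed map[string]ResizePolicy) string {
	for _, p := range changed {
		if p == RestartOnly {
			return "Restart"
		}
	}
	return "LiveResize"
}

func main() {
	fmt.Println(effectiveAction(map[string]ResizePolicy{
		"cpu": LiveResizeable,
	})) // live resize is possible

	fmt.Println(effectiveAction(map[string]ResizePolicy{
		"cpu":    LiveResizeable,
		"memory": RestartOnly,
	})) // memory's RestartOnly dominates: both updates realized via restart
}
```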
> memory: RestartOnly
> ```

> For the above example, if there is a change to the cpu request or limit, it can be vertically scaled only if the memory request and limit remain the same; otherwise, the RestartOnly policy for memory would override the policy for cpu, and the Pod (or container, if container-alone restart is allowed) would need to be restarted.
I am inclined to deny a resize request that changes the pod's QoS class. Noting here also concerns about desired Guaranteed-pod characteristics for NUMA, exclusive cores (if on a node with the static CPU manager), hugepages, and devices. It seems that for some of those options, RestartOnly is the only viable option.
Container storage is another area
For exclusive cores, why should RestartOnly be the only viable option?
What if I update the pod spec to change its QoS class from BestEffort to Guaranteed?
As mentioned below in the 'Related issues' section, QoS class change by live-and-in-place resizing is not supported, because changing the QoS class of a pod requires changing the parent directory of the pod's cgroup directory, which is not possible.
For NUMA, the CPU manager, and other features introduced after 1.7, we're still working to make sure that our vertical scaling feature works well with them. Could you provide some insight into problems that might come up?
> limits:
>   cpu: 1000m
>   memory: 1Gi
> resizePolicy:
I think for backwards compatibility we may want to assume a default resize policy is “no” for all resources unless explicitly specified.
IIUC, a default RestartOnly resize policy should still keep backwards compatibility.
This is about a resize policy for pods associated with a StatefulSet. So, as @adohe mentioned, I think the default should be restart.
Do we need compat here? Resources are not mutable today, right? I'd hope we can make resize-in-place-if-possible the default.
For now, resources are immutable, and we need to soften the validation if we decide to support resize.
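To make the default-policy debate concrete, here is a sketch of a container resources section with an explicit per-resource policy, in the shape the proposal's own yaml fragments use. The field names follow the quoted fragments; the surrounding values are illustrative, and which default applies when `resizePolicy` is omitted is exactly what is being discussed here.

```yaml
# Illustrative only: explicit per-resource resize policies.
# If resizePolicy were omitted, the default under discussion would apply
# (RestartOnly for backwards compatibility vs. in-place-if-possible).
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
resizePolicy:
  cpu: LiveResizeable
  memory: RestartOnly
```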
> ## Desired Approach

> Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using the kubectl commands set, patch, apply, edit.
> Logs associated with the pod will capture the failure/success of the resize command.
I am not sure I follow this. Can you elaborate?
I am also confused about this. Use logs to capture the resize command result? Any consideration of PodCondition or Events?
Basically, we see resizing a pod as just another pod spec update. So resizing is requested via the existing spec update methods like kubectl patch or apply, which are currently used to change StatefulSet specs.
What we mean by 'logs' is the general logs/messages generated by k8s components (like the apiserver, kubelet, scheduler, and so on) that we check with kubectl describe. And yes, the status of resizing can also be checked with a new PodCondition, called PodResized.
> Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using the kubectl commands set, patch, apply, edit.
> Logs associated with the pod will capture the failure/success of the resize command.
> The controller will continue to attempt the update to the spec while there is a difference between the current size and the size in the updated spec.
> If an update is partially successful, the user can learn this from the logs and attempt to rectify the situation by submitting new updates (that restore the original size(s) or go for a size feasible on all the nodes).
Is this their literal Container application log?
No, it's logs/messages generated by k8s components.
> The pod condition PodResized represents the status of the resizing process.
> The PodResized condition is updated by the `Kubelet` according to the ResizeStatus, which is updated by the 'API server'.

> Basically, when the ResizeStatus changes, the `Kubelet` updates the PodResized condition accordingly.
What happens if it's rejected? Is the resource reset, or does the resource look consumed to the scheduler?
In the current design, the rejection here is made only by the scheduler. So resource allocation/consumption isn't changed at this point (because the resizing is rejected); the kubelet just updates the PodResized condition accordingly.

> * **A new additional hash, called `expectedHashNoResources`, added for `Kubelet` to detect a change to resource requirements**

> In order to watch resource requirement changes efficiently, a new additional hash is added to kubecontainer.ContainerStatus (and it is also stored as one of the container's labels).
I am confused here. It feels like we are trying to combine two pieces of information in a single field, and this hash is the outcome. We are basically saying we need a desired resource requirement (what we have today and what we are looking to change) and an actual enforced resource requirement which kubelet acknowledged.
Uses of persisted hashes have proven problematic between releases and in the face of version skew. Are we sure we want to add another one?
I understand that the original hash was introduced to detect pod spec changes efficiently, avoiding field-by-field comparisons. I want to keep this behavior, meaning no additional overhead in pod spec change detection even when the resource requirements in a pod spec are mutable. So, with this new hash, changes to the other fields in a pod spec can still be detected by a simple hash comparison.
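The idea of a resources-excluding hash can be sketched as follows. This is illustrative only: the struct, the zeroing, and the hashing scheme are assumptions for the sketch, not the kubelet's actual hashing (which the review notes has version-skew pitfalls of its own).

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// container is a simplified stand-in for a pod spec's container entry.
type container struct {
	Name      string
	Image     string
	CPUMillis int64
	MemoryMiB int64
}

// hashNoResources sketches the proposed expectedHashNoResources idea:
// hash the container spec with resource fields zeroed, so spec-change
// detection stays a single hash comparison even though resources are
// now mutable.
func hashNoResources(c container) string {
	c.CPUMillis, c.MemoryMiB = 0, 0 // exclude resources from the hash
	b, _ := json.Marshal(c)
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	a := container{Name: "app", Image: "nginx:1.25", CPUMillis: 500, MemoryMiB: 256}
	b := a
	b.CPUMillis = 1000 // a resize-only change...
	fmt.Println(hashNoResources(a) == hashNoResources(b)) // ...does not change the hash

	b.Image = "nginx:1.26" // any other spec change...
	fmt.Println(hashNoResources(a) == hashNoResources(b)) // ...is still detected
}
```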

> # Related issues

> 1. QoS class change by resize is not supported.
Ok, we should call this out earlier in the doc :-)

> 2. Memory resizing to change a request value might not take effect for Burstable pods.

> For Burstable pods, the request value for the memory resource determines the value of a score for the OOM killer, but Docker doesn't support dynamically changing the score of an existing container.
Aside from Docker, we need to think of other runtimes and platforms, like Windows.
> So, a change to the memory request value doesn't take effect on the OOM killer's behavior for Burstable pods.
> But for Guaranteed and Best-Effort pods this is not an issue, because the score is fixed regardless of the memory request value (in this case, the memory request value is used only for admission control by the Scheduler).

> 3. Memory resizing to decrease its limit may fail on the Kubelet in some circumstances.
We have a control loop that you can enable at the QoS tier to keep trying to induce reclaim. I worry that these issues unless really well understood will cause confusion for users and operators.

> Resources needed by containers can change over time for a variety of reasons: moving from live-test mode to production usage, or changes in user load or dataset sizes, each of which again might come about for a variety of reasons.
> `Statefulset` supports the capability to change the Request and Limit values specified for a container through the supported pod spec update methods.
> However, they currently require that the pods be restarted to run with the new resource sizes.
IIRC, if you change the Requests and Limits values for a container of a StatefulSet, it will kill the old pod and create a new one with the new resource requirements.
Yes, the current StatefulSet controller recreates pods if their resource allocation is changed. So, in this proposal, we modify the controller to leverage the proposed live resizing feature.
> resizePolicy:
>   cpu: LiveResizeable
>   memory: RestartOnly
> ```
Also, any consideration of extended resources?
No, for now, other resources are not under consideration.
> resizePolicy:
>   cpu: LiveResizeable
>   memory: RestartOnly
> ```
I am a little bit confused about the resizeAction annotation. In which case should I respect this value, and how does resizeAction coordinate with resizePolicy?
The resizePolicy is a characteristic of the resource that describes whether it can be resized live or not. The resizeAction is the action a user/client wants taken when resizing the resource. So, for example, even though the cpu resource is specified LiveResizeable, if for some reason a user specifies the resizeAction for cpu as Restart, resizing of cpu is done by restart. Of course, for a resource whose resizePolicy is RestartOnly, its resizeAction cannot be LiveResize/LiveResizePreferred.
I see your point; thanks for the kind explanation.
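The policy/action constraint explained in this exchange amounts to a small validation check. A sketch under stated assumptions: the policy and action value names come from the thread, but the `Action`-prefixed constant names and the function are mine, not the proposal's API.

```go
package main

import "fmt"

type ResizePolicy string
type ResizeAction string

const (
	LiveResizeable ResizePolicy = "LiveResizeable"
	RestartOnly    ResizePolicy = "RestartOnly"

	ActionRestart             ResizeAction = "Restart"
	ActionLiveResize          ResizeAction = "LiveResize"
	ActionLiveResizePreferred ResizeAction = "LiveResizePreferred"
)

// validAction sketches the constraint described above: Restart is always
// permitted (even for a LiveResizeable resource, if the user asks for it),
// but a live action is only valid for a LiveResizeable resource.
func validAction(policy ResizePolicy, action ResizeAction) bool {
	if action == ActionRestart {
		return true
	}
	return policy == LiveResizeable
}

func main() {
	fmt.Println(validAction(LiveResizeable, ActionLiveResize)) // live resize allowed
	fmt.Println(validAction(LiveResizeable, ActionRestart))    // user may still force restart
	fmt.Println(validAction(RestartOnly, ActionLiveResize))    // rejected: policy forbids live
}
```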

> ## Desired Approach

> Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using the kubectl commands set, patch, apply, edit.
which means we need to broaden pod update validation.
Yes, I modified pod spec update validation accordingly.

> Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using the kubectl commands set, patch, apply, edit.
> Logs associated with the pod will capture the failure/success of the resize command.
> The controller will continue to attempt the update to the spec while there is a difference between the current size and the size in the updated spec.
Here, does 'current size' mean the cached pod spec?
> ResizeAccepted ResizeStatus = "Accepted"
> ResizeRejected ResizeStatus = "Rejected"
> ResizeNone ResizeStatus = "None"
> )
I suggest adding some comments here; I have no idea what ResizeNone means.
This embedded "transaction" is a very different concept from most everything else, isn't it? Do we have precedent for this? It needs to be thoroughly considered for the API.
To be honest, I could not find any precedent; this is just what I worry about. @deads2k could you please give your thoughts here?
I feel more like the semantic is "the resource request for this pod WILL change, or the pod will die". The transaction is not needed.
> }
> ```

> ResizeRequest has two variables, RequestStatus and NewResources.
RequestStatus contained in the spec? Seems like an anti-design.
@bgrant0607 @smarterclayton @deads2k would you please take a look at this proposal? I think we did see real needs for in-place Pod resource updates.
> A new API, Resizing, for the 'scheduler' is introduced:

> ```go
> // Resizing resizes the resources allocated to a pod
If Resizing just resizes pod resources, this could be a subresource of Pod.

> # API and Usage

> To express the policy for resizing in a pod spec, we introduce the resource attribute `resizePolicy` with the following choices of value:
Maybe I missed the discussion, but do we really need this to be so configurable? Or can we make the default "say nothing" be the best? E.g. If the user says nothing, always try to resize in place. If that can't be achieved, which can't always be known statically, then do a pod restart.
If the user NEEDS a restart, they can specify that, but make it the abnormal case.
?
Depending on the nature of workloads and resources, the feasible policy differs, so the policy needs to be configurable by users. For example, we can't dynamically resize the memory allocated to a JVM, so in that case the policy needs to be restartOnly. Also, some workloads don't tolerate restarts, so in such cases the policy would be configured as LiveResizeOnly.

> Resizing has the metadata of a pod to resize and a value of ResizeRequest that holds the status of a resizing request, which indicates whether the resizing is feasible or not, and the new resource requirements of the pod.

> Once the 'Scheduler' determines whether a resource resizing on a pod is feasible or not, it notifies the API server via this Resizing API.
Could we add a policy where the scheduler just waits for another retry to determine feasibility instead of rejecting immediately? The user could specify this policy in ResizeRequest.
Thanks for doing this. Overall LGTM; the only thing I'd suggest is to add VPA as one of the primary use cases. We would really benefit from this, a lot.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Replaced the action options LiveResize and LiveResizePreferred with new ones, InPlaceResize and InPlaceResizePreferred. Revised the process of pod resizing so that a pod spec is updated with the new resource configuration only after the Kubelet confirms that the actual change of resource allocation was successfully made at the container level.
Force-pushed from 89fb419 to a9aab4e.
@YoungjaeLee any more progress on this? I am revisiting this proposal, thinking about whether the scheduler needs to participate in the resize process; the kubelet will do this check before accepting the pod resource update.
@adohe, we have an implementation that is consistent with the proposal as currently described on GitHub and that works with the new priority feature. The scheduler does participate in the resizing to ensure that a consistent picture of cluster resources exists at the scheduler at all times. However, we haven't seen the clearance/go-ahead from @derekwaynecarr, though @YoungjaeLee has responded to his queries and updated the proposal accordingly. @derekwaynecarr, now that 1.11 is out, I hope this proposal can be taken up and our implementation examined for inclusion.
Is this proposal still alive?

@thockin, yes, this is an active proposal. We have addressed the changes requested earlier and incorporated them in our prototype available on GitHub, and are waiting on clearance/acceptance to know the next steps.
What's the state of this feature? Is there already some partial support for this in 1.12?

@fabiand, update of the resource requests and limits with restart is supported, but not live, in-place update.
Thanks @karthickrajamani. Are there still plans to do it in-place?

Yes, we currently have a working prototype and would like to see this show up in 1.13, or 1.14 at the latest.
That's cool. We are looking forward to this and aim at doing KubeVirt VM CPU and memory hot-plugging based on this feature; just FYI.
Hi there, @YoungjaeLee, @karthickrajamani. First of all, thanks for this proposal. I'm part of the Vertical Pod Autoscaler team. I really like your proposal, but we'd also like to make it a bit more generic, additionally taking into account ideas from https://docs.google.com/document/d/18K-bl1EVsmJ04xeRq9o_vfY2GDgek6B6wmLjXw-kos4/edit?ts=5b96bf40 (see also https://groups.google.com/forum/#!msg/kubernetes-sig-scheduling/UnIhGOKpohI/VtUfVWgFBwAJ). To that end, we've come up with a third proposal, and I've started a KEP (#2908) to make the whole process more formal and visible. The core idea behind the proposal in the KEP is to make PodSpec mutable with regard to Resources, denoting desired resources. Additionally, PodStatus is extended to provide information about actual resource allocation. I'd love for you to have a look there and let me know what you think of our idea, in particular whether it fits your use case. Oh, and if you'd like to co-author that KEP, it would be great; after all, the KEP builds on this very PR. Regards,
Please close this and either fold it into the related KEP or create a new KEP in the enhancements repo.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@bgrant0607: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is the proposal for live and in-place vertical scaling.
The related feature is at kubernetes/enhancements#21.
Also, the original proposal that has been presented at the resource-management WG is at https://docs.google.com/document/d/1Q_Aq4khL2Kjbvmwzok8jukF2HQl31LyM65jK93yfTII/edit?ts=59834e18#heading=h.qb2t0ak4gvki.