proposal for live and in-place vertical scaling #1719

Closed
YoungjaeLee wants to merge 3 commits into kubernetes:master from k8sresmgmt:live-inplace-vertical-scaling

Conversation

@YoungjaeLee

This is the proposal for live and in-place vertical scaling.

The related feature is at kubernetes/enhancements#21.

Also, the original proposal that has been presented at the resource-management WG is at https://docs.google.com/document/d/1Q_Aq4khL2Kjbvmwzok8jukF2HQl31LyM65jK93yfTII/edit?ts=59834e18#heading=h.qb2t0ak4gvki.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 1, 2018
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.



Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 1, 2018
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 1, 2018
@mwielgus
Contributor

mwielgus commented Feb 2, 2018

Sorry, please disregard my previous comment. I thought this was another VPA design.

Also, the 'API server' restores the resource requirements of each container of the PodSpec to the original and writes the revised PodSpec to ETCD to communicate with the Scheduler.
This is because at this moment the PodSpec on ETCD shouldn’t be updated with new resource requirements.

For a pod with ResizeRequested, the 'Scheduler' checks if the node on which the pod currently runs has enough resources to resize the pod.
Contributor

This is slightly more complex. With the addition of priorities, a resource request change for a high priority pod should preempt low priority pods if needed.


Author

Thanks for pointing it out. Currently, pod priority isn't covered in this proposal, as it was not available when we started this work, but we're planning to support it. Basically, we think that in vertical scaling the scheduler should also follow the current policy, as you mentioned. Now, I'm looking at the source code to understand how the scheduler works with pod priority, and once we figure out how to support pod priority in vertical scaling, we'll update the proposal accordingly.
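The feasibility check the quoted proposal text describes (the Scheduler verifying that the pod's current node has room for the new size) could be sketched as below. This is an illustrative model only, ignoring the priority/preemption interactions raised in this thread; the function name and the flat integer quantities are assumptions, not the proposal's API.

```go
package main

import "fmt"

// resizeFeasible sketches the scheduler-side check: the node must have
// room for the pod's resource delta on top of what other pods already use.
// Quantities are a single scalar (e.g. CPU millicores) for simplicity.
func resizeFeasible(allocatable, usedByOthers, oldRequest, newRequest int64) bool {
	free := allocatable - usedByOthers - oldRequest // headroom beyond this pod
	delta := newRequest - oldRequest                // extra room the resize needs
	return delta <= free
}

func main() {
	// Node with 4000m CPU allocatable, 2500m used by other pods;
	// the pod grows from 1000m to 1400m (fits) or to 1600m (does not).
	fmt.Println(resizeFeasible(4000, 2500, 1000, 1400)) // true
	fmt.Println(resizeFeasible(4000, 2500, 1000, 1600)) // false
}
```

A shrink (newRequest < oldRequest) yields a negative delta and is always feasible under this rule, which matches the intuition that releasing resources never needs scheduling room.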


type ResizeRequest struct {
	RequestStatus ResizeStatus
	NewResources  []ResourceRequirements // indexed by containers’ index
Contributor

This should not rely on the container index. Use the container name.

Author

OK, got it. I'll fix it.
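A revision along the lines the reviewer suggests might key the new requirements by container name instead of index. This is only a sketch; the map shape and field names are assumptions, not the proposal's final API.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for resource quantities,
// e.g. "cpu" -> "500m".
type ResourceList map[string]string

type ResourceRequirements struct {
	Requests ResourceList
	Limits   ResourceList
}

// ResizeRequest keyed by container name: reordering containers in the
// pod spec can no longer misdirect a resize, unlike index-based lookup.
type ResizeRequest struct {
	RequestStatus string
	// NewResources maps container name -> requested new requirements.
	NewResources map[string]ResourceRequirements
}

func main() {
	req := ResizeRequest{
		RequestStatus: "Requested",
		NewResources: map[string]ResourceRequirements{
			"app": {Requests: ResourceList{"cpu": "500m"}},
		},
	}
	// Lookup by name is independent of container ordering in the spec.
	fmt.Println(req.NewResources["app"].Requests["cpu"]) // 500m
}
```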


If used, this would be included as part of the patch or appropriate update command providing the spec update for the resize.
It would indicate the preference of user at the time of resize.
Specifically, Restart for `resizeAction` would indicate that the pod be restarted for the corresponding resizing of resource(s), LiveResize would indicate that the pod not be restarted and the resize be realized live, and LiveResizePreferred would indicate that the resize be realized preferably live, but accomplished with a restart if that fails for any reason.
Contributor

The preferred method should go in ResizeRequest.

Author

@YoungjaeLee YoungjaeLee Feb 6, 2018

The preferred method is specified by clients, like a user or a controller. But 'ResizeRequest' is not a field that is intended to be changed by clients. It is for holding the new values of ResourceRequirements and the ResizeStatus, which are managed by the apiserver. Could you tell me why you think the preferred method should go in ResizeRequest?

ResizedAccepted and ResizedRejected mean that the requested resource resizing is accepted or rejected, respectively, by the Scheduler.
The NewResources is an array indexed by a container’s index, and each entry holds the new resource requirements of a container that needs to resize.

Given a new PodSpec with new resource requirements from a client, first the 'API server' validates it.
Contributor

Who removes/cleans the ResizeRequest in a pod, and when?

@adohe-zz adohe-zz Feb 5, 2018

The same doubt here: will the kubelet remove it, or will the custom controller?

Author

In the current implementation, after resizing is accepted by the scheduler, the apiserver cleans the NewResources field of ResizeRequest, but the ResizeStatus remains "ResizedAccepted".

type Resizing struct {
	metav1.TypeMeta
	metav1.ObjectMeta
	Request ResizeRequest
Contributor

I guess we should follow the spec+status approach in all API objects.


For me, I would regard Resizing as a kind of action, not a resource; it's a little bit tricky to define this struct.


And we may build a command like kubectl resize?

Author

As @adohe mentioned, "Resizing" is a kind of action(?) like Binding, which has the same fields.

Author

We think that resizing is just another pod spec update, so prefer to utilize the current podspec update methods like kubectl apply, patch, etc. instead of introducing a new kubectl command.

Member

We think that resizing is just another pod spec update, so prefer to utilize the current podspec update methods like kubectl apply, patch, etc. instead of introducing a new kubectl command.

I agree with this

currently, the fact that a node is running a pod implies it has accepted the resource requests the pod made.

this is proposing introducing a state in which a node is running a pod with one set of resources, but may or may not have accepted the pod's currently requested resources. that seems to imply:

  • resources should become part of the pod/container status
  • when spec....resources do not match status...resources, then the node has not seen/accepted/adjusted to the newly requested resources
  • the node should have some way in status to indicate it has seen, but has refused to accept, the newly requested resources, and why
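The spec-versus-status model sketched in the bullets above could look like the following. All type and field names here are illustrative assumptions, not the actual Kubernetes API.

```go
package main

import (
	"fmt"
	"reflect"
)

// Resources is a simplified stand-in for a container's requests.
type Resources struct {
	CPUMilli int64
	MemBytes int64
}

type ContainerSpec struct {
	Name      string
	Resources Resources // desired (may have just been edited by a client)
}

type ContainerStatus struct {
	Name      string
	Resources Resources // what the node has actually seen/accepted/applied
}

// resizePending returns the containers whose requested resources differ
// from what the node has acknowledged in status — i.e. the state in which
// a resize has been asked for but not yet accepted by the node.
func resizePending(spec []ContainerSpec, status []ContainerStatus) []string {
	acked := map[string]Resources{}
	for _, s := range status {
		acked[s.Name] = s.Resources
	}
	var pending []string
	for _, c := range spec {
		if !reflect.DeepEqual(c.Resources, acked[c.Name]) {
			pending = append(pending, c.Name)
		}
	}
	return pending
}

func main() {
	spec := []ContainerSpec{{Name: "app", Resources: Resources{CPUMilli: 1000, MemBytes: 1 << 30}}}
	status := []ContainerStatus{{Name: "app", Resources: Resources{CPUMilli: 500, MemBytes: 1 << 30}}}
	fmt.Println(resizePending(spec, status)) // [app]
}
```

The third bullet (a way for the node to record that it refused the new request, and why) would need an additional condition or reason field alongside the status resources, which this sketch omits.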


@liggitt I would think that container status should include the current resources; I still don't understand why the CRI ContainerStatus definition does not include container resources.

}
```

Resizing has the metadata of a pod to resize and a value of ResizeRequest that holds the status of a resizing request, which indicates whether the resizing is feasible or not, and new resource requirements of the pod.
Contributor

Why is there a ResizeRequest in the PodSpec and also as a separate object?

Author

I don't think I get it. Are you asking why we need ResizeRequest in Resizing when it is already there in the PodSpec?

Contributor

Yes, why is it in both places.

ConditionRequested ConditionStatus = "Requested"
ConditionAccepted ConditionStatus = "Accepted"
ConditionRejected ConditionStatus = "Rejected"
ConditionDone ConditionStatus = "Done"
Contributor

@mwielgus mwielgus Feb 2, 2018

How long is the pod status kept?

Author

ConditionRequested and ConditionAccepted are intermediate states. So, if the scheduler and kubelet are operational, the pod status will be changed to ConditionRejected or ConditionDone. Once the pod status becomes ConditionRejected or ConditionDone, it won't be changed unless a new resizing request comes in.
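Interpreting the reply above (and the earlier note that rejection comes only from the scheduler), the condition lifecycle could be modeled as a small state machine. This is a sketch of one reading of the design, not the proposal's code; in particular, the assumption that Accepted can only progress to Done is mine.

```go
package main

import "fmt"

// next lists the legal successors of each resize condition:
// Requested -> Accepted or Rejected (scheduler decides),
// Accepted  -> Done (kubelet completes the resize),
// Rejected / Done are terminal until a new resize request arrives.
var next = map[string][]string{
	"Requested": {"Accepted", "Rejected"},
	"Accepted":  {"Done"},
	"Rejected":  {"Requested"}, // only a new request restarts the cycle
	"Done":      {"Requested"},
}

func validTransition(from, to string) bool {
	for _, s := range next[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(validTransition("Requested", "Accepted")) // true
	fmt.Println(validTransition("Done", "Accepted"))      // false
}
```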

Member

@derekwaynecarr derekwaynecarr left a comment

I want this feature, but I prefer we tackle this in 1.11+ once hugepages, device plugins, and exclusive cores are finished. I am also not sure I like that the original resource requirement values are lost before the kubelet acknowledges that a resize is complete. I wonder whether it would work better to add a new resource requirement section that allowed edits, plus a new subresource that only the kubelet was allowed to use to change the original resource requirement once realized.

* LiveResizeable.

This attribute will be available per resource (such as cpu, memory) and so is adequate to indicate whether the workload can handle and prefer a change in each resource’s allocation for it without restarting.
With potentially multiple containers, each with multiple resizeable resources, in a Pod, the response to an update of the pod spec will be determined by a precedence order among the attribute values, with RestartOnly dominating LiveResizeable; i.e., if two resources have been resized in the update to the spec and one of them has a policy of RestartOnly, then the pod would be restarted to realize both updates.
Member

For clarity, can you distinguish Container restart versus pod restart? From the yaml below, I think you are referring to Container restart since that is all that has resource requirements. Pod restart terminology to me brings extra baggage for things like init containers.


I am also wondering about this. A Pod is a collection of containers; if we just adjust resources for container A, shall we restart the Pod or just restart container A?

Author

Actually, our prototype implementation supports container restart (at pod-level resizing). But we thought that in this proposal it is better to focus on adding a live-and-in-place resizing feature at the StatefulSet level, because going with container restart as well seems to introduce too many things at one time. But we definitely see the value of container restart. So, if most of you agree/want to add a container restart option, we're happy to do that.


I could see more value if we could support Container level restart.

memory: RestartOnly
```

For the above example, if there is a change to cpu request or limit it can be vertically scaled only if the memory request and limit remained the same, otherwise the RestartOnly policy for memory would override the policy for CPU, and the Pod (container, if container-alone restart is allowed) would need to be restarted.
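The precedence rule described above (RestartOnly dominating LiveResizeable across all changed resources) can be sketched as a small helper. The function and value names are illustrative; only the rule itself comes from the proposal text.

```go
package main

import "fmt"

// effectiveAction applies the precedence rule: if any resource being
// resized carries a RestartOnly policy, the whole resize is realized
// via restart; otherwise it can be realized live.
func effectiveAction(policy map[string]string, changed []string) string {
	for _, res := range changed {
		if policy[res] == "RestartOnly" {
			return "Restart"
		}
	}
	return "LiveResize"
}

func main() {
	// Matches the example: cpu is LiveResizeable, memory is RestartOnly.
	policy := map[string]string{"cpu": "LiveResizeable", "memory": "RestartOnly"}
	fmt.Println(effectiveAction(policy, []string{"cpu"}))           // LiveResize
	fmt.Println(effectiveAction(policy, []string{"cpu", "memory"})) // Restart
}
```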
Member

I am inclined to deny a resize request that changes the pod's QoS class. Noting here also concerns about desired G pod characteristics for NUMA, exclusive cores (if on a node with the static CPU manager), hugepages, and devices. It seems for some of those options RestartOnly is the only viable option.

Member

Container storage is another area


for exclusive cores, why RestartOnly should be the only viable option?

@adohe-zz adohe-zz Feb 2, 2018

what if I update pod spec to change its QoS class from BestEffort to Guaranteed?

Author

As mentioned below in the 'Related issues' section, QoS class change by live-and-in-place resizing is not supported, because in order to change the QoS class of a pod, the parent directory of the pod's cgroup directory also needs to change, which is not possible.

Author

For NUMA, the CPU manager, and other features introduced after 1.7, we're still working to make sure that our vertical scaling feature works well with them. Could you provide some insight into problems that might arise?

limits:
  cpu: 1000m
  memory: 1Gi
resizePolicy:
Member

I think for backwards compatibility we may want to assume a default resize policy is “no” for all resources unless explicitly specified.


IIUC, a default RestartOnly resize policy should still keep backwards compatibility.

Author

This is about a resize policy for pods associated with a StatefulSet. So, as @adohe mentioned, I think the default should be restart.

Member

Do we need compat here? Resources are not mutable today, right? I'd hope we can make resize-in-place-if-possible the default.


For now, resources are immutable, and we need to soften the validation if we decide to support resize.
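Softening the validation as described could amount to: compare the old and new specs with resource requirements stripped out, and allow the update only if everything else is identical. A minimal sketch, with simplified spec types and assumed names (the real validation operates on the full `PodSpec`):

```go
package main

import (
	"fmt"
	"reflect"
)

// Container is a drastically simplified container spec.
type Container struct {
	Name     string
	Image    string
	Requests map[string]string
	Limits   map[string]string
}

type PodSpec struct {
	Containers []Container
}

// stripResources returns a copy of the spec with resource requirements
// cleared, so the remainder can be compared for equality.
func stripResources(spec PodSpec) PodSpec {
	out := PodSpec{Containers: make([]Container, len(spec.Containers))}
	copy(out.Containers, spec.Containers)
	for i := range out.Containers {
		out.Containers[i].Requests = nil
		out.Containers[i].Limits = nil
	}
	return out
}

// validateUpdate permits resource-only changes and rejects anything else.
func validateUpdate(old, updated PodSpec) error {
	if !reflect.DeepEqual(stripResources(old), stripResources(updated)) {
		return fmt.Errorf("only resource requirements may be updated")
	}
	return nil
}

func main() {
	old := PodSpec{Containers: []Container{{Name: "app", Image: "nginx",
		Requests: map[string]string{"cpu": "500m"}}}}
	resized := PodSpec{Containers: []Container{{Name: "app", Image: "nginx",
		Requests: map[string]string{"cpu": "1000m"}}}}
	fmt.Println(validateUpdate(old, resized) == nil) // true
}
```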

## Desired Approach

Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using kubectl commands set, patch, apply, edit.
Logs associated with the pod will capture the failure/success of the resize command.
Member

I am not sure I follow this. Can you elaborate?


I am also confused about this. Use logs to capture the resize command result? Any consideration of PodCondition or Events?

Author

Basically, we see resizing a pod as just another pod spec update. So, such resizing is requested via the existing spec update methods like kubectl patch or apply, which are currently used to change StatefulSet specs.

What we mean by 'Logs' is the general logs/messages generated by k8s components (like the apiserver, kubelet, scheduler, and so on), which we check with kubectl describe. And yes, the status of resizing can also be checked with a new PodCondition, called PodResized.

Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using kubectl commands set, patch, apply, edit.
Logs associated with the pod will capture the failure/success of the resize command.
The controller will continue to attempt the update to the spec while there is a difference between the current size and size in updated spec.
If an update is partially successful, user can know this from the logs and attempt to rectify the situation by submitting new updates (that can restore original size(s) or go for a size feasible on all the nodes).
Member

Is this their literal Container application log?

Author

No, it's logs/messages generated by k8s components.

The pod condition of PodResized represents the status of the resizing process.
The PodResized Condition is updated by the `Kubelet` according to the ResizeStatus, which is updated by the 'API server'.

Basically, when the ResizeStatus is changed, the `Kubelet` updates the PodResized condition accordingly.
Member

What happens if it's rejected? Is the resource reset, or does it look to the scheduler like the resource is consumed?

Author

In the current design, the rejection here is made only by the scheduler. So, the resource allocation/consumption isn't changed at this point (because the resizing is rejected.) The kubelet just updates the PodResized condition accordingly.


* **A new additional hash, called `expectedHashNoResources`, added for `Kubelet` to detect a change on resource requirements**

In order to watch resource requirement changes efficiently, a new additional hash is added to kubecontainer.ContainerStatus (and that is also stored as a one of the container’s labels).
Member

I am confused here. It feels like we are trying to combine two pieces of information in a single field, and this hash is the outcome. We are basically saying we need a desired resource requirement (what we have today and what we are looking to change) and an actual enforced resource requirement which kubelet acknowledged.

Member

Uses of persisted hashes have proven problematic between releases and in the face of version skew. Are we sure we want to add another one?

Author

I understand that the original hash is introduced to detect pod spec changes in an efficient way, avoiding one-by-one field comparisons. So, I want to keep this behavior, meaning not to cause additional overhead in pod spec change detection even when the resource requirements in a pod spec are mutable. Basically, with this new hash, the changes of other fields in a pod spec can be detected by a simple hash comparison.
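The idea of a resource-excluding hash can be illustrated as follows: hash the container spec with resource requirements cleared, so a resource-only update leaves the hash unchanged while any other change alters it. The types and the JSON-based hashing here are illustrative assumptions; the real kubelet hashes container structs differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Container is a simplified container spec for illustration.
type Container struct {
	Name     string
	Image    string
	Requests map[string]string `json:",omitempty"`
}

// hashNoResources computes an "expectedHashNoResources"-style digest:
// resource requirements are dropped before hashing, so they never
// influence the result.
func hashNoResources(c Container) string {
	c.Requests = nil // operate on the value copy; caller is unaffected
	b, _ := json.Marshal(c)
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	a := Container{Name: "app", Image: "nginx", Requests: map[string]string{"cpu": "500m"}}
	b := a
	b.Requests = map[string]string{"cpu": "1000m"} // resource-only change
	c := a
	c.Image = "nginx:1.13" // non-resource change

	fmt.Println(hashNoResources(a) == hashNoResources(b)) // true
	fmt.Println(hashNoResources(a) == hashNoResources(c)) // false
}
```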


# Related issues

1. QoS class change by resize is not supported.
Member

Ok, we should call this out earlier in the doc :-)


2. Memory-resizing to change a request value might not take effect for Burstable pods.

For Burstable pods, the request value for the memory resource determines the value of a score for the OOM killer, but Docker doesn't support dynamically changing the score of an existing container.
Member

Aside from Docker, we need to think of other runtimes and platforms like Windows.

So, the change to a memory request value doesn’t take effect for Burstable pods on the OOM killer’s behavior.
But, for Guaranteed and Best-effort pods, this is not an issue because the score is fixed regardless of its memory request value. (in this case, the memory request value is used only for admission control by Scheduler)

3. Memory-resizing to decrease its limit may fail on the Kubelet in some circumstances.
Member

We have a control loop that you can enable at the QoS tier to keep trying to induce reclaim. I worry that these issues unless really well understood will cause confusion for users and operators.


Resources needed by containers can change over time for a variety of reasons: moving from live-test mode to production usage, or changes in user load or dataset sizes, each of which again might come about for a variety of reasons.
`Statefulset` supports the capability to change Request and Limit values specified for a container through supported pod spec update methods.
However, they currently require the pods be restarted to run with the new resource sizes.

IIRC, if you change the Requests and Limits values for a container of a StatefulSet, it will kill the old pod and create a new one with the new resource requirements.

Author

Yes, the current Statefulset controller recreates pods if its resource allocation is changed. So, in this proposal, we modify the controller to leverage the proposed live resizing feature.

resizePolicy:
  cpu: LiveResizeable
  memory: RestartOnly
```

Also, any consideration of extended resources?

Author

No, for now, other resources are not in consideration.

resizePolicy:
  cpu: LiveResizeable
  memory: RestartOnly
```

I am a little bit confused about the resizeAction annotation, under which case should I respect this value? and how to coordinate resizeAction with resizePolicy?

Author

The resizePolicy is a characteristic of a resource that describes whether the resource can be resized live or not. The resizeAction is the way a user/client wants the resize to be carried out. So, for example, even though the CPU resource is specified as LiveResizeable, if for some reason a user specifies the resizeAction for CPU as Restart, the resizing of CPU is done by restart. Of course, for a resource whose resizePolicy is RestartOnly, its resizeAction cannot be LiveResize/LiveResizePreferred.
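The coordination rule in this explanation could be captured in a tiny validation helper: the per-resource policy constrains which actions a client may request. This is only a sketch of the rule as stated; the function name is an assumption.

```go
package main

import "fmt"

// actionAllowed checks a requested resizeAction against the resource's
// resizePolicy: a RestartOnly resource can only be resized via restart,
// while a LiveResizeable resource accepts any action, including an
// explicit Restart requested by the user.
func actionAllowed(policy, action string) bool {
	if policy == "RestartOnly" {
		return action == "Restart"
	}
	return true
}

func main() {
	fmt.Println(actionAllowed("LiveResizeable", "Restart"))        // true
	fmt.Println(actionAllowed("RestartOnly", "LiveResize"))        // false
	fmt.Println(actionAllowed("RestartOnly", "LiveResizePreferred")) // false
}
```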


I see your point; thanks for the kind explanation.


## Desired Approach

Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using kubectl commands set, patch, apply, edit.

which means we need to broaden pod update validation.

Author

Yes, I modified podspec update validation accordingly.


Any valid method to update the pod spec should be applicable for vertical scaling, e.g., using kubectl commands set, patch, apply, edit.
Logs associated with the pod will capture the failure/success of the resize command.
The controller will continue to attempt the update to the spec while there is a difference between the current size and size in updated spec.

Does 'current size' here mean the cached pod spec?

ResizeAccepted ResizeStatus = "Accepted"
ResizeRejected ResizeStatus = "Rejected"
ResizeNone     ResizeStatus = "None"
)

Suggest adding some comments here; I have no idea what ResizeNone means.

Member

This embedded "transaction" is a very different concept from most everything else, isn't it? Do we have precedent for this? It needs to be thoroughly considered for the API.


To be honest, I could not find any precedent, this is just what I worry about @deads2k could you please give thoughts here?

Member

I feel more like the semantic is "the resource request for this pod WILL change, or the pod will die". The transaction is not needed

}
```

ResizeRequest has two variables, RequestStatus and NewResources.

Containing RequestStatus in the spec seems like an anti-pattern.

@adohe-zz

adohe-zz commented Feb 4, 2018

@bgrant0607 @smarterclayton @deads2k would you please take a look at this proposal? I think we do see real need for in-place Pod resource updates.

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 5, 2018
A new API, Resizing, for the 'scheduler' is introduced:

```go
// Resizing resizes the resources allocated to a pod

If Resizing just resizes pod resources, it could be a subresource of Pod.


# API and Usage

To express the policy for resizing in a pod spec, we introduce resource attribute `resizePolicy` with the following choices for value:
Member

Maybe I missed the discussion, but do we really need this to be so configurable? Or can we make the default "say nothing" be the best? E.g. If the user says nothing, always try to resize in place. If that can't be achieved, which can't always be known statically, then do a pod restart.

If the user NEEDS a restart, they can specify that, but make it the abnormal case.

?

Member

ping

Author

Depending on the nature of workloads and resources, the feasible policy differs, so the policy needs to be configurable by users. For example, we can't dynamically resize the memory allocated to a JVM, so in this case the policy needs to be RestartOnly. Also, some workloads don't tolerate restarts, so in such a case the policy should be configured as LiveResizeOnly.


Resizing has the metadata of a pod to resize and a value of ResizeRequest that holds the status of a resizing request, which indicates whether the resizing is feasible or not, and new resource requirements of the pod.

Once the 'Scheduler' determines whether a resource resizing on a pod is feasible or not, it notifies the API server via this Resizing API.

Could we add a policy where the scheduler waits and retries the decision instead of rejecting the request immediately? The user could specify this policy in ResizeRequest.

@mhausenblas

Thanks for doing this; overall LGTM. The only thing I'd suggest is adding VPA as one of the primary use cases. We would really benefit from this, a lot.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 11, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: wojtek-t

Assign the PR to them by writing /assign @wojtek-t in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Replaced action options, LiveResize and LiveResizePreferred, with new
ones, InPlaceResize and InPlaceResizePreferred.

Revised the process of pod resizing so that a pod spec would be updated
with new resource configurations after Kubelet confirms that the actual
change of resource allocation is successfully made at container-level.
@YoungjaeLee YoungjaeLee force-pushed the live-inplace-vertical-scaling branch from 89fb419 to a9aab4e Compare April 11, 2018 19:10
@adohe-zz

@YoungjaeLee any more progress on this? I am revisiting this proposal and thinking about whether the scheduler needs to participate in the resize process, since the kubelet will do this check before accepting the pod resource update.

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.



Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 31, 2018
@karthickrajamani

karthickrajamani commented Jul 20, 2018

@adohe - we have an implementation, consistent with the proposal as currently described on GitHub, that works with the new priority feature. The scheduler does participate in the resizing to ensure a consistent picture of cluster resources exists at the scheduler at all times. However, we haven't seen the clearance/go-ahead from @derekwaynecarr, though @YoungjaeLee has responded to his queries and updated the proposal accordingly. @derekwaynecarr - now that 1.11 is out, I hope this proposal can be taken up and our implementation examined for inclusion.

@thockin
Member

thockin commented Sep 18, 2018

Is this proposal still alive?

@karthickrajamani

Is this proposal still alive?

@thockin, yes this is an active proposal. We have addressed changes requested earlier and incorporated in our prototype available on GitHub, waiting on clearance/acceptance to know the next steps.

@fabiand

fabiand commented Oct 18, 2018

What's the state of this feature? Is there already some partial support for this in 1.12?

@karthickrajamani

What's the state of this feature? Is there already some partial support for this in 1.12?

@fabiand, updating the resource requests and limits with a restart is supported, but not a live, in-place update.

@fabiand

fabiand commented Oct 18, 2018

Thanks @karthickrajamani - Are there still plans to do it in-place?

@karthickrajamani

Thanks @karthickrajamani - Are there still plans to do it in-place?

Yes, we currently have a working prototype and would like to see this show up in 1.13 or, at the latest, 1.14.

@fabiand

fabiand commented Oct 18, 2018 via email

@kgolab

kgolab commented Nov 26, 2018

Hi there,

@YoungjaeLee, @karthickrajamani, first of all thanks for this proposal.

I'm part of the Vertical Pod Autoscaler team.
As already noted, we'd also like to have live / in-place resource updates to make VPA actuation less disruptive.

I really like your proposal but we'd also like to make it a bit more generic, additionally taking into account ideas from https://docs.google.com/document/d/18K-bl1EVsmJ04xeRq9o_vfY2GDgek6B6wmLjXw-kos4/edit?ts=5b96bf40 (see also https://groups.google.com/forum/#!msg/kubernetes-sig-scheduling/UnIhGOKpohI/VtUfVWgFBwAJ)

To that end we've come up with a third proposal, and I've started a KEP (#2908) to make the whole process more formal and visible.

The core idea behind the KEP's proposal is to make PodSpec mutable with regard to Resources, which then denote the desired resources. Additionally, PodStatus is extended to provide information about the actual resource allocation.

I'd love you to have a look there and let me know what you think of our idea, in particular whether it fits your use-case.
The KEP is not fully fleshed out yet; I'd like to hear from you first.

Oh, and if you'd like to co-author that KEP it would be great, after all the KEP builds on this very PR.

Regards,
K.
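The desired-vs-actual split kgolab describes above can be sketched as a toy model; the field names here (Desired, Allocated) are assumptions for illustration, not the KEP's actual schema.

```go
package main

import "fmt"

// Resources is a toy stand-in for a container's resource requirements.
type Resources struct {
	CPUMilli int
	MemMi    int
}

// PodSpec holds the desired resources; under the KEP's idea this part of
// the spec becomes mutable.
type PodSpec struct{ Desired Resources }

// PodStatus reports what the kubelet actually allocated.
type PodStatus struct{ Allocated Resources }

// resized reports whether the actuation has converged: the actual
// allocation matches the desired resources.
func resized(spec PodSpec, status PodStatus) bool {
	return spec.Desired == status.Allocated
}

func main() {
	spec := PodSpec{Desired: Resources{CPUMilli: 500, MemMi: 256}}
	status := PodStatus{Allocated: Resources{CPUMilli: 250, MemMi: 256}}
	fmt.Println(resized(spec, status)) // resize still in progress
	status.Allocated = spec.Desired
	fmt.Println(resized(spec, status)) // converged
}
```

Under this model an updater such as the VPA only mutates the spec, while the kubelet drives the status toward it, so a resize is observable as the gap between the two fields closing.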

@bgrant0607
Member

Please close this and either fold it into the related KEP or create a new KEP in the enhancements repo.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2019
@bgrant0607
Member

/close

@k8s-ci-robot
Contributor

@bgrant0607: Closed this PR.


In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Labels

cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
