-
Notifications
You must be signed in to change notification settings - Fork 3.8k
UpdatePodSandboxResources CRI API handler #11406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @chrishenzie. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
fea94df to
ad54c67
Compare
| if err != nil { | ||
| return nil, fmt.Errorf("NRI sandbox update failed: %w", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not store these fields in the sandbox store? if we don't store these values in the sandbox store... we won't have a way to sync them with the plugin for restart scenarios of the runtime/plugin or for a late binding nri plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klihub I suppose if we do sync the resources update we could do it after each pod sync..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to store the Overhead and Resources in the StatusStorage and update in this handler. For container updates it looks like this the store is updated in the NRI API client, should we be doing something similar here instead of in this handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting... I prefer the way you have it here over saving in the nri api client and I suppose sans an NRI config change the results should be the same. NRI could be off for a set of long running pods... then on after an edit of the config and a restart. Might not affect expectations though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klihub I suppose if we do sync the resources update we could do it after each pod sync..
@mikebrow Sorry Mike, I'm not sure how to interpret that.
Should we not store these fields in the sandbox store? if we don't store these values in the sandbox store... we won't have a way to sync them with the plugin for restart scenarios of the runtime/plugin or for a late binding nri plugin.
@mikebrow @chrishenzie So shouldn't an UpdatePodSandboxResources in the end update both the in-memory and stored pod state so that the overall state of the pod becomes indistinguishable from if it was started with the updated resources in the first place ? Then if an NRI plugin is started after the update, it will only see the pod in its updated state.
ad54c67 to
0ac531f
Compare
https://github.com/containerd/containerd/blob/v2.0.2/integration/nri_test.go#L102 |
0ac531f to
af9ad25
Compare
|
@mikebrow I took a stab at an integration test in that file to check that update events successfully propagate to the NRI plugin. Let me know of any improvements or other test cases we may want to cover. Thanks! |
af9ad25 to
27c2ca7
Compare
|
It looks like this PR is blocked until we can update the CRI API import to 1.33, however that is blocked due to minimum go version requirements: #11749 |
27c2ca7 to
a172968
Compare
a172968 to
4960cd2
Compare
a512d69 to
5419634
Compare
5419634 to
7e30da7
Compare
7e30da7 to
7cb1347
Compare
3cccb80 to
806a99e
Compare
806a99e to
e408e36
Compare
|
@mikebrow @mxpv @klihub @samuelkarp I've refactored a bulk of the PR based on my previous questions/comments about persistence. I took a stab at moving the persistence logic directly into the PodSandboxController so we could avoid the hacky JSON wrangling I was previously doing in the pod sandbox status handler. The controller seemed like the most architecturally correct place for state mutations, but I could be wrong. Any feedback on whether this approach aligns with the intended architecture, specifically regarding the controller writing back to the core sandbox store for persistence, is appreciated. Thanks, and Happy Holidays! |
|
pretty big change to the way sandboxer controllers are written.. @mxpv wdyt? |
I believe long term plan is not to have Right now sandbox API supports 2 paths: sandbox API implemented by the runtimes and Ideally, we'd want to treat all sandboxes equally, and just use Sandbox controller interface in CRI and have runtimes implement the logic. So eventually it'd great to move this out to the runtime level and get rid of pause containers. Having that, this change introduces new dependencies (store and metadata store), which we'll have to eliminate to achieve the long term goal. I'm fine to take this in to get VPA going if there time constraints. But this PR makes the long term goal slightly more complex to achieve and it'd good to refactor this at some point. |
e408e36 to
7cb1347
Compare
Signed-off-by: Chris Henzie <chrishenzie@google.com>
Introduces changes to make pod sandbox updates persistent across restarts. This is achieved by: - Storing the updated Overhead and Resources as an extension on the core sandbox object and in the in-memory sandbox status store. - Modifying the sandbox recovery logic to read this extension on startup (this is not working in recovery unit tests yet and needs fixing). - Updating the PodSandboxStatus CRI handler to include updated resources from the sandbox status store. Signed-off-by: Chris Henzie <chrishenzie@google.com>
7cb1347 to
de5b622
Compare
|
Thanks @mxpv for the review. I've reverted to the previous approach that avoids sandbox controller changes. |
mikebrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Introduces the handler for the UpdatePodSandboxResources CRI API method which is needed for vertical pod autoscaling. Depends on kubernetes/kubernetes#128123 and containerd/nri#141.
Once these are merged I will remove the overrides in favor of the cut release versions. The go version change isn't intended but maybe unimportant since that commit will be going away anyway.
Any feedback on where is most appropriate to test this is appreciated.
Fixes: #11339
@samuelkarp @tallclair