UpdatePodSandboxResources CRI API handler #11406

chrishenzie · 2025-02-19T02:45:26Z

Introduces the handler for the UpdatePodSandboxResources CRI API method which is needed for vertical pod autoscaling. Depends on kubernetes/kubernetes#128123 and containerd/nri#141.

Once these are merged, I will remove the overrides in favor of the released versions. The Go version change isn't intended, but it may be unimportant since that commit will be going away anyway.

Any feedback on where is most appropriate to test this is appreciated.

Fixes: #11339

@samuelkarp @tallclair

k8s-ci-robot · 2025-02-19T02:45:37Z

Hi @chrishenzie. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

samuelkarp · 2025-02-19T02:54:49Z

/ok-to-test

internal/cri/nri/nri_api_linux.go

mikebrow · 2025-02-20T00:03:12Z

internal/cri/server/sandbox_update_resources.go

+	if err != nil {
+		return nil, fmt.Errorf("NRI sandbox update failed: %w", err)
+	}
+


Should we not store these fields in the sandbox store? If we don't store these values in the sandbox store, we won't have a way to sync them with the plugin in restart scenarios of the runtime/plugin, or for a late-binding NRI plugin.

@klihub I suppose if we do sync the resources update, we could do it after each pod sync.

Updated to store the Overhead and Resources in the StatusStorage and update them in this handler. For container updates, it looks like the store is updated in the NRI API client; should we be doing something similar here instead of in this handler?

Interesting... I prefer the way you have it here over saving in the NRI API client, and I suppose that, sans an NRI config change, the results should be the same. NRI could be off for a set of long-running pods, then on after an edit of the config and a restart. That might not affect expectations, though.

> @klihub I suppose if we do sync the resources update, we could do it after each pod sync.

@mikebrow Sorry Mike, I'm not sure how to interpret that.

> Should we not store these fields in the sandbox store? If we don't store these values in the sandbox store, we won't have a way to sync them with the plugin in restart scenarios of the runtime/plugin, or for a late-binding NRI plugin.

@mikebrow @chrishenzie So shouldn't an UpdatePodSandboxResources call, in the end, update both the in-memory and stored pod state, so that the overall state of the pod becomes indistinguishable from what it would be if the pod had been started with the updated resources in the first place? Then if an NRI plugin is started after the update, it will only see the pod in its updated state.
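To make the point above concrete, here is a minimal sketch of updating the in-memory and stored state together, so that a restart or a late-starting plugin only ever sees the updated values. This is not code from this PR: `Store`, `SandboxState`, `Resources`, and every method below are hypothetical stand-ins, and a plain map plays the role of the persisted store.

```go
package main

import (
	"fmt"
	"sync"
)

// Resources is a hypothetical, simplified stand-in for the CRI
// resources message.
type Resources struct {
	CPUShares   int64
	MemoryLimit int64
}

// SandboxState is a hypothetical record of pod-level resource state.
type SandboxState struct {
	Overhead  Resources
	Resources Resources
}

// Store holds the in-memory state plus a map that stands in for the
// persisted (on-disk) copy. Both names are assumptions for illustration.
type Store struct {
	mu        sync.Mutex
	inMemory  map[string]SandboxState
	persisted map[string]SandboxState
}

func NewStore() *Store {
	return &Store{
		inMemory:  map[string]SandboxState{},
		persisted: map[string]SandboxState{},
	}
}

// Add registers a sandbox in both copies.
func (s *Store) Add(id string, st SandboxState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inMemory[id] = st
	s.persisted[id] = st
}

// Update writes the new values to both copies under one lock, so the
// two views cannot diverge.
func (s *Store) Update(id string, overhead, resources Resources) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.inMemory[id]; !ok {
		return fmt.Errorf("sandbox %q not found", id)
	}
	st := SandboxState{Overhead: overhead, Resources: resources}
	s.persisted[id] = st
	s.inMemory[id] = st
	return nil
}

// Recover rebuilds the in-memory view from the persisted copy, as a
// runtime restart would.
func (s *Store) Recover() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inMemory = make(map[string]SandboxState, len(s.persisted))
	for id, st := range s.persisted {
		s.inMemory[id] = st
	}
}

// Sync returns the state a late-starting plugin would be shown.
func (s *Store) Sync(id string) (SandboxState, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.inMemory[id]
	return st, ok
}

func main() {
	s := NewStore()
	s.Add("sb1", SandboxState{})
	if err := s.Update("sb1", Resources{CPUShares: 2}, Resources{CPUShares: 512, MemoryLimit: 1 << 20}); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	s.Recover() // simulate a restart
	st, _ := s.Sync("sb1")
	fmt.Printf("state after restart: %+v\n", st)
}
```

In this sketch, a plugin that connects after `Recover` cannot tell whether the pod was created with these resources or updated to them later, which is the property being asked for.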

mikebrow · 2025-02-20T15:03:47Z

> Any feedback on where is most appropriate to test this is appreciated.

https://github.com/containerd/containerd/blob/v2.0.2/integration/nri_test.go#L102

chrishenzie · 2025-02-20T23:51:51Z

@mikebrow I took a stab at an integration test in that file to check that update events successfully propagate to the NRI plugin. Let me know of any improvements or other test cases we may want to cover. Thanks!

integration/nri_test.go

chrishenzie · 2025-04-23T22:47:01Z

It looks like this PR is blocked until we can update the CRI API import to 1.33; however, that is in turn blocked by minimum Go version requirements: #11749

internal/cri/server/sandbox_status.go

chrishenzie · 2025-12-24T00:35:49Z

@mikebrow @mxpv @klihub @samuelkarp

I've refactored the bulk of the PR based on my previous questions/comments about persistence.

I took a stab at moving the persistence logic directly into the PodSandboxController so we could avoid the hacky JSON wrangling I was previously doing in the pod sandbox status handler. The controller seemed like the most architecturally correct place for state mutations, but I could be wrong.

Any feedback on whether this approach aligns with the intended architecture, specifically regarding the controller writing back to the core sandbox store for persistence, is appreciated.

Thanks, and Happy Holidays!

mikebrow · 2025-12-24T01:10:25Z

pretty big change to the way sandboxer controllers are written.. @mxpv wdyt?

mxpv · 2026-01-06T19:54:04Z

> I took a stab at moving the persistence logic directly into the PodSandboxController so we could avoid the hacky JSON wrangling I was previously doing in the pod sandbox status handler.
> The controller seemed like the most architecturally correct place for state mutations, but I could be wrong.

I believe the long-term plan is not to have the server/sandbox package at all.

Right now the sandbox API supports two paths: the sandbox API implemented by the runtimes, and server/sandbox, which is a special case we keep due to the need for complex refactoring (CRI sandbox).

Ideally, we'd want to treat all sandboxes equally, and just use the Sandbox controller interface in CRI and have runtimes implement the logic.

So eventually it'd be great to move this out to the runtime level and get rid of pause containers.

Given that, this change introduces new dependencies (store and metadata store), which we'll have to eliminate to achieve the long-term goal.

I'm fine to take this in to get VPA going if there are time constraints. But this PR makes the long-term goal slightly more complex to achieve, and it'd be good to refactor this at some point.

Signed-off-by: Chris Henzie <chrishenzie@google.com>

Introduces changes to make pod sandbox updates persistent across restarts. This is achieved by:
- Storing the updated Overhead and Resources as an extension on the core sandbox object and in the in-memory sandbox status store.
- Modifying the sandbox recovery logic to read this extension on startup (this is not working in recovery unit tests yet and needs fixing).
- Updating the PodSandboxStatus CRI handler to include updated resources from the sandbox status store.

Signed-off-by: Chris Henzie <chrishenzie@google.com>
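The extension-based persistence described in the commit message above can be sketched as a save/load round trip. This is a simplified illustration under stated assumptions, not the PR's implementation: the `Sandbox` type, the `extensionKey` value, and both helper functions are hypothetical, and plain JSON bytes are used for the extension payload, which may differ from how containerd's real sandbox object encodes extensions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resources is a hypothetical, simplified resources type.
type Resources struct {
	CPUShares          int64 `json:"cpuShares,omitempty"`
	MemoryLimitInBytes int64 `json:"memoryLimitInBytes,omitempty"`
}

// UpdatedResources is the hypothetical payload stored as an extension.
type UpdatedResources struct {
	Overhead  *Resources `json:"overhead,omitempty"`
	Resources *Resources `json:"resources,omitempty"`
}

// Sandbox is a hypothetical stand-in for a sandbox object that carries
// named extensions.
type Sandbox struct {
	ID         string
	Extensions map[string][]byte
}

// extensionKey is an assumed key name, chosen only for this example.
const extensionKey = "example.updated-pod-sandbox-resources"

// saveUpdatedResources encodes the update and attaches it to the sandbox.
func saveUpdatedResources(sb *Sandbox, u UpdatedResources) error {
	data, err := json.Marshal(u)
	if err != nil {
		return fmt.Errorf("marshal updated resources: %w", err)
	}
	if sb.Extensions == nil {
		sb.Extensions = map[string][]byte{}
	}
	sb.Extensions[extensionKey] = data
	return nil
}

// loadUpdatedResources is what recovery would call on startup. It returns
// nil with no error when the sandbox was never updated.
func loadUpdatedResources(sb *Sandbox) (*UpdatedResources, error) {
	data, ok := sb.Extensions[extensionKey]
	if !ok {
		return nil, nil
	}
	var u UpdatedResources
	if err := json.Unmarshal(data, &u); err != nil {
		return nil, fmt.Errorf("unmarshal updated resources: %w", err)
	}
	return &u, nil
}

func main() {
	sb := &Sandbox{ID: "sb1"}
	update := UpdatedResources{Resources: &Resources{CPUShares: 512, MemoryLimitInBytes: 1 << 30}}
	if err := saveUpdatedResources(sb, update); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	recovered, err := loadUpdatedResources(sb)
	if err != nil || recovered == nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Printf("recovered resources: %+v\n", *recovered.Resources)
}
```

Returning `nil, nil` for a sandbox without the extension keeps recovery of never-updated sandboxes on the existing path.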

chrishenzie · 2026-01-07T17:46:23Z

Thanks @mxpv for the review. I've reverted to the previous approach that avoids sandbox controller changes.

mikebrow

LGTM

k8s-ci-robot added the needs-ok-to-test label Feb 19, 2025

k8s-ci-robot added the size/XXL label Feb 19, 2025

dosubot bot added the area/cri (Container Runtime Interface), area/nri (Node Resource Interface), and status/has-dependency labels Feb 19, 2025

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Feb 19, 2025

klihub reviewed Feb 19, 2025

internal/cri/nri/nri_api_linux.go (resolved)

klihub reviewed Feb 19, 2025

internal/cri/nri/nri_api_linux.go (resolved)

chrishenzie force-pushed the update-pod-sandbox-api branch from fea94df to ad54c67 Compare February 19, 2025 17:21

mikebrow reviewed Feb 20, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from ad54c67 to 0ac531f Compare February 20, 2025 01:00

chrishenzie force-pushed the update-pod-sandbox-api branch from 0ac531f to af9ad25 Compare February 20, 2025 23:50

klihub reviewed Feb 21, 2025

integration/nri_test.go (resolved)

chrishenzie force-pushed the update-pod-sandbox-api branch from af9ad25 to 27c2ca7 Compare February 26, 2025 23:43

k8s-ci-robot added the needs-rebase label Feb 27, 2025

chrishenzie mentioned this pull request Mar 31, 2025

Implement new UpdatePodSandboxResources CRI method #11339

Closed

sreeram-venkitesh mentioned this pull request Apr 30, 2025

[SIG-Node]: KEP-4960 - Container Stop Signals #11617

Open

djdongjin mentioned this pull request May 7, 2025

Bump up go version to 1.24 and cri-api to 0.33.0 #11823

Merged

mxpv added and removed the status/needs-update (Awaiting contributor update) label May 9, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from 27c2ca7 to a172968 Compare June 3, 2025 17:34

k8s-ci-robot removed the needs-rebase label Jun 3, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from a172968 to 4960cd2 Compare June 3, 2025 17:38

k8s-ci-robot added the needs-rebase label Jun 20, 2025

dmcgowan modified the milestones: 2.2, 2.3 Nov 6, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from a512d69 to 5419634 Compare November 24, 2025 19:25

chrishenzie force-pushed the update-pod-sandbox-api branch from 5419634 to 7e30da7 Compare December 9, 2025 23:11

mikebrow reviewed Dec 9, 2025

internal/cri/server/sandbox_status.go (resolved)

chrishenzie force-pushed the update-pod-sandbox-api branch from 7e30da7 to 7cb1347 Compare December 10, 2025 18:43

k8s-ci-robot added size/XL and removed size/L labels Dec 10, 2025

k8s-ci-robot added size/L and removed size/XL labels Dec 23, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from 3cccb80 to 806a99e Compare December 23, 2025 23:45

k8s-ci-robot added size/XL and removed size/L labels Dec 23, 2025

chrishenzie force-pushed the update-pod-sandbox-api branch from 806a99e to e408e36 Compare December 24, 2025 00:20

chrishenzie force-pushed the update-pod-sandbox-api branch from e408e36 to 7cb1347 Compare January 7, 2026 17:40

chrishenzie added 2 commits January 7, 2026 09:44

Implement UpdatePodSandboxResources CRI API handler

ffd3691

Signed-off-by: Chris Henzie <chrishenzie@google.com>

chrishenzie force-pushed the update-pod-sandbox-api branch from 7cb1347 to de5b622 Compare January 7, 2026 17:44

mxpv approved these changes Jan 8, 2026

mikebrow approved these changes Jan 8, 2026

github-project-automation bot moved this from Needs Update to Review In Progress in Pull Request Review Jan 8, 2026

mxpv added this pull request to the merge queue Jan 8, 2026

Merged via the queue into containerd:main with commit 2bf3fcf Jan 8, 2026
52 checks passed

github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Jan 8, 2026

chrishenzie deleted the update-pod-sandbox-api branch January 8, 2026 17:53
