Skip to content

Conversation

@chrishenzie
Copy link
Contributor

@chrishenzie chrishenzie commented Feb 19, 2025

Introduces the handler for the UpdatePodSandboxResources CRI API method which is needed for vertical pod autoscaling. Depends on kubernetes/kubernetes#128123 and containerd/nri#141.

Once these are merged I will remove the overrides in favor of the cut release versions. The go version change isn't intended but maybe unimportant since that commit will be going away anyway.

Any feedback on where is most appropriate to test this is appreciated.

Fixes: #11339

@samuelkarp @tallclair

@k8s-ci-robot
Copy link

Hi @chrishenzie. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot bot added area/cri Container Runtime Interface (CRI) area/nri Node Resource Interface (NRI) status/has-dependency labels Feb 19, 2025
@samuelkarp
Copy link
Member

/ok-to-test

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from fea94df to ad54c67 Compare February 19, 2025 17:21
if err != nil {
return nil, fmt.Errorf("NRI sandbox update failed: %w", err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not store these fields in the sandbox store? if we don't store these values in the sandbox store... we won't have a way to sync them with the plugin for restart scenarios of the runtime/plugin or for a late binding nri plugin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub I suppose if we do sync the resources update we could do it after each pod sync..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to store the Overhead and Resources in the StatusStorage and update in this handler. For container updates it looks like this the store is updated in the NRI API client, should we be doing something similar here instead of in this handler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting... I prefer the way you have it here over saving in the nri api client and I suppose sans an NRI config change the results should be the same. NRI could be off for a set of long running pods... then on after an edit of the config and a restart. Might not affect expectations though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klihub I suppose if we do sync the resources update we could do it after each pod sync..

@mikebrow Sorry Mike, I'm not sure how to interpret that.

Should we not store these fields in the sandbox store? if we don't store these values in the sandbox store... we won't have a way to sync them with the plugin for restart scenarios of the runtime/plugin or for a late binding nri plugin.

@mikebrow @chrishenzie So shouldn't an UpdatePodSandboxResources in the end update both the in-memory and stored pod state so that the overall state of the pod becomes indistinguishable from if it was started with the updated resources in the first place ? Then if an NRI plugin is started after the update, it will only see the pod in its updated state.

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from ad54c67 to 0ac531f Compare February 20, 2025 01:00
@mikebrow
Copy link
Member

mikebrow commented Feb 20, 2025

Any feedback on where is most appropriate to test this is appreciated.

https://github.com/containerd/containerd/blob/v2.0.2/integration/nri_test.go#L102

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 0ac531f to af9ad25 Compare February 20, 2025 23:50
@chrishenzie
Copy link
Contributor Author

@mikebrow I took a stab at an integration test in that file to check that update events successfully propagate to the NRI plugin. Let me know of any improvements or other test cases we may want to cover. Thanks!

@chrishenzie
Copy link
Contributor Author

It looks like this PR is blocked until we can update the CRI API import to 1.33, however that is blocked due to minimum go version requirements: #11749

@dmcgowan dmcgowan modified the milestones: 2.2, 2.3 Nov 6, 2025
@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from a512d69 to 5419634 Compare November 24, 2025 19:25
@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 5419634 to 7e30da7 Compare December 9, 2025 23:11
@chrishenzie
Copy link
Contributor Author

@mikebrow @mxpv @klihub @samuelkarp

I've refactored a bulk of the PR based on my previous questions/comments about persistence.

I took a stab at moving the persistence logic directly into the PodSandboxController so we could avoid the hacky JSON wrangling I was previously doing in the pod sandbox status handler. The controller seemed like the most architecturally correct place for state mutations, but I could be wrong.

Any feedback on whether this approach aligns with the intended architecture, specifically regarding the controller writing back to the core sandbox store for persistence, is appreciated.

Thanks, and Happy Holidays!

@mikebrow
Copy link
Member

pretty big change to the way sandboxer controllers are written.. @mxpv wdyt?

@mxpv
Copy link
Member

mxpv commented Jan 6, 2026

I took a stab at moving the persistence logic directly into the PodSandboxController so we could avoid the hacky JSON wrangling I was previously doing in the pod sandbox status handler.
The controller seemed like the most architecturally correct place for state mutations, but I could be wrong.

I believe long term plan is not to have server/sandbox package at all.

Right now sandbox API supports 2 paths: sandbox API implemented by the runtimes and server/sandbox which is a special case we keep due the need of complex refactoring (CRI sandbox).

Ideally, we'd want to treat all sandboxes equally, and just use Sandbox controller interface in CRI and have runtimes implement the logic.

So eventually it'd great to move this out to the runtime level and get rid of pause containers.

Having that, this change introduces new dependencies (store and metadata store), which we'll have to eliminate to achieve the long term goal.

I'm fine to take this in to get VPA going if there time constraints. But this PR makes the long term goal slightly more complex to achieve and it'd good to refactor this at some point.

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from e408e36 to 7cb1347 Compare January 7, 2026 17:40
Signed-off-by: Chris Henzie <chrishenzie@google.com>
Introduces changes to make pod sandbox updates persistent across
restarts.

This is achieved by:
- Storing the updated Overhead and Resources as an extension on the core
  sandbox object and in the in-memory sandbox status store.
- Modifying the sandbox recovery logic to read this extension on startup
  (this is not working in recovery unit tests yet and needs fixing).
- Updating the PodSandboxStatus CRI handler to include updated resources
  from the sandbox status store.

Signed-off-by: Chris Henzie <chrishenzie@google.com>
@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 7cb1347 to de5b622 Compare January 7, 2026 17:44
@chrishenzie
Copy link
Contributor Author

Thanks @mxpv for the review. I've reverted to the previous approach that avoids sandbox controller changes.

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation bot moved this from Needs Update to Review In Progress in Pull Request Review Jan 8, 2026
@mxpv mxpv added this pull request to the merge queue Jan 8, 2026
Merged via the queue into containerd:main with commit 2bf3fcf Jan 8, 2026
52 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Jan 8, 2026
@chrishenzie chrishenzie deleted the update-pod-sandbox-api branch January 8, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) area/nri Node Resource Interface (NRI) ok-to-test size/XL status/has-dependency

Projects

Development

Successfully merging this pull request may close these issues.

Implement new UpdatePodSandboxResources CRI method

7 participants