
@ningmingxiao
Contributor

@ningmingxiao ningmingxiao commented Jul 26, 2025

Restarting containerd spends a long time in loadShims when many pods exist.
With 300 pods created, before this commit:

time="2025-07-26T17:16:11.934486476+08:00" level=info msg="containerd successfully booted in 12.399198s"

After this commit:

time="2025-07-26T17:14:18.288939951+08:00" level=info msg="containerd successfully booted in 2.570514s"

* **Improve shim load time after restart by loading in parallel**

@k8s-ci-robot

Hi @ningmingxiao. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot bot added the area/runtime Runtime label Jul 26, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in Pull Request Review Jul 26, 2025
@ningmingxiao ningmingxiao reopened this Jul 26, 2025
@github-project-automation github-project-automation bot moved this from Done to Needs Triage in Pull Request Review Jul 26, 2025
@ningmingxiao ningmingxiao changed the title from "use goroutine to speedup LoadExistingShims" to "use goroutine to speedup loadShims" Jul 26, 2025
@ningmingxiao ningmingxiao changed the title from "use goroutine to speedup loadShims" to "use goroutine to speedup loadShims to speedup restart containerd" Jul 26, 2025
@ningmingxiao ningmingxiao changed the title from "use goroutine to speedup loadShims to speedup restart containerd" to "use goroutine to speedup loadShims" Jul 26, 2025
@ningmingxiao ningmingxiao changed the title from "use goroutine to speedup loadShims" to "restart:use goroutine to speedup loadShims" Jul 26, 2025
@ningmingxiao ningmingxiao changed the title from "restart:use goroutine to speedup loadShims" to "restart: use goroutine to speedup loadShims" Jul 26, 2025
@ningmingxiao
Contributor Author

@AkihiroSuda @dmcgowan @mikebrow PTAL

@ningmingxiao
Contributor Author

@djdongjin can you review this PR? Sometimes loadShims takes so long that containerd is killed by systemd.

Jun 12 14:02:50 AIS4-single-minion-3-0 containerd[869425]: containerd: context canceled

@ningmingxiao
Contributor Author

Ping @AkihiroSuda, can you review this PR? It's easy to review. Thanks.

@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Aug 17, 2025
@ningmingxiao
Contributor Author

ping PTAL @djdongjin @estesp

@dmcgowan dmcgowan added this to the 2.2 milestone Sep 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 19, 2025
@fuweid fuweid added this pull request to the merge queue Sep 26, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 26, 2025
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@mikebrow
Member

mikebrow commented Oct 9, 2025

• [FAILED] [7.680 seconds]
[k8s.io] Container OOM runtime should output OOMKilled reason [It] should terminate with exitCode 137 and reason OOMKilled
sigs.k8s.io/cri-tools/pkg/validate/container_linux.go:147

  Timeline >>
  STEP: create Privileged podSandbox @ 10/09/25 02:19:19.116
  STEP: create container @ 10/09/25 02:19:20.717
  STEP: create a container that will be killed by OOMKiller @ 10/09/25 02:19:20.717
  STEP: Get image status for image: registry.k8s.io/e2e-test-images/busybox:1.29-2 @ 10/09/25 02:19:20.717
  STEP: Pull image : registry.k8s.io/e2e-test-images/busybox:1.29-2 @ 10/09/25 02:19:20.718
  STEP: Create container. @ 10/09/25 02:19:22.19
  Oct  9 02:19:22.386: INFO: Created container "d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131"

  STEP: verifying container status @ 10/09/25 02:19:22.386
  STEP: start container @ 10/09/25 02:19:22.387
  STEP: Start container for containerID: d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131 @ 10/09/25 02:19:22.387
  Oct  9 02:19:22.540: INFO: Started container "d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131"

  STEP: container is stopped because of OOM @ 10/09/25 02:19:22.54
  STEP: Get container status for containerID: d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131 @ 10/09/25 02:19:22.54
  STEP: Get container status for containerID: d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131 @ 10/09/25 02:19:26.542
  STEP: Get container status for containerID: d3ed6803f8c1aa92958392235286c95303570cba8e06ece62fc43849a0d29131 @ 10/09/25 02:19:26.543
  STEP: exit code is 137 @ 10/09/25 02:19:26.543
  STEP: reason is OOMKilled @ 10/09/25 02:19:26.543
  [FAILED] in [It] - sigs.k8s.io/cri-tools/pkg/validate/container_linux.go:165 @ 10/09/25 02:19:26.544
  STEP: stop PodSandbox @ 10/09/25 02:19:26.544
  STEP: delete PodSandbox @ 10/09/25 02:19:26.742
  << Timeline

  [FAILED] Expected
      <string>: Error
  to equal
      <string>: OOMKilled
  In [It] at: sigs.k8s.io/cri-tools/pkg/validate/container_linux.go:165 @ 10/09/25 02:19:26.544

Now why would this commit change the container status exit reason from "OOMKilled" to "Error"?

@ningmingxiao
Contributor Author

ningmingxiao commented Oct 9, 2025

This CI job fails intermittently; it should be a separate bug. The OOM event is received after the exited event. See issue 12260.

@mikebrow
Member

mikebrow commented Oct 9, 2025

restarted failed tests.. and all green, nod to preexisting flake shown in #12260

@dmcgowan dmcgowan added this pull request to the merge queue Oct 15, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 15, 2025
restart: use goroutine to speedup loadShims
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 15, 2025
@ningmingxiao
Contributor Author

ningmingxiao commented Oct 16, 2025

The CI failed because of another bug; I will try to fix it in another PR: #12372

@dmcgowan dmcgowan added this pull request to the merge queue Oct 16, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 16, 2025
@dmcgowan dmcgowan moved this from Review In Progress to Merge on Green in Pull Request Review Oct 17, 2025
@dmcgowan dmcgowan added this pull request to the merge queue Oct 17, 2025
@dmcgowan dmcgowan self-assigned this Oct 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 17, 2025
@dmcgowan dmcgowan added this pull request to the merge queue Oct 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 17, 2025
@mikebrow mikebrow added this pull request to the merge queue Oct 17, 2025
Merged via the queue into containerd:main with commit 79f7818 Oct 17, 2025
92 of 96 checks passed
@github-project-automation github-project-automation bot moved this from Merge on Green to Done in Pull Request Review Oct 17, 2025
@ningmingxiao ningmingxiao deleted the fix_load_task branch October 18, 2025 08:39

10 participants