Skip to content

Return NotFound error for kill and delete in deleted state.#3244

Merged
estesp merged 1 commit intocontainerd:masterfrom
Random-Liu:fix-container-cleanup
May 7, 2019
Merged

Return NotFound error for kill and delete in deleted state.#3244
estesp merged 1 commit intocontainerd:masterfrom
Random-Liu:fix-container-cleanup

Conversation

@Random-Liu
Copy link
Member

We've seen this issue in a Kubernetes benchmark test https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-cri-containerd-node-e2e-benchmark/1120454833061498892.

The initial container deletion hit a timeout, and returned at https://github.com/containerd/containerd/blob/v1.2.6/runtime/v1/linux/task.go#L89. I guess because we are stopping >200 containers at the same time:

Apr 22 22:50:40 n1-standard-1-cos-stable-60-9592-90-0-81227dd7 containerd[1037]: time="2019-04-22T22:50:40.732760213Z" level=error msg="Failed to handle exit event &TaskExit{ContainerID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,ID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,Pid:1466,ExitStatus:0,ExitedAt:2019-04-22 22:50:30.720718714 +0000 UTC,XXX_unrecognized:[],} for fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9" error="failed to handle container TaskExit event: failed to stop container: context deadline exceeded: unknown"

However, the shim actually already deleted the container, thus went into deleted state.

After this point, all future kill and deletion won't be able to cleanup the container any more:

Apr 22 22:50:42 n1-standard-1-cos-stable-60-9592-90-0-81227dd7 containerd[1037]: time="2019-04-22T22:50:42.382318462Z" level=error msg="Failed to handle backOff event &TaskExit{ContainerID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,ID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,Pid:1466,ExitStatus:0,ExitedAt:2019-04-22 22:50:30.720718714 +0000 UTC,XXX_unrecognized:[],} for fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9" error="failed to handle container TaskExit event: failed to stop container: cannot kill a deleted process: unknown"
...
Apr 22 22:50:44 n1-standard-1-cos-stable-60-9592-90-0-81227dd7 containerd[1037]: time="2019-04-22T22:50:44.453285825Z" level=error msg="Failed to handle backOff event &TaskExit{ContainerID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,ID:fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9,Pid:1466,ExitStatus:0,ExitedAt:2019-04-22 22:50:30.720718714 +0000 UTC,XXX_unrecognized:[],} for fdab5da3258b6cf5488148cb9e4295e0fe5d1a3fe4fb6f64b7091a78c74504d9" error="failed to handle container TaskExit event: failed to stop container: cannot kill a deleted process: unknown"
...

Signed-off-by: Lantao Liu lantaol@google.com

Signed-off-by: Lantao Liu <lantaol@google.com>
@codecov-io
Copy link

Codecov Report

Merging #3244 into master will decrease coverage by 4.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3244      +/-   ##
==========================================
- Coverage   48.65%   44.63%   -4.02%     
==========================================
  Files         102      113      +11     
  Lines        9570    12164    +2594     
==========================================
+ Hits         4656     5430     +774     
- Misses       4088     5899    +1811     
- Partials      826      835       +9
Flag Coverage Δ
#linux 48.65% <ø> (ø) ⬆️
#windows 39.87% <ø> (?)
Impacted Files Coverage Δ
snapshots/native/native.go 43.04% <0%> (-9.99%) ⬇️
metadata/snapshot.go 45.8% <0%> (-8.96%) ⬇️
archive/tar.go 43.79% <0%> (-7.07%) ⬇️
metadata/containers.go 47.97% <0%> (-6.62%) ⬇️
content/local/writer.go 58.65% <0%> (-5.55%) ⬇️
content/local/store.go 49.51% <0%> (-5.15%) ⬇️
metadata/images.go 57.57% <0%> (-4.99%) ⬇️
archive/tar_opts.go 28.57% <0%> (-4.77%) ⬇️
archive/compression/compression.go 58.69% <0%> (-4.7%) ⬇️
metadata/buckets.go 56.33% <0%> (-4.6%) ⬇️
... and 63 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a3f0aa...dff7456. Read the comment docs.

1 similar comment
@codecov-io
Copy link

Codecov Report

Merging #3244 into master will decrease coverage by 4.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3244      +/-   ##
==========================================
- Coverage   48.65%   44.63%   -4.02%     
==========================================
  Files         102      113      +11     
  Lines        9570    12164    +2594     
==========================================
+ Hits         4656     5430     +774     
- Misses       4088     5899    +1811     
- Partials      826      835       +9
Flag Coverage Δ
#linux 48.65% <ø> (ø) ⬆️
#windows 39.87% <ø> (?)
Impacted Files Coverage Δ
snapshots/native/native.go 43.04% <0%> (-9.99%) ⬇️
metadata/snapshot.go 45.8% <0%> (-8.96%) ⬇️
archive/tar.go 43.79% <0%> (-7.07%) ⬇️
metadata/containers.go 47.97% <0%> (-6.62%) ⬇️
content/local/writer.go 58.65% <0%> (-5.55%) ⬇️
content/local/store.go 49.51% <0%> (-5.15%) ⬇️
metadata/images.go 57.57% <0%> (-4.99%) ⬇️
archive/tar_opts.go 28.57% <0%> (-4.77%) ⬇️
archive/compression/compression.go 58.69% <0%> (-4.7%) ⬇️
metadata/buckets.go 56.33% <0%> (-4.6%) ⬇️
... and 63 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a3f0aa...dff7456. Read the comment docs.

@crosbymichael
Copy link
Member

LGTM

These changes look really good for handling the state of a task

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants