
Unable to recover corrupt image after unexpected host reboot #3671

@mcginne

Description


We occasionally see hosts reboot unexpectedly, and sometimes when this happens certain containers fail to start with the following errors:

Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: I0917 17:18:34.123476    2092 kuberuntime_manager.go:409] No sandbox for pod "ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)" can be found. Need to start a new one
Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: E0917 17:18:34.140251    2092 kubelet.go:2244] node "10.143.255.15" not found
Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: E0917 17:18:34.162667    2092 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found
Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: E0917 17:18:34.162767    2092 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)" failed: rpc error: code = Unknown desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found
Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: E0917 17:18:34.162802    2092 kuberuntime_manager.go:697] createPodSandbox for pod "ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)" failed: rpc error: code = Unknown desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found
Sep 17 17:18:34 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 kubelet.service[2092]: E0917 17:18:34.162867    2092 pod_workers.go:190] Error syncing pod f51dbf3439a39cd1567f6b8e5c99dc94 ("ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)"), skipping: failed to "CreatePodSandbox" for "ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)" with CreatePodSandboxError: "CreatePodSandbox for pod \"ibm-master-proxy-static-10.143.255.15_kube-system(f51dbf3439a39cd1567f6b8e5c99dc94)\" failed: rpc error: code = Unknown desc = failed to create containerd container: failed to create snapshot: missing parent \"k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4\" bucket: not found"

The containerd logs report:

Sep 17 17:29:51 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:29:51.125588170Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,}"
Sep 17 17:29:51 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:29:51.170208023Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found"
Sep 17 17:30:03 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:03.124078256Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,}"
Sep 17 17:30:03 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:03.166402807Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found"
Sep 17 17:30:14 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:14.124150574Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,}"
Sep 17 17:30:14 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:14.190567400Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found"
Sep 17 17:30:29 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:29.123812110Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,}"
Sep 17 17:30:29 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:29.190065496Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found"
Sep 17 17:30:44 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:44.124012565Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,}"
Sep 17 17:30:44 test-bm0glfe20a9a2salavcg-dmnetperfa1-default-00000372 containerd[1988]: time="2019-09-17T17:30:44.174658740Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:ibm-master-proxy-static-10.143.255.15,Uid:f51dbf3439a39cd1567f6b8e5c99dc94,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd container: failed to create snapshot: missing parent "k8s.io/9/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4" bucket: not found"
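The key quoted after `missing parent` appears to follow a `<namespace>/<id>/<name>` layout. That layout is an assumption based on how containerd's metadata store builds snapshot keys; the report does not state it. Here the parts would be namespace `k8s.io`, internal ID `9`, and a `sha256:` snapshot name, which for unpacked image layers is normally the layer's chain ID. The small bash sketch below pulls that key out of containerd log lines and splits it, so the same failing parent can be tracked across log lines or nodes. The function names are hypothetical helpers, not part of containerd or crictl.

```shell
#!/usr/bin/env bash
# Print the snapshot key quoted after 'missing parent' in containerd log
# lines read from stdin (assumes at most one such key per line, as above).
extract_parent_key() {
  sed -n 's/.*missing parent "\([^"]*\)".*/\1/p'
}

# Split a key of the assumed form <namespace>/<id>/<name> into its parts.
parse_parent_key() {
  local key="$1"
  local ns="${key%%/*}"    # everything before the first '/'
  local rest="${key#*/}"   # everything after the first '/'
  local id="${rest%%/*}"   # internal numeric snapshot ID
  local name="${rest#*/}"  # snapshot name, here a sha256 digest
  printf 'namespace=%s id=%s name=%s\n' "$ns" "$id" "$name"
}

# Example (assumes containerd logs to the journal under the unit 'containerd'):
# journalctl -u containerd --no-pager | extract_parent_key | sort -u |
#   while read -r key; do parse_parent_key "$key"; done
```

For the key in the logs above, `parse_parent_key` prints `namespace=k8s.io id=9 name=sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4`.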

I have tried deleting the image and pulling it again manually on the host using:

crictl pull --creds xxx:yyy registry.ng.bluemix.net/armada-master/haproxy:967a34e6512d2d318e796959d962b59fbfa616fb

But this fails with a similar error:

[image: screenshot of the error output, not preserved]
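The delete-and-pull attempt described above might look like the following with crictl. The report only shows the pull command, so the `crictl rmi` step is an assumption about how the image was deleted, `xxx:yyy` is the same credentials placeholder, and `repull_image` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Values copied from the report; CREDS is a placeholder, not real credentials.
IMAGE="registry.ng.bluemix.net/armada-master/haproxy:967a34e6512d2d318e796959d962b59fbfa616fb"
CREDS="xxx:yyy"

# Assumed reconstruction of the manual attempt: remove the image from the
# CRI image store, then pull it again. The pull runs even if rmi fails
# (for example, because the image reference is already gone).
repull_image() {
  crictl rmi "$IMAGE"
  crictl pull --creds "$CREDS" "$IMAGE"
}

# Run as root on the affected node:
# repull_image
```

The script only defines the function, so nothing touches the node until `repull_image` is called.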

Version info:

crictl version
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  v1.2.9
RuntimeApiVersion:  v1alpha2

I can understand things being left in a bad state when a host crashes unexpectedly, but I would like a way to recover the node. Currently I have to reload the node whenever this issue occurs.
