Skip to content

Containers can't be stopped because TaskExit event failed to be published for runc.v2. #3177

@Random-Liu

Description

@Random-Liu

In the cri plugin, we enabled per-pod shim this week. containerd/cri#1116

After that we start seeing stop container timeout failure in all the high density benchmark test "105 pods per node". https://k8s-testgrid.appspot.com/sig-node-containerd#node-e2e-benchmark

Here is the containerd log for one for the containers 0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d. The container is stopped and killed, we the cri plugin never receives the TaskExit event. It seems to be something wrong with event generation.

time="2019-04-04T18:41:36.631574801Z" level=debug msg="Generated id \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" for container \"test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a_test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a_resource-usage-4584_da1ade22-399e-449d-94c3-1a318b4f5f54_0\""
time="2019-04-04T18:41:36.744647976Z" level=debug msg="Use OCI runtime {Type:io.containerd.runc.v2 Engine: PodAnnotations:[] Root: Options:0xc00023c630} for sandbox \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" and container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:41:36.803113278Z" level=debug msg="Container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" spec: (*specs.Spec)(0xc000db0780){Version:(string)1.0.1-dev Process:(*specs.Process)(0xc001caf790){Terminal:(bool)false ConsoleSize:(*specs.Box)<nil> User:(specs.User){UID:(uint32)0 GID:(uint32)0 AdditionalGids:([]uint32)<nil> Username:(string)} Args:([]string)[/pause] Env:([]string)[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a KUBERNETES_PORT=tcp://10.0.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1 KUBERNETES_SERVICE_HOST=10.0.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443] Cwd:(string)/ Capabilities:(*specs.LinuxCapabilities)(0xc000db0800){Bounding:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Effective:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Inheritable:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Permitted:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Ambient:([]string)<nil>} Rlimits:([]specs.POSIXRlimit)<nil> NoNewPrivileges:(bool)false ApparmorProfile:(string) OOMScoreAdj:(*int)(0xc001cfb720)1000 SelinuxLabel:(string)} Root:(*specs.Root)(0xc002195e80){Path:(string)rootfs Readonly:(bool)false} Hostname:(string) Mounts:([]specs.Mount)[{Destination:(string)/proc Type:(string)proc Source:(string)proc Options:([
il> Annotations:(map[string]string)map[io.kubernetes.cri.container-type:container io.kubernetes.cri.sandbox-id:a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696] Linux:(*specs.Linux)(0xc001ce2c40){UIDMappings:([]specs.LinuxIDMapping)<nil> GIDMappings:([]specs.LinuxIDMapping)<nil> Sysctl:(map[string]string)<nil> Resources:(*specs.LinuxResources)(0xc0011b61e0){Devices:([]specs.LinuxDeviceCgroup)[{Allow:(bool)false Type:(string) Major:(*int64)<nil> Minor:(*int64)<nil> Access:(string)rwm}] Memory:(*specs.LinuxMemory)(0xc000d68a80){Limit:(*int64)<nil> Reservation:(*int64)<nil> Swap:(*int64)<nil> Kernel:(*int64)<nil> KernelTCP:(*int64)<nil> Swappiness:(*uint64)<nil> DisableOOMKiller:(*bool)<nil>} CPU:(*specs.LinuxCPU)(0xc001bcf400){Shares:(*uint64)(0xc001cfb710)2 Quota:(*int64)<nil> Period:(*uint64)(0xc001cfb700)100000 RealtimeRuntime:(*int64)<nil> RealtimePeriod:(*uint64)<nil> Cpus:(string) Mems:(string)} Pids:(*specs.LinuxPids)<nil> BlockIO:(*specs.LinuxBlockIO)<nil> HugepageLimits:([]specs.LinuxHugepageLimit)<nil> Network:(*specs.LinuxNetwork)<nil> Rdma:(map[string]specs.LinuxRdma)<nil>} CgroupsPath:(string)/kubepods/besteffort/podda1ade22-399e-449d-94c3-1a318b4f5f54/0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d Namespaces:([]specs.LinuxNamespace)[{Type:(specs.LinuxNamespaceType)pid Path:(string)} {Type:(specs.LinuxNamespaceType)ipc Path:(string)/proc/8222/ns/ipc} {Type:(specs.LinuxNamespaceType)uts Path:(string)/proc/8222/ns/uts} {Type:(specs.LinuxNamespaceType)mount Path:(string)} {Type:(specs.LinuxNamespaceType)network Path:(string)/proc/8222/ns/net}] Devices:([]specs.LinuxDevice)<nil> Seccomp:(*specs.LinuxSeccomp)<nil> RootfsPropagation:(string) MaskedPaths:([]string)[/proc/acpi /proc/kcore /proc/keys /proc/latency_stats /proc/timer_list /proc/timer_stats /proc/sched_debug /proc/scsi /sys/firmware] ReadonlyPaths:([]string)[/proc/asound /proc/bus /proc/fs /proc/irq /proc/sys /proc/sysrq-trigger] MountLabel:(string) IntelRdt:(*specs.LinuxIntelRdt)<nil>} Solaris:(*specs.Solaris)<ni
time="2019-04-04T18:41:40.992622126Z" level=info msg="CreateContainer within sandbox \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" for &ContainerMetadata{Name:test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a,Attempt:0,} returns container id \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:41:42.192310583Z" level=info msg="StartContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:42:12.471176359Z" level=info msg="StartContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" returns successfully"
time="2019-04-04T18:53:04.816014089Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:53:04.900062762Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:53:05.866970910Z" level=info msg="Finish piping stderr of container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:53:05.870047888Z" level=info msg="Finish piping stdout of container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:53:35.898807042Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:53:35.899043803Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:54:35.951068556Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:54:35.963946631Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:55:35.971917138Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:55:36.951455671Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:55:36.970942725Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:56:06.978961921Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:56:06.979019282Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:57:06.992940717Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:57:07.005859385Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:58:07.013903724Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:58:07.790607889Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:58:07.811165415Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:58:37.824709273Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:58:37.824762699Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:59:37.833905848Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:59:37.847227119Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:00:37.865891066Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:00:38.820964256Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:00:38.845300164Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:01:08.860084368Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:01:08.860143562Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:02:08.871193611Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:02:08.885855719Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:03:08.894728886Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:03:09.773810801Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:03:09.794143413Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:03:39.802885992Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:03:39.802943840Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:04:39.812837493Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:04:39.826498312Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:05:39.837112312Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:05:40.161845688Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:05:40.182712592Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:06:10.191859785Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:06:10.191918128Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:07:10.201960458Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:07:10.216296253Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:08:10.226262436Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:08:11.003858614Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:08:11.023824418Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:08:41.033270200Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:08:41.033335027Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:09:41.044166525Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:09:41.056674304Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:10:41.065092123Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:10:41.072429115Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:10:41.110165749Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:11:11.119716910Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:11:11.119762819Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:12:11.130458562Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:12:11.143319295Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:13:11.152302420Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:13:11.809114490Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:13:11.830325474Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:13:13.283764327Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" is cancelled: context canceled"

And I saw a lot of event publish failure:

...
time="2019-04-04T18:53:16.943939490Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:16.979687830Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:17.129133553Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:17.531179734Z" level=error msg="post event" error="failed to publish event"
...

This is similar with #3125.

As is suggested, we should:

  1. Print error with stdout/stderr from event publishing process for debuging;
  2. Retry sending event, especially for TaskExit, which is very important in the container lifecycle handling.

And since this is first seen after switching to per-pod shim, I guess it is related to more events per shim.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions