-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Closed
Labels
Description
In the cri plugin, we enabled per-pod shim this week. containerd/cri#1116
After that we start seeing stop container timeout failure in all the high density benchmark test "105 pods per node". https://k8s-testgrid.appspot.com/sig-node-containerd#node-e2e-benchmark
Here is the containerd log for one for the containers 0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d. The container is stopped and killed, we the cri plugin never receives the TaskExit event. It seems to be something wrong with event generation.
time="2019-04-04T18:41:36.631574801Z" level=debug msg="Generated id \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" for container \"test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a_test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a_resource-usage-4584_da1ade22-399e-449d-94c3-1a318b4f5f54_0\""
time="2019-04-04T18:41:36.744647976Z" level=debug msg="Use OCI runtime {Type:io.containerd.runc.v2 Engine: PodAnnotations:[] Root: Options:0xc00023c630} for sandbox \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" and container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:41:36.803113278Z" level=debug msg="Container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" spec: (*specs.Spec)(0xc000db0780){Version:(string)1.0.1-dev Process:(*specs.Process)(0xc001caf790){Terminal:(bool)false ConsoleSize:(*specs.Box)<nil> User:(specs.User){UID:(uint32)0 GID:(uint32)0 AdditionalGids:([]uint32)<nil> Username:(string)} Args:([]string)[/pause] Env:([]string)[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOSTNAME=test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a KUBERNETES_PORT=tcp://10.0.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1 KUBERNETES_SERVICE_HOST=10.0.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443] Cwd:(string)/ Capabilities:(*specs.LinuxCapabilities)(0xc000db0800){Bounding:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Effective:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Inheritable:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Permitted:([]string)[CAP_CHOWN CAP_DAC_OVERRIDE CAP_FSETID CAP_FOWNER CAP_MKNOD CAP_NET_RAW CAP_SETGID CAP_SETUID CAP_SETFCAP CAP_SETPCAP CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_KILL CAP_AUDIT_WRITE] Ambient:([]string)<nil>} Rlimits:([]specs.POSIXRlimit)<nil> NoNewPrivileges:(bool)false ApparmorProfile:(string) OOMScoreAdj:(*int)(0xc001cfb720)1000 SelinuxLabel:(string)} Root:(*specs.Root)(0xc002195e80){Path:(string)rootfs Readonly:(bool)false} Hostname:(string) Mounts:([]specs.Mount)[{Destination:(string)/proc Type:(string)proc Source:(string)proc Options:([
il> Annotations:(map[string]string)map[io.kubernetes.cri.container-type:container io.kubernetes.cri.sandbox-id:a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696] Linux:(*specs.Linux)(0xc001ce2c40){UIDMappings:([]specs.LinuxIDMapping)<nil> GIDMappings:([]specs.LinuxIDMapping)<nil> Sysctl:(map[string]string)<nil> Resources:(*specs.LinuxResources)(0xc0011b61e0){Devices:([]specs.LinuxDeviceCgroup)[{Allow:(bool)false Type:(string) Major:(*int64)<nil> Minor:(*int64)<nil> Access:(string)rwm}] Memory:(*specs.LinuxMemory)(0xc000d68a80){Limit:(*int64)<nil> Reservation:(*int64)<nil> Swap:(*int64)<nil> Kernel:(*int64)<nil> KernelTCP:(*int64)<nil> Swappiness:(*uint64)<nil> DisableOOMKiller:(*bool)<nil>} CPU:(*specs.LinuxCPU)(0xc001bcf400){Shares:(*uint64)(0xc001cfb710)2 Quota:(*int64)<nil> Period:(*uint64)(0xc001cfb700)100000 RealtimeRuntime:(*int64)<nil> RealtimePeriod:(*uint64)<nil> Cpus:(string) Mems:(string)} Pids:(*specs.LinuxPids)<nil> BlockIO:(*specs.LinuxBlockIO)<nil> HugepageLimits:([]specs.LinuxHugepageLimit)<nil> Network:(*specs.LinuxNetwork)<nil> Rdma:(map[string]specs.LinuxRdma)<nil>} CgroupsPath:(string)/kubepods/besteffort/podda1ade22-399e-449d-94c3-1a318b4f5f54/0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d Namespaces:([]specs.LinuxNamespace)[{Type:(specs.LinuxNamespaceType)pid Path:(string)} {Type:(specs.LinuxNamespaceType)ipc Path:(string)/proc/8222/ns/ipc} {Type:(specs.LinuxNamespaceType)uts Path:(string)/proc/8222/ns/uts} {Type:(specs.LinuxNamespaceType)mount Path:(string)} {Type:(specs.LinuxNamespaceType)network Path:(string)/proc/8222/ns/net}] Devices:([]specs.LinuxDevice)<nil> Seccomp:(*specs.LinuxSeccomp)<nil> RootfsPropagation:(string) MaskedPaths:([]string)[/proc/acpi /proc/kcore /proc/keys /proc/latency_stats /proc/timer_list /proc/timer_stats /proc/sched_debug /proc/scsi /sys/firmware] ReadonlyPaths:([]string)[/proc/asound /proc/bus /proc/fs /proc/irq /proc/sys /proc/sysrq-trigger] MountLabel:(string) IntelRdt:(*specs.LinuxIntelRdt)<nil>} Solaris:(*specs.Solaris)<ni
time="2019-04-04T18:41:40.992622126Z" level=info msg="CreateContainer within sandbox \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" for &ContainerMetadata{Name:test-fc9fe058-bf17-40fa-bb0e-f99f48f2548a,Attempt:0,} returns container id \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:41:42.192310583Z" level=info msg="StartContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:42:12.471176359Z" level=info msg="StartContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" returns successfully"
time="2019-04-04T18:53:04.816014089Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:53:04.900062762Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:53:05.866970910Z" level=info msg="Finish piping stderr of container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:53:05.870047888Z" level=info msg="Finish piping stdout of container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:53:35.898807042Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:53:35.899043803Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:54:35.951068556Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:54:35.963946631Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:55:35.971917138Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:55:36.951455671Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:55:36.970942725Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:56:06.978961921Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:56:06.979019282Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:57:06.992940717Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:57:07.005859385Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:58:07.013903724Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:58:07.790607889Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T18:58:07.811165415Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T18:58:37.824709273Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:58:37.824762699Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T18:59:37.833905848Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T18:59:37.847227119Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:00:37.865891066Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:00:38.820964256Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:00:38.845300164Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:01:08.860084368Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:01:08.860143562Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:02:08.871193611Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:02:08.885855719Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:03:08.894728886Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:03:09.773810801Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:03:09.794143413Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:03:39.802885992Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:03:39.802943840Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:04:39.812837493Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:04:39.826498312Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:05:39.837112312Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:05:40.161845688Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:05:40.182712592Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:06:10.191859785Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:06:10.191918128Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:07:10.201960458Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:07:10.216296253Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:08:10.226262436Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:08:11.003858614Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:08:11.023824418Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:08:41.033270200Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:08:41.033335027Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:09:41.044166525Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:09:41.056674304Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:10:41.065092123Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:10:41.072429115Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:10:41.110165749Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:11:11.119716910Z" level=error msg="An error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be stopped" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:11:11.119762819Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:12:11.130458562Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:12:11.143319295Z" level=info msg="Kill container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\""
time="2019-04-04T19:13:11.152302420Z" level=error msg="StopPodSandbox for \"a0c4eff7d01ae1299187248f640065bce23390743a325fda4d7017e8f6ee1696\" failed" error="failed to stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\": an error occurs during waiting for container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" to be killed: wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" stop timeout"
time="2019-04-04T19:13:11.809114490Z" level=info msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with timeout 30 (s)"
time="2019-04-04T19:13:11.830325474Z" level=info msg="Stop container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" with signal terminated"
time="2019-04-04T19:13:13.283764327Z" level=error msg="StopContainer for \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" failed" error="wait container \"0c88fdd8abb9d3896544a8a2c453767845c1f6e5eede276cdff6c5c3f302d69d\" is cancelled: context canceled"
And I saw a lot of event publish failure:
...
time="2019-04-04T18:53:16.943939490Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:16.979687830Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:17.129133553Z" level=error msg="post event" error="failed to publish event"
time="2019-04-04T18:53:17.531179734Z" level=error msg="post event" error="failed to publish event"
...
This is similar with #3125.
As is suggested, we should:
- Print error with stdout/stderr from event publishing process for debuging;
- Retry sending event, especially for TaskExit, which is very important in the container lifecycle handling.
And since this is first seen after switching to per-pod shim, I guess it is related to more events per shim.
Reactions are currently unavailable