
spurious FailedSync and FailedMount events 2 minutes after pod terminates #49663

@sjenning

On a cluster started with hack/local-up-cluster.sh and KEEP_TERMINATED_POD_VOLUMES=false, start a busybox pod that terminates after 10s and does not restart:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command:
    - sleep
    - "10"
  terminationGracePeriodSeconds: 0
  restartPolicy: Never

After ~10s, the pod goes to Completed. Two minutes later, two events occur:

2017-07-26 16:32:54 -0500 CDT   2017-07-26 16:32:54 -0500 CDT   1         busybox   Pod                 Warning   FailedMount   kubelet, 127.0.0.1   Unable to mount volumes for pod "busybox_default(aaa4c2ce-7249-11e7-8f15-7085c20cf2ab)": timeout expired waiting for volumes to attach/mount for pod "default"/"busybox". list of unattached/unmounted volumes=[default-token-3dgs5]
2017-07-26 16:32:54 -0500 CDT   2017-07-26 16:32:54 -0500 CDT   1         busybox   Pod                 Warning   FailedSync   kubelet, 127.0.0.1   Error syncing pod

This occurs because syncPod() is called from a pod worker after the volumes have already been unmounted (the pod has terminated). kl.volumeManager.WaitForAttachAndMount(pod) then hangs for 2 minutes and times out, because the volume reconciler will not remount these volumes.

xref openshift/origin#14383

@derekwaynecarr @eparis @smarterclayton @saad-ali

/sig node

Labels: kind/bug, sig/node
