Hung volumes can wedge the kubelet #31272
Open
Labels: area/kubelet, kind/bug, lifecycle/frozen, priority/important-soon, sig/storage
If you have pods that use storage such as NFS, and the node becomes unable to read or unmount the mounted directory, the kubelet can get completely wedged. While it is wedged, it cannot successfully start any new pod that uses volumes (which is basically every pod, since most mount secret tokens) until either the storage issue is resolved or the kubelet is restarted.
To reproduce:
kubectl run --rm --attach --restart Never --image busybox bbox date
The busybox pod will be stuck in ContainerCreating with events such as these:
In this stack trace, gathered after I deleted the pod, the volume reconciler is still trying to get the volumes for the pod I had just deleted. You'll also see a goroutine trying to stop the Docker container, but it is stuck.
In this stack trace, gathered after I tried to create the bbox pod, the new pod (bbox) is waiting for its volumes (in this case, secrets) to attach/mount.
We've seen this in 1.2.x and I just reproduced it in master (commit f297ea9).
cc @kubernetes/sig-storage @kubernetes/sig-node @kubernetes/rh-cluster-infra @pmorie @derekwaynecarr @timothysc @saad-ali