Kubelet does not launch static pods while waiting for bootstrapping #68686
Description
The kubelet currently blocks while trying to obtain a bootstrap client before it starts the worker loops that run static pods. As a result, any static pod that provides functionality bootstrapping may need on the node (talking to an HSM, calling out externally, providing a plugin for bootstrapping, or even running a control plane pod) is blocked. Nothing in the kubelet actually depends on this ordering (waiting for bootstrap has no use case value); it's just a historical legacy of how the code evolved. As people start to leverage bootstrapping and rotation, it becomes more obvious that this doesn't work in practice.
In the long run we want bootstrap credentials to be pluggable and extensible so that an HSM or other trust store on the machine can deal with the problem. Having secret key material for bootstrapping stored on disk that allows you to join a cluster is undesirable, and having a callout to make that flow less error prone is very desirable for both cloud and metal provisioned environments.
In addition, the bootstrapping and client cert rotation flows are slightly duplicated today. Because the client cert is used to rotate itself, the bootstrap credential is only consulted again once a client cert expires (very possible with quicker rotations if the API goes down for a period of time). That could be arbitrarily far in the future, so we tend to fail in those cases because the bootstrap credential has itself expired. Since the long term desire is for bootstrap to be out of process, it would be better for us to continually go back to bootstrap on cert expiration rather than retrying our flow.
For 1.13, we should fix the kubelet to not pause when bootstrapping and instead delegate to the rotation flow to provide that credential. This has the side effect of reducing duplicated code for procuring CSRs between the two code paths, and of hiding the details of client cert acquisition inside the cert manager so that future changes can allow callouts.
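The intended startup ordering can be sketched roughly as follows. This is an illustrative model only, not kubelet code: `runNode`, the `ready` channel, and the pod names are all assumptions made for the example. The bootstrap step runs concurrently and is allowed to wait on something a static pod provides.

```go
package main

import "fmt"

// runNode is a hypothetical model of non-blocking startup. Static pods are
// started immediately while bootstrap runs in the background. The bootstrap
// function may wait on ready, which is closed once static pods are running.
func runNode(staticPods []string, bootstrap func(ready <-chan struct{}) string) ([]string, string) {
	ready := make(chan struct{})
	credCh := make(chan string, 1)

	// Acquire the credential in the background instead of blocking startup.
	go func() { credCh <- bootstrap(ready) }()

	started := make([]string, 0, len(staticPods))
	for _, pod := range staticPods {
		started = append(started, pod) // stand-in for actually launching the pod
	}
	close(ready) // static pods are up; bootstrap helpers are now available

	// The demo waits for the credential only so it can return it.
	return started, <-credCh
}

func main() {
	pods := []string{"hsm-plugin", "kube-apiserver"}
	started, cred := runNode(pods, func(ready <-chan struct{}) string {
		<-ready // bootstrap depends on a static pod (e.g. an HSM plugin)
		return "client-cert"
	})
	fmt.Println("static pods started:", started)
	fmt.Println("credential obtained:", cred)
}
```

If the bootstrap function were instead called synchronously before the loop, as in the blocking behavior this issue describes, the `<-ready` wait would never return, because the static pod it depends on would never be started.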
@awly @mikedanese as discussed this week, @kubernetes/sig-auth-misc / @kubernetes/sig-node-bugs, @liggitt because you and I discussed this, @aaronlevy because we wanted static pods for bootstrap control planes in bootkube not to block when using bootstrapping kubelets.