kubeadm: Detect CRIs automatically#69366
Conversation
There was a problem hiding this comment.
- glog calls don't need the extra
\n. appears multiple times in the diff. - better move the Infof() call in DetectCRISocket()?
- shouldn't we add this for nodeconfig.go too?
JoinConfiguration also has a NodeRegistrationOptions
There was a problem hiding this comment.
This code is called for JoinConfiguration.
fabriziopandini
left a comment
There was a problem hiding this comment.
Thanks @rosti !
I think that the approach you are proposing can be slightly improved by leveraging on the node annotation that contains the CRI used for node initialisation
Looking forward for a second pass on this! Ping me when it is ready
There was a problem hiding this comment.
Now that v1beta1 merged those changes should be implemented there.
With regards to implement changes on v1alpha3 as well, I think that we are not allowed to alter the behaviour of an existing API versions, but I'm open to discuss if this case (change of a default for improving UX) is an acceptable exception
cmd/kubeadm/app/cmd/config_test.go
Outdated
There was a problem hiding this comment.
v1beta1 (please revise this PR)
cmd/kubeadm/app/cmd/reset_test.go
Outdated
cmd/kubeadm/app/cmd/reset.go
Outdated
There was a problem hiding this comment.
I think that the detection process should initially try to read the kubeadm config (that now includes also the node annotation with the cri socket used for node initialisation) and only as a fallback use cri detection.
This has two advantages:
- it will work also in case for multiple cri (because the cri to use is already known)
- it will enable further cleanup actions e.g. cleanup of cluster status in case of control plane nodes (out of scope of this PR)
cmd/kubeadm/app/cmd/reset.go
Outdated
There was a problem hiding this comment.
Might be adding a something to the flag description like e.g. this parameter should be used only in case automatic cri detection fails
chuckha
left a comment
There was a problem hiding this comment.
this is going to save me a lot of time once it merges! thanks for the PR
There was a problem hiding this comment.
i wonder it would be a better or worse experience to print a warning that multiple sockets were detected and kubeadm is continuing with the first one it found? My thought is that this would allow more users to get started without having to specify a specific socket to use, but still be aware there may be a problem.
There was a problem hiding this comment.
Every possible solution, from UX perspective, is a double edged sword.
On one hand, if we bail with error (the current version) the user will have to re-run the command with correct --cri-socket passed in. This may be annoying to some users and it's still not guaranteed, that they will provide the correct CRI.
On other hand, if we warn the user and just pick up the first one, then the user may ignore or not see the warning and end up in a mess (requiring reset).
For me the first option is more viable in the case of multiple CRIs and therefore I picked that solution.
There was a problem hiding this comment.
Checking existence of the socket is not enough from my point of view. I'd suggest to also check if CRI API is accessible through socket.
There was a problem hiding this comment.
maybe we should do https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/cri#RuntimeVersioner
or something that is known to be supported by all implementers.
There was a problem hiding this comment.
crictl info should be enough, I believe.
There was a problem hiding this comment.
I don't think, that we need to make the check too complex. Even checking, that the path leads to a domain socket is a bit of an overkill on my part. The sockets we are checking for at the moment, cannot be created without root privileges and I doubt that any process running as root could create an exact socket path by accident.
My opinion is to keep things as simple as possible. If there aren't any use cases, where someone could deliberately setup an invalid CRI socket, then I don't think we should add any additional checks for it.
|
we probably want this in 1.13? |
|
@neolit123 I'll try to get to it this week or the next one. |
00ef7ed to
1b7b7cd
Compare
|
This should be ready for review now. |
There was a problem hiding this comment.
I didn't get the point. Why not to add /var/run/dockershim.sock here and remove duplicate check below?
There was a problem hiding this comment.
/var/run/dockershim.sock is backed by kubelet, which is in continuous restart cycle if the node is uninitialized (like before kubeadm init or join).
Therefore the check for /var/run/dockershim.sock is unreliable and it actually was not working in something like 80% of the time I tested it on a fresh machine.
There was a problem hiding this comment.
Does this mean that if docker is running this function will always return /var/run/docker.sock ? Would it be better to change the order of known sockets? If CRI-O or containerd is configured and running should we prefer those even if docker is running?
There was a problem hiding this comment.
No, this means, that if you have only one runtime environment (no matter if Docker or CRI socket backed one) it will use that. If there is no runtime environment detected or multiple ones are detected (for example Docker & CRI-O) it will display error message and force the user to use --cri-socket to supply the socket.
ec9305f to
53c9f3e
Compare
53c9f3e to
f7beaa6
Compare
|
/assign @bart0sh |
f7beaa6 to
d3a5dea
Compare
|
/test pull-kubernetes-e2e-kops-aws |
fabriziopandini
left a comment
There was a problem hiding this comment.
@rosti I really appreciate you are keeping up in this effort for so a long time!
IMO the PR is shaping out nicely; I left some minor comments, but nothing blocking and I'm ready to lgtm asap. in the meantime
/approve
PS. it would be great if during the grand discussion about CI test improvement we define an MVP for different CRI as well
cmd/kubeadm/app/cmd/config.go
Outdated
There was a problem hiding this comment.
Side note:
It seems to me that the defaulted configuration is getting more and more driven by the need of passing validation and unit tests, vs being driven by UX needs. This is not optimal and should probably be addressed in the long term.
There was a problem hiding this comment.
Indeed this is the case. Especially the fact, that we carry NodeRegistrationOptions around as part of the InitConfiguration, which we fetched up only to use something from ClusterConfiguration or the LocalAPIEndpoint. This in itself triggers the CRI autodetect code too often and sometimes without needing it at all.
In my opinion, in the long run, we have to decouple the configs and fetch them as we need them from the cluster or config file(s). I believe, that this is what @luxas would want too.
In order to allow for a smoother UX with CRIs different than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed. This change does that by doing the following: - Introduce a new runtime function (DetectCRISocket) that will attempt to detect a CRI socket, or return an appropriate error. - Default to using the above function if --cri-socket is not specified and CRISocket in NodeRegistrationOptions is empty. - Stop static defaulting to DefaultCRISocket. And rename it to DefaultDockerCRISocket. Its use is now narrowed to "Docker or not" distinguishment and tests. - Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet. Use that in all commands, that support --cri-socket. - Remove the deprecated --cri-socket-path flag from kubeadm config images pull and deprecate --cri-socket in kubeadm upgrade apply. Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
d3a5dea to
f97770b
Compare
|
@rosti well done! |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini, rosti The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test pull-kubernetes-e2e-kops-aws |
|
|
||
| // The CRI socket flag is deprecated here, since it should be taken from the NodeRegistrationOptions for the current | ||
| // node instead of the command line. This prevents errors by the users (such as attempts to use wrong CRI during upgrade). | ||
| cmdutil.AddCRISocketFlag(cmd.Flags(), &flags.criSocket) |
There was a problem hiding this comment.
Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?
There was a problem hiding this comment.
i had concerns about this myself but forgot to mention at today's office hours. :
maybe we still want a flag that is set to a value of "auto" by default.
There was a problem hiding this comment.
This is error prone and is there only for historical reasons. It's there, because in the past we did not keep the CRI socket in the cluster on per node basis.
As long as the option is still there, you can use it, but it's now deprecated.
Anyway, forcing the CRI should be done upon init/join and then leave kubeadm to use that socket for all other operations. In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.
There was a problem hiding this comment.
In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.
do you think we should undeprecated the flag and have it with a value of "auto" by default?
There was a problem hiding this comment.
I don't think so, it's best that this flag is removed altogether. The only thing, that users can attempt to do with this flag is to change the CRI upon upgrade. However, I don't think, that this is going to work properly and I don't think, that we should support such user story.
There was a problem hiding this comment.
but how would we handle the scenario @inercia mentioned:
Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?
i.e. multiple CRIs installed. how would the user pick one of the sockets?
There was a problem hiding this comment.
The persisted NodeRegistrationOptions in the cluster for this node is going to contain a CRI socket, that was setup upon init/join. Hence, no detection will be required.
What this PR does / why we need it:
In order to allow for a smoother UX with CRIs different than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed.
This change does that by doing the following:
--cri-socketis not specified and CRISocket in NodeRegistrationOptions is empty.--cri-socketflag to a flagSet. Use that in all commands, that support--cri-socket.--cri-socket-pathflag from kubeadm config images pull and deprecate--cri-socketin kubeadm upgrade apply.In short, if multiple CRIs are detected, we bail out with an error. If no CRI is detected, we attempt to use Docker by default and this will fail if it's not installed and we actually need the CRI for the operation.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Refs kubernetes/kubeadm#1117 (probably needs docs update too)
Special notes for your reviewer:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/kind feature
/assign @fabriziopandini
/assign @timothysc
/cc @neolit123 @bart0sh @kad
Release note: