kubeadm: Detect CRIs automatically by rosti · Pull Request #69366 · kubernetes/kubernetes

rosti · 2018-10-03T11:41:10Z

What this PR does / why we need it:

In order to allow for a smoother UX with CRIs other than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed.

This change does that by doing the following:

- Introduce a new runtime function (DetectCRISocket) that will attempt to detect a CRI socket, or return an appropriate error.
- Default to using the above function if --cri-socket is not specified and CRISocket in NodeRegistrationOptions is empty.
- Stop statically defaulting to DefaultCRISocket and rename it to DefaultDockerCRISocket. Its use is now narrowed to distinguishing "Docker or not" and to tests.
- Introduce an AddCRISocketFlag function that adds the --cri-socket flag to a flagSet. Use it in all commands that support --cri-socket.
- Remove the deprecated --cri-socket-path flag from kubeadm config images pull and deprecate --cri-socket in kubeadm upgrade apply.

In short, if multiple CRIs are detected, we bail out with an error. If no CRI is detected, we attempt to use Docker by default and this will fail if it's not installed and we actually need the CRI for the operation.
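
The decision rule in the paragraph above can be sketched roughly as follows. This is only an illustration, not the code from the PR: chooseCRISocket is a hypothetical helper name, and the value of defaultDockerCRISocket is assumed from the /var/run/dockershim.sock path mentioned later in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultDockerCRISocket mirrors the DefaultDockerCRISocket name from the PR.
// The path value is an assumption taken from the review discussion.
const defaultDockerCRISocket = "/var/run/dockershim.sock"

// chooseCRISocket is a hypothetical helper. Given the CRI sockets that were
// found on the host, it applies the rule described above.
func chooseCRISocket(found []string) (string, error) {
	switch len(found) {
	case 0:
		// Nothing detected: fall back to Docker. Operations that really
		// need the CRI will fail later if Docker is not installed.
		return defaultDockerCRISocket, nil
	case 1:
		// Exactly one CRI: use it, no --cri-socket needed.
		return found[0], nil
	default:
		// Several CRIs: refuse to guess and ask the user to choose.
		return "", fmt.Errorf("found multiple CRI sockets, please use --cri-socket to select one: %s",
			strings.Join(found, ", "))
	}
}
```

Keeping the rule separate from the filesystem probing makes it easy to unit test without real sockets.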

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Refs kubernetes/kubeadm#1117 (probably needs docs update too)

Special notes for your reviewer:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/kind feature
/assign @fabriziopandini
/assign @timothysc
/cc @neolit123 @bart0sh @kad

Release note:

kubeadm now attempts to detect an installed CRI by its usual domain socket, so that --cri-socket can be omitted from the command line if Docker is not used and there is a single CRI installed.

neolit123 · 2018-10-03T15:49:05Z

cmd/kubeadm/app/util/config/masterconfig.go

glog calls don't need the extra \n. appears multiple times in the diff.

better move the Infof() call in DetectCRISocket()?

shouldn't we add this for nodeconfig.go too?
JoinConfiguration also has a NodeRegistrationOptions

This code is called for JoinConfiguration.

fabriziopandini

Thanks @rosti !
I think that the approach you are proposing can be slightly improved by leveraging the node annotation that contains the CRI used for node initialisation.
Looking forward to a second pass on this! Ping me when it is ready.

fabriziopandini · 2018-10-05T16:17:19Z

cmd/kubeadm/app/apis/kubeadm/v1alpha3/defaults_unix.go

Now that v1beta1 has merged, those changes should be implemented there.
With regard to implementing the changes in v1alpha3 as well, I think that we are not allowed to alter the behaviour of an existing API version, but I'm open to discussing whether this case (changing a default to improve UX) is an acceptable exception.

fabriziopandini · 2018-10-05T16:20:14Z

cmd/kubeadm/app/cmd/config_test.go

v1beta1 (please revise this PR)

fabriziopandini · 2018-10-05T16:23:00Z

cmd/kubeadm/app/cmd/reset_test.go

fabriziopandini · 2018-10-05T16:30:54Z

cmd/kubeadm/app/cmd/reset.go

I think that the detection process should initially try to read the kubeadm config (that now includes also the node annotation with the cri socket used for node initialisation) and only as a fallback use cri detection.

This has two advantages:

it will also work in the case of multiple CRIs (because the CRI to use is already known)

it will enable further cleanup actions e.g. cleanup of cluster status in case of control plane nodes (out of scope of this PR)
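
The ordering suggested here (stored configuration first, detection only as a fallback) could look roughly like the sketch below. The names resolveCRISocket and errNoConfig are hypothetical, and the two lookups are passed in as plain functions so that the sketch does not assume any particular kubeadm API.

```go
package main

import "errors"

// errNoConfig is a hypothetical sentinel error for "no stored kubeadm
// configuration could be read for this node".
var errNoConfig = errors.New("no kubeadm config available")

// resolveCRISocket is a hypothetical helper. It first asks fromConfig for the
// socket recorded at init/join time and only falls back to detect when the
// stored value is missing or unreadable.
func resolveCRISocket(fromConfig, detect func() (string, error)) (string, error) {
	if socket, err := fromConfig(); err == nil && socket != "" {
		// The CRI is already known, so this also works with multiple CRIs installed.
		return socket, nil
	}
	return detect()
}
```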

fabriziopandini · 2018-10-05T16:33:01Z

cmd/kubeadm/app/cmd/reset.go

Maybe add something to the flag description, e.g. "this parameter should be used only in case automatic CRI detection fails".

cmd/kubeadm/app/cmd/upgrade/apply.go

chuckha

this is going to save me a lot of time once it merges! thanks for the PR

chuckha · 2018-10-08T16:33:05Z

cmd/kubeadm/app/util/runtime/runtime.go

I wonder whether it would be a better or worse experience to print a warning that multiple sockets were detected and that kubeadm is continuing with the first one it found. My thought is that this would allow more users to get started without having to specify a socket, while still being aware that there may be a problem.

Every possible solution, from UX perspective, is a double edged sword.

On one hand, if we bail with an error (the current version), the user will have to re-run the command with the correct --cri-socket passed in. This may be annoying to some users, and it's still not guaranteed that they will provide the correct CRI.

On the other hand, if we warn the user and just pick the first one, then the user may ignore or not see the warning and end up in a mess (requiring a reset).

For me the first option is more viable in the case of multiple CRIs and therefore I picked that solution.

bart0sh · 2018-10-16T15:07:03Z

cmd/kubeadm/app/util/runtime/runtime.go

Checking the existence of the socket is not enough from my point of view. I'd suggest also checking that the CRI API is accessible through the socket.

maybe we should do https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/cri#RuntimeVersioner
or something that is known to be supported by all implementers.

crictl info should be enough, I believe.

I don't think that we need to make the check too complex. Even checking that the path leads to a domain socket is a bit of an overkill on my part. The sockets we are checking for at the moment cannot be created without root privileges, and I doubt that any process running as root could create an exact socket path by accident.

My opinion is to keep things as simple as possible. If there aren't any use cases where someone could deliberately set up an invalid CRI socket, then I don't think we should add any additional checks for it.

neolit123 · 2018-10-29T14:28:16Z

we probably want this in 1.13?

rosti · 2018-10-29T17:00:28Z

@neolit123 I'll try to get to it this week or the next one.

rosti · 2018-11-02T10:35:28Z

This should be ready for review now.

bart0sh · 2018-11-02T11:23:59Z

cmd/kubeadm/app/util/runtime/runtime.go

I didn't get the point. Why not add /var/run/dockershim.sock here and remove the duplicate check below?

/var/run/dockershim.sock is backed by the kubelet, which is in a continuous restart cycle if the node is uninitialized (as it is before kubeadm init or join).
Therefore the check for /var/run/dockershim.sock is unreliable; it actually failed something like 80% of the time when I tested it on a fresh machine.
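
A rough sketch of the approach described in this reply: probe the Docker daemon socket, which is up regardless of the kubelet's state, and report the dockershim socket as the CRI endpoint when it is present. The function name findCRISockets is hypothetical, and the CRI-O and containerd paths are assumed common defaults rather than values taken from this PR.

```go
package main

const (
	// dockerSocket is probed instead of dockershimSocket, because the
	// dockershim socket only exists while the kubelet is running.
	dockerSocket     = "/var/run/docker.sock"
	dockershimSocket = "/var/run/dockershim.sock"
)

// knownCRISockets lists other CRI socket paths to probe directly.
// These paths are assumptions (common defaults), not taken from the PR.
var knownCRISockets = []string{
	"/var/run/crio/crio.sock",
	"/run/containerd/containerd.sock",
}

// findCRISockets is a hypothetical helper that returns the CRI sockets
// considered installed. The exists predicate is injected so the logic can be
// tested without touching the filesystem.
func findCRISockets(exists func(string) bool) []string {
	found := []string{}
	if exists(dockerSocket) {
		// Docker is installed: report the CRI endpoint the kubelet will serve.
		found = append(found, dockershimSocket)
	}
	for _, socket := range knownCRISockets {
		if exists(socket) {
			found = append(found, socket)
		}
	}
	return found
}
```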

bart0sh · 2018-11-02T11:29:32Z

cmd/kubeadm/app/util/runtime/runtime.go

Does this mean that if Docker is running this function will always return /var/run/docker.sock? Would it be better to change the order of known sockets? If CRI-O or containerd is configured and running, should we prefer those even if Docker is running?

No, this means that if you have only one runtime environment (no matter whether it is Docker or one backed by a CRI socket), it will use that. If no runtime environment is detected, or multiple ones are detected (for example Docker and CRI-O), it will display an error message and force the user to supply the socket via --cri-socket.

neolit123 · 2019-01-15T15:49:51Z

cmd/kubeadm/app/cmd/alpha/preflight.go

possibly a TODO leftover?

cmd/kubeadm/app/cmd/upgrade/apply.go

neolit123 · 2019-01-15T15:56:03Z

/assign @bart0sh

rosti · 2019-01-16T09:57:29Z

/test pull-kubernetes-e2e-kops-aws

fabriziopandini

@rosti I really appreciate that you have kept up this effort for such a long time!
IMO the PR is shaping up nicely; I left some minor comments, but nothing blocking, and I'm ready to lgtm asap. In the meantime
/approve

PS. it would be great if during the grand discussion about CI test improvement we define an MVP for different CRI as well

fabriziopandini · 2019-01-20T09:37:14Z

cmd/kubeadm/app/cmd/config.go

Side note:
It seems to me that the defaulted configuration is getting more and more driven by the need of passing validation and unit tests, vs being driven by UX needs. This is not optimal and should probably be addressed in the long term.

Indeed this is the case, especially given that we carry NodeRegistrationOptions around as part of the InitConfiguration, which we often fetch only to use something from ClusterConfiguration or the LocalAPIEndpoint. This in itself triggers the CRI autodetect code too often, and sometimes without needing it at all.
In my opinion, in the long run, we have to decouple the configs and fetch them as we need them from the cluster or config file(s). I believe that this is what @luxas would want too.

cmd/kubeadm/app/cmd/init.go

cmd/kubeadm/app/cmd/join.go

cmd/kubeadm/app/cmd/upgrade/apply.go

cmd/kubeadm/app/cmd/util/cmdutil.go

cmd/kubeadm/app/cmd/config.go

cmd/kubeadm/app/apis/kubeadm/v1beta1/defaults_unix.go

cmd/kubeadm/app/apis/kubeadm/v1beta1/defaults_windows.go

In order to allow for a smoother UX with CRIs different than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed.

This change does that by doing the following:

- Introduce a new runtime function (DetectCRISocket) that will attempt to detect a CRI socket, or return an appropriate error.
- Default to using the above function if --cri-socket is not specified and CRISocket in NodeRegistrationOptions is empty.
- Stop static defaulting to DefaultCRISocket. And rename it to DefaultDockerCRISocket. Its use is now narrowed to "Docker or not" distinguishment and tests.
- Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet. Use that in all commands, that support --cri-socket.
- Remove the deprecated --cri-socket-path flag from kubeadm config images pull and deprecate --cri-socket in kubeadm upgrade apply.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>

fabriziopandini · 2019-01-23T14:34:09Z

@rosti well done!
/lgtm
/approve

k8s-ci-robot · 2019-01-23T14:34:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, rosti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [fabriziopandini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fabriziopandini · 2019-01-23T17:00:53Z

/test pull-kubernetes-e2e-kops-aws

inercia · 2019-01-23T21:52:31Z

cmd/kubeadm/app/cmd/upgrade/apply.go

+
+	// The CRI socket flag is deprecated here, since it should be taken from the NodeRegistrationOptions for the current
+	// node instead of the command line. This prevents errors by the users (such as attempts to use wrong CRI during upgrade).
+	cmdutil.AddCRISocketFlag(cmd.Flags(), &flags.criSocket)


Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

I had concerns about this myself but forgot to mention them at today's office hours.
Maybe we still want a flag that is set to a value of "auto" by default.

This flag is error prone and is there only for historical reasons: in the past we did not keep the CRI socket in the cluster on a per-node basis.
As long as the option is still there, you can use it, but it's now deprecated.
Anyway, forcing the CRI should be done upon init/join, and kubeadm should then use that socket for all other operations. In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

Do you think we should undeprecate the flag and have it with a value of "auto" by default?

I don't think so; it's best that this flag is removed altogether. The only thing that users can attempt to do with this flag is to change the CRI upon upgrade. However, I don't think that this is going to work properly, and I don't think that we should support such a user story.

but how would we handle the scenario @inercia mentioned:

Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

i.e. multiple CRIs installed. How would the user pick one of the sockets?

The persisted NodeRegistrationOptions in the cluster for this node will contain the CRI socket that was set up upon init/join. Hence, no detection will be required.

k8s-ci-robot assigned fabriziopandini Oct 3, 2018

k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 3, 2018

k8s-ci-robot assigned timothysc Oct 3, 2018

k8s-ci-robot requested review from bart0sh, kad and neolit123 October 3, 2018 11:41

rosti changed the title ~~kubeadm: Detect CRIs automatically~~ [WIP] kubeadm: Detect CRIs automatically Oct 3, 2018

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2018

neolit123 reviewed Oct 3, 2018

fabriziopandini suggested changes Oct 5, 2018

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2018

chuckha reviewed Oct 8, 2018

bart0sh reviewed Oct 16, 2018

timothysc added this to the v1.13 milestone Nov 1, 2018

rosti force-pushed the cri-auto-detect branch from 00ef7ed to 1b7b7cd Compare November 2, 2018 10:34

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 2, 2018

rosti changed the title ~~[WIP] kubeadm: Detect CRIs automatically~~ kubeadm: Detect CRIs automatically Nov 2, 2018

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2018

bart0sh reviewed Nov 2, 2018

rosti changed the title ~~kubeadm: Detect CRIs automatically~~ [WIP] kubeadm: Detect CRIs automatically Nov 2, 2018

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2018

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2018

hellolijj mentioned this pull request Nov 10, 2018

kubeadm reset support kill relative pods to release occupied port AliyunContainerService/pouch#2389

Closed

rosti mentioned this pull request Jan 15, 2019

Bump Docker supported version to 18.09 #72823

Merged

rosti force-pushed the cri-auto-detect branch from ec9305f to 53c9f3e Compare January 15, 2019 15:27

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2019

rosti changed the title ~~[WIP] kubeadm: Detect CRIs automatically~~ kubeadm: Detect CRIs automatically Jan 15, 2019

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2019

rosti force-pushed the cri-auto-detect branch from 53c9f3e to f7beaa6 Compare January 15, 2019 15:34

neolit123 reviewed Jan 15, 2019

k8s-ci-robot assigned bart0sh Jan 15, 2019

rosti force-pushed the cri-auto-detect branch from f7beaa6 to d3a5dea Compare January 15, 2019 16:57

fabriziopandini approved these changes Jan 20, 2019

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 20, 2019

rosti force-pushed the cri-auto-detect branch from d3a5dea to f97770b Compare January 21, 2019 14:22

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2019

rosti mentioned this pull request Jan 21, 2019

Deprecate and remove --cri-socket flag from kubeadm upgrade apply kubernetes/kubeadm#1356

Closed

2 tasks

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2019

k8s-ci-robot merged commit b66e332 into kubernetes:master Jan 23, 2019

inercia reviewed Jan 23, 2019

rosti mentioned this pull request Jan 25, 2019

kubeadm: Fix auto CRI detection in kubeadm reset #73316

Merged

This was referenced Jan 30, 2019

Upgrading to 1.13.1 annotates wrong cri socket kubernetes/kubeadm#1322

Closed

kubeadm: fix incorrect criSocket override #73521

Closed

rosti mentioned this pull request Jan 31, 2019

kubeadm: Document CRI auto detection functionality kubernetes/website#12462

Merged
