Skip to content

kubeadm: Detect CRIs automatically#69366

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
rosti:cri-auto-detect
Jan 23, 2019
Merged

kubeadm: Detect CRIs automatically#69366
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
rosti:cri-auto-detect

Conversation

@rosti
Copy link
Copy Markdown
Contributor

@rosti rosti commented Oct 3, 2018

What this PR does / why we need it:

In order to allow for a smoother UX with CRIs different than Docker, we have to make the --cri-socket command line flag optional when just one CRI is installed.

This change does that by doing the following:

  • Introduce a new runtime function (DetectCRISocket) that will attempt to detect a CRI socket, or return an appropriate error.
  • Default to using the above function if --cri-socket is not specified and CRISocket in NodeRegistrationOptions is empty.
  • Stop static defaulting to DefaultCRISocket. And rename it to DefaultDockerCRISocket. Its use is now narrowed to "Docker or not" distinguishment and tests.
  • Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet. Use that in all commands, that support --cri-socket.
  • Remove the deprecated --cri-socket-path flag from kubeadm config images pull and deprecate --cri-socket in kubeadm upgrade apply.

In short, if multiple CRIs are detected, we bail out with an error. If no CRI is detected, we attempt to use Docker by default and this will fail if it's not installed and we actually need the CRI for the operation.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Refs kubernetes/kubeadm#1117 (probably needs docs update too)

Special notes for your reviewer:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/kind feature
/assign @fabriziopandini
/assign @timothysc
/cc @neolit123 @bart0sh @kad

Release note:

kubeadm now attempts to detect an installed CRI by its usual domain socket, so that --cri-socket can be omitted from the command line if Docker is not used and there is a single CRI installed.

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 3, 2018
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 3, 2018
@rosti rosti changed the title kubeadm: Detect CRIs automatically [WIP] kubeadm: Detect CRIs automatically Oct 3, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2018
Copy link
Copy Markdown
Member

@neolit123 neolit123 Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • glog calls don't need the extra \n. appears multiple times in the diff.
  • better move the Infof() call in DetectCRISocket()?
  • shouldn't we add this for nodeconfig.go too?
    JoinConfiguration also has a NodeRegistrationOptions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is called for JoinConfiguration.

Copy link
Copy Markdown
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rosti !
I think that the approach you are proposing can be slightly improved by leveraging on the node annotation that contains the CRI used for node initialisation
Looking forward for a second pass on this! Ping me when it is ready

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that v1beta1 merged those changes should be implemented there.
With regards to implement changes on v1alpha3 as well, I think that we are not allowed to alter the behaviour of an existing API versions, but I'm open to discuss if this case (change of a default for improving UX) is an acceptable exception

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1beta1 (please revise this PR)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1beta1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the detection process should initially try to read the kubeadm config (that now includes also the node annotation with the cri socket used for node initialisation) and only as a fallback use cri detection.

This has two advantages:

  1. it will work also in case for multiple cri (because the cri to use is already known)
  2. it will enable further cleanup actions e.g. cleanup of cluster status in case of control plane nodes (out of scope of this PR)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be adding a something to the flag description like e.g. this parameter should be used only in case automatic cri detection fails

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2018
Copy link
Copy Markdown
Contributor

@chuckha chuckha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to save me a lot of time once it merges! thanks for the PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder it would be a better or worse experience to print a warning that multiple sockets were detected and kubeadm is continuing with the first one it found? My thought is that this would allow more users to get started without having to specify a specific socket to use, but still be aware there may be a problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every possible solution, from UX perspective, is a double edged sword.

On one hand, if we bail with error (the current version) the user will have to re-run the command with correct --cri-socket passed in. This may be annoying to some users and it's still not guaranteed, that they will provide the correct CRI.

On other hand, if we warn the user and just pick up the first one, then the user may ignore or not see the warning and end up in a mess (requiring reset).

For me the first option is more viable in the case of multiple CRIs and therefore I picked that solution.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking existence of the socket is not enough from my point of view. I'd suggest to also check if CRI API is accessible through socket.

Copy link
Copy Markdown
Member

@neolit123 neolit123 Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should do https://godoc.org/k8s.io/kubernetes/pkg/kubelet/apis/cri#RuntimeVersioner
or something that is known to be supported by all implementers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crictl info should be enough, I believe.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think, that we need to make the check too complex. Even checking, that the path leads to a domain socket is a bit of an overkill on my part. The sockets we are checking for at the moment, cannot be created without root privileges and I doubt that any process running as root could create an exact socket path by accident.

My opinion is to keep things as simple as possible. If there aren't any use cases, where someone could deliberately setup an invalid CRI socket, then I don't think we should add any additional checks for it.

@neolit123
Copy link
Copy Markdown
Member

we probably want this in 1.13?

@rosti
Copy link
Copy Markdown
Contributor Author

rosti commented Oct 29, 2018

@neolit123 I'll try to get to it this week or the next one.

@timothysc timothysc added this to the v1.13 milestone Nov 1, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 2, 2018
@rosti rosti changed the title [WIP] kubeadm: Detect CRIs automatically kubeadm: Detect CRIs automatically Nov 2, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2018
@rosti
Copy link
Copy Markdown
Contributor Author

rosti commented Nov 2, 2018

This should be ready for review now.

Copy link
Copy Markdown
Contributor

@bart0sh bart0sh Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get the point. Why not to add /var/run/dockershim.sock here and remove duplicate check below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/var/run/dockershim.sock is backed by kubelet, which is in continuous restart cycle if the node is uninitialized (like before kubeadm init or join).
Therefore the check for /var/run/dockershim.sock is unreliable and it actually was not working in something like 80% of the time I tested it on a fresh machine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if docker is running this function will always return /var/run/docker.sock ? Would it be better to change the order of known sockets? If CRI-O or containerd is configured and running should we prefer those even if docker is running?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this means, that if you have only one runtime environment (no matter if Docker or CRI socket backed one) it will use that. If there is no runtime environment detected or multiple ones are detected (for example Docker & CRI-O) it will display error message and force the user to use --cri-socket to supply the socket.

@rosti rosti changed the title kubeadm: Detect CRIs automatically [WIP] kubeadm: Detect CRIs automatically Nov 2, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2018
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 15, 2019
@rosti rosti changed the title [WIP] kubeadm: Detect CRIs automatically kubeadm: Detect CRIs automatically Jan 15, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2019
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly a TODO leftover?

@neolit123
Copy link
Copy Markdown
Member

/assign @bart0sh

@rosti
Copy link
Copy Markdown
Contributor Author

rosti commented Jan 16, 2019

/test pull-kubernetes-e2e-kops-aws

Copy link
Copy Markdown
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosti I really appreciate you are keeping up in this effort for so a long time!
IMO the PR is shaping out nicely; I left some minor comments, but nothing blocking and I'm ready to lgtm asap. in the meantime
/approve

PS. it would be great if during the grand discussion about CI test improvement we define an MVP for different CRI as well

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note:
It seems to me that the defaulted configuration is getting more and more driven by the need of passing validation and unit tests, vs being driven by UX needs. This is not optimal and should probably be addressed in the long term.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is the case. Especially the fact, that we carry NodeRegistrationOptions around as part of the InitConfiguration, which we fetched up only to use something from ClusterConfiguration or the LocalAPIEndpoint. This in itself triggers the CRI autodetect code too often and sometimes without needing it at all.
In my opinion, in the long run, we have to decouple the configs and fetch them as we need them from the cluster or config file(s). I believe, that this is what @luxas would want too.

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 20, 2019
In order to allow for a smoother UX with CRIs different than Docker, we have to
make the --cri-socket command line flag optional when just one CRI is
installed.

This change does that by doing the following:

- Introduce a new runtime function (DetectCRISocket) that will attempt to
  detect a CRI socket, or return an appropriate error.
- Default to using the above function if --cri-socket is not specified and
  CRISocket in NodeRegistrationOptions is empty.
- Stop static defaulting to DefaultCRISocket. And rename it to
  DefaultDockerCRISocket. Its use is now narrowed to "Docker or not"
  distinguishment and tests.
- Introduce AddCRISocketFlag function that adds --cri-socket flag to a flagSet.
  Use that in all commands, that support --cri-socket.
- Remove the deprecated --cri-socket-path flag from kubeadm config images pull
  and deprecate --cri-socket in kubeadm upgrade apply.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2019
@fabriziopandini
Copy link
Copy Markdown
Member

@rosti well done!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2019
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, rosti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fabriziopandini
Copy link
Copy Markdown
Member

/test pull-kubernetes-e2e-kops-aws

@k8s-ci-robot k8s-ci-robot merged commit b66e332 into kubernetes:master Jan 23, 2019

// The CRI socket flag is deprecated here, since it should be taken from the NodeRegistrationOptions for the current
// node instead of the command line. This prevents errors by the users (such as attempts to use wrong CRI during upgrade).
cmdutil.AddCRISocketFlag(cmd.Flags(), &flags.criSocket)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had concerns about this myself but forgot to mention at today's office hours. :
maybe we still want a flag that is set to a value of "auto" by default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is error prone and is there only for historical reasons. It's there, because in the past we did not keep the CRI socket in the cluster on per node basis.
As long as the option is still there, you can use it, but it's now deprecated.
Anyway, forcing the CRI should be done upon init/join and then leave kubeadm to use that socket for all other operations. In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, strictly speaking, the only operations that should allow passing of CRI sockets should be init, join and reset.

do you think we should undeprecated the flag and have it with a value of "auto" by default?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, it's best that this flag is removed altogether. The only thing, that users can attempt to do with this flag is to change the CRI upon upgrade. However, I don't think, that this is going to work properly and I don't think, that we should support such user story.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how would we handle the scenario @inercia mentioned:

Does this mean there will be no way to force a specific CRI? In that case, what would happen if users have docker and crio running at the same time?

i.e. multiple CRIs installed. how would the user pick one of the sockets?

Copy link
Copy Markdown
Contributor Author

@rosti rosti Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The persisted NodeRegistrationOptions in the cluster for this node is going to contain a CRI socket, that was setup upon init/join. Hence, no detection will be required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants