fix: node-init restartPods should use docker if /etc/crictl.yaml not found#12894
fix: node-init restartPods should use docker if /etc/crictl.yaml not found#12894errordeveloper merged 1 commit intocilium:masterfrom InfiniteEnergy:node-init-docker
Conversation
|
Commit 9fa556c8f79c4fc8e75cf2771536bdbc17699286 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
Coverage decreased (-0.02%) to 37.11% when pulling cbfb6ba6e43f995ce782854ca7895168a461cbbe on InfiniteEnergy:node-init-docker into bfd972f on cilium:master. |
sayboras
left a comment
There was a problem hiding this comment.
Thanks for your github issue and PR 💯.
I got one question as below.
There was a problem hiding this comment.
Can you explain why -x is used here?
With your changes, docker will be used for below scenario. However, crictl should be executed intead if I am not wrong.
root@kind-control-plane:/# ls -lrt /etc/crictl.yaml
-rw-r--r-- 1 root root 56 Jul 8 21:00 /etc/crictl.yaml
root@kind-control-plane:/# cat /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
There was a problem hiding this comment.
I believe docker is the default if this hasn't been configured at all so if that file doesn't exist then use docker.
I just checked for EKS and GKE clusters that I have, both are having /etc/crictl.yaml file and both are using docker.
Might need others to confirm this.
There was a problem hiding this comment.
Can you explain why
-xis used here?
I thought this was the "exists" test; I just double checked and it is "executable" which would be problematic. I've fixed that to use -f which I think is more accurate.
There was a problem hiding this comment.
What this series of conditions is trying to determine is "What is the container runtime?" It is currently (and I'm extending this some) using /etc/crictl.yaml as a signal, but this is probably not exactly correct. In a given random installation this config file might even be configured to be elsewhere.
My thinking is to follow the current pattern and try to improve the situation. In most cases if crictl.yaml explicitly mentions docker or there is no crictl.yaml config at all then the kubelet is probably using docker since that is the current default (accoding to the docs on --container-runtime). This change (now fixed with -f) will capture how AKS behaves (no crictl file) and I don't think will break EKS/GKE where that file does exist.
Other ideas:
- Move this test down to the bottom of the if condition stack
- Have the last two conditions test for the presence of the
dockerbinary andcrictlbinary and use whichever one is found - Try to parse kubelet's commandline to look for relevant flags like
--container-runtimecurrently defaults to docker which was my thinking on "if this hasn't been configured, use docker"
--config, and then parse that- I don't really like this option. It will be much harder to implement (error prone) and still doesn't catch that the default value if no flags are specified can change, or be compiled independently to be different.
There was a problem hiding this comment.
I just confirmed on the 3 AKS scenarios i know about (from these docs)
- image
AKS Ubuntu 16.04:- This is the default
/etc/crictl.yamldoesn't exist, testing its existence works- containers are restarted to get onto CNI net
- cilium connectivity-check succeeds
- image
AKS Ubuntu 18.04- Built with
--aks-custom-headers "CustomizedUbuntu=aks-ubuntu-1804" /etc/crictl.yamldoesn't exist, testing its existence works- containers are restarted to get onto CNI net
- cilium connectivity-check succeeds
- Built with
- image
AKS Ubuntu 18.04w/ containerd-
built with
--aks-custom-headers "CustomizedUbuntu=aks-ubuntu-1804,ContainerRuntime=containerd" -
/etc/crictl.yamlis present; it setsruntime-endpoint: unix:///run/containerd/containerd.sock -
crictlbinary is not present, docker binary is. This is mentioned in the limitations section and makes probing for which runtime is in use by looking for what binaries are present problematic. -
cilium-node-init logs show restart fails
Restarting kubenet managed pods Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found bash: line 86: crictl: command not found Node initialization complete bash: line 86: crictl: command not found !!! startup-script succeeded! -
Other issues with pods failing to start, haven't fully debugged it yet b/c i think that's outside of the scope of this ticket. With
crictlnot present the restart pods technique cilium-node-init is attempting will be harder.
-
There was a problem hiding this comment.
Thanks for your detailed explanation and checking. I think your changes now are definitely good enough for the scope of issue. Docker will be used in either of below cases:
- /etc/crictl.yaml doesn't not exists
- Or
dockeris specifically mentioned in /etc/crictl.yaml.
Ortherwise, critctl will be used by default.
This script has several tests for what the container runtime situation looks like to determine how best to restart the underlying containers (going around the kubelet) so that the new networking configuration can take effect. The first test looks to see if the crictl config file is configured to use docker, but if that file doesn't exist then it fails. I believe docker is the default if this hasn't been configured at all so if that file doesn't exist then use docker. Fixes #12850 Signed-off-by: Nathan Bird <njbird@infiniteenergy.com>
|
test-me-please |
fristonio
left a comment
There was a problem hiding this comment.
Thanks for a detailed explanation in the comments.
|
CI passed, mark this one as ready to merge. PS: Add backport for 1.6, 1.7 and 1.8. I made mistake selected backport-done/1.8 😓 grrrr |
This script has several tests for what the container runtime situation
looks like to determine how best to restart the underlying containers
(going around the kubelet) so that the new networking configuration
can take effect.
The first test looks to see if the crictl config file is configured to use
docker, but if that file doesn't exist then it fails. I believe docker
is the default if this hasn't been configured at all so if that file
doesn't exist then use docker.
Fixes #12850
Signed-off-by: Nathan Bird njbird@infiniteenergy.com
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXXline if the commit addresses a particularGitHub issue.