v1.10 backports 2022-02-17 #18835

Merged
gandro merged 4 commits into cilium:v1.10 from nebril:pr/v1.10-backport-2022-02-17
Mar 3, 2022

@nebril
Member

@nebril nebril commented Feb 17, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18762 18486; do contrib/backporting/set-labels.py $pr done 1.10; done

or with

$ make add-label branch=v1.10 issues=18762

@nebril nebril requested a review from a team as a code owner February 17, 2022 10:37
@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.10 and kind/backports labels Feb 17, 2022
@nebril
Member Author

nebril commented Feb 17, 2022

/test-backport-1.10

@nebril
Member Author

nebril commented Feb 17, 2022

/test-backport-1.10

@nebril
Member Author

nebril commented Feb 21, 2022

/ci-aks-1.10

@nebril
Member Author

nebril commented Feb 21, 2022

/test-1.17-4.9

@nebril
Member Author

nebril commented Feb 21, 2022

/test-1.18-4.9

@nebril
Member Author

nebril commented Feb 21, 2022

/test-1.19-4.9

@joestringer
Member

The ConformanceEKS run failed during initialization (waiting for services to become ready) after reinstalling with encryption.
k8s-1.17-kernel-4.9 and k8s-1.19-kernel-4.9 both timed out; it seems kube-dns is frequently not ready. In the 1.17 run, for instance, K8sDatapathConfig Host firewall With VXLAN and endpoint routes failed because Kubernetes did not become ready in time. In other tests, cilium status complains about one of the controllers that assesses cilium-health endpoint health: cilium-health-ep 2m23s ago 14s ago 2 Get "http://10.0.0.23:4240/hello": dial tcp 10.0.0.23:4240: connect: no route to host.

I'll re-run each of these; hopefully no breakage has been introduced into the v1.10 tree.

@joestringer
Member

/ci-eks-1.10

@joestringer
Member

/test-1.17-4.9

@joestringer
Member

/test-1.19-4.9

@joestringer
Member

It seems the ConformanceEKS action needs investigation. The other flaky tests are also making me nervous. I'm not planning to merge this before the upcoming release. @cilium/tophat, please investigate.

jaffcheng and others added 4 commits March 2, 2022 21:29
[ upstream commit 76e3aac ]

error message:

panic: descriptor Desc{fqName: "cilium_operator_alibaba-cloud_api_duration_seconds", help:
"Duration of interactions with API", constLabels: {}, variableLabels: [operation response_code]} is invalid:
"cilium_operator_alibaba-cloud_api_duration_seconds" is not a valid metric name

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
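
The panic above comes from the hyphen in "alibaba-cloud": Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`. A minimal sketch of that check follows. The `sanitize_metric_name` helper is hypothetical and only illustrates one possible fix (substituting underscores); it is not necessarily what the upstream commit does.

```python
import re

# Pattern for valid metric names in the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the Prometheus pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def sanitize_metric_name(name: str) -> str:
    """Hypothetical helper: replace every disallowed character with '_'."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # A leading digit is also invalid, so prefix an underscore in that case.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


bad = "cilium_operator_alibaba-cloud_api_duration_seconds"
fixed = sanitize_metric_name(bad)
print(bad, is_valid_metric_name(bad))
print(fixed, is_valid_metric_name(fixed))
```

Validating names before registration turns a startup panic into an error that can be caught in a unit test.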

[ upstream commit 842f6c8 ]

Currently, cilium-agent in Alibaba Cloud IPAM mode doesn't
respect the pre-allocate setting from the CNI config file when
creating the CiliumNode resource, so the value of pre-allocate
is always the default, 8.

This patch makes the option configurable via the CNI config.

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
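
The behaviour described in the commit above (use the CNI config value when present, otherwise the default of 8) can be sketched as below. The JSON layout, the `alibaba-cloud` section name, and the `pre_allocate_from_cni` function are assumptions for illustration, not the actual Cilium code or config schema.

```python
import json

DEFAULT_PRE_ALLOCATE = 8  # default value cited in the commit message


def pre_allocate_from_cni(conf_text: str, section: str = "alibaba-cloud") -> int:
    """Return pre-allocate from a CNI config document, or the default.

    The section name and key layout are hypothetical.
    """
    conf = json.loads(conf_text)
    section_conf = conf.get(section)
    if not isinstance(section_conf, dict):
        return DEFAULT_PRE_ALLOCATE
    value = section_conf.get("pre-allocate")
    # Accept only positive integers; bool is a subclass of int, so exclude it.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return DEFAULT_PRE_ALLOCATE


print(pre_allocate_from_cni('{"alibaba-cloud": {"pre-allocate": 16}}'))
print(pre_allocate_from_cni('{"name": "cilium"}'))
```

Falling back to the default on a missing or malformed value keeps existing deployments unchanged when they do not set the option.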

To prevent situations in which the GKE node is forcibly stopped and
re-created from causing unmanaged pods, and building on the observation
that the node comes back with the same name and pods are already
scheduled there, we change the recommended taint effect from NoSchedule
to NoExecute, to cause any previously scheduled pods to be evicted,
preventing them from getting IPs assigned by the default CNI. This
should not impact other environments due to the nature of 'NoExecute',
so we recommend it everywhere.

[ upstream commit b049574 ]

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Co-authored-by: Tam Mach <sayboras@yahoo.com>
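
To make the difference between the two taint effects in the commit above concrete, here is a simplified model of Kubernetes taint and toleration matching. It ignores `tolerationSeconds` and `PreferNoSchedule`. The taint key `node.cilium.io/agent-not-ready` is an assumption; the commit message does not name it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the Kubernetes matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists matches every taint key.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(taint: Taint, tolerations: List[Toleration]) -> bool:
    """Both effects keep non-tolerating pods from being scheduled."""
    return any(tolerates(t, taint) for t in tolerations)


def is_evicted(taint: Taint, tolerations: List[Toleration]) -> bool:
    """Only NoExecute removes pods that are already on the node."""
    return taint.effect == "NoExecute" and not can_schedule(taint, tolerations)


KEY = "node.cilium.io/agent-not-ready"  # assumed taint key
no_schedule = Taint(KEY, "true", "NoSchedule")
no_execute = Taint(KEY, "true", "NoExecute")
print(is_evicted(no_schedule, []), is_evicted(no_execute, []))
```

In this model, a pod already on a re-created node stays in place under NoSchedule but is evicted under NoExecute, which is the property the commit relies on.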

The changes that we have been doing to /etc/defaults/kubelet are reset
on node reboots, as is apparently the whole /etc directory --- which
also means that /etc/cni/net.d/05-cilium.conf is removed.

This would not be a problem if our assumption held that the node taint we
recommend placing on the nodes comes back upon reboot, but in practice it
doesn't.

Besides this, it seems that containerd will reinstate its CNI
configuration file, and it will do so well before Cilium has had the
chance to re-run on the node and re-create its CNI configuration,
causing pods to be assigned IPs by the default CNI rather than by Cilium
in the meantime.

This commit attempts to prevent that from happening by observing that
/home/kubernetes/bin/kubelet (i.e. the actual kubelet binary) is kept between
reboots and executed concurrently with containerd by systemd. We leverage
this empirical observation to replace this file with a wrapper script
that, under the required conditions, disables containerd, patches its
configuration, removes undesired CNI configuration files, re-enables
containerd and becomes the kubelet.

[ upstream commit 36585e4 ]

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Co-authored-by: Chris Tarazi <chris@isovalent.com>
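
The sequence the wrapper follows, as described in the commit above, can be sketched as a dry-run plan. This is not the actual wrapper script: `REAL_KUBELET`, the two placeholder command names, and the use of `systemctl stop`/`start` to disable and re-enable containerd are all assumptions for illustration.

```python
import os
import subprocess
from typing import List

# Hypothetical location of the original kubelet binary after it is moved aside.
REAL_KUBELET = "/home/kubernetes/bin/kubelet.real"


def wrapper_steps(conditions_met: bool, kubelet_args: List[str]) -> List[List[str]]:
    """Return the commands the wrapper would run, in order."""
    steps: List[List[str]] = []
    if conditions_met:
        steps += [
            ["systemctl", "stop", "containerd"],   # assumed way to disable containerd
            ["patch-containerd-config"],           # placeholder, not a real command
            ["remove-undesired-cni-configs"],      # placeholder, not a real command
            ["systemctl", "start", "containerd"],  # assumed way to re-enable containerd
        ]
    # The last step replaces the wrapper process with the real kubelet.
    steps.append([REAL_KUBELET, *kubelet_args])
    return steps


def run(steps: List[List[str]], dry_run: bool = True) -> None:
    """Run every preparatory step, then exec the final command."""
    *prep, final = steps
    for argv in prep:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)
    if dry_run:
        print("would exec:", " ".join(final))
    else:
        os.execv(final[0], final)  # does not return on success


run(wrapper_steps(True, ["--v=2"]), dry_run=True)
```

Ending with an exec rather than a child process means systemd keeps supervising the same PID, which now runs the kubelet itself.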
@pchaigno pchaigno force-pushed the pr/v1.10-backport-2022-02-17 branch from d741308 to 0be8727 Compare March 2, 2022 20:29
@pchaigno
Member

pchaigno commented Mar 2, 2022

Taking over.

@pchaigno
Member

pchaigno commented Mar 2, 2022

/test-backport-1.10

@pchaigno pchaigno self-assigned this Mar 2, 2022
@pchaigno
Member

pchaigno commented Mar 3, 2022

Previous run hit a timeout: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/1927/.
/test-1.18-4.9

@pchaigno
Member

pchaigno commented Mar 3, 2022

Previous run hit a timeout: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.19/1723/.
/test-1.20-4.19

@pchaigno
Member

pchaigno commented Mar 3, 2022

Given the changes in this PR affect Alibaba IPAM and GKE, it's unlikely any of these CI jobs would be affected anyway. Reviews are in. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge label Mar 3, 2022
@pchaigno pchaigno assigned pchaigno and unassigned pchaigno Mar 3, 2022
@gandro gandro merged commit fc4bf52 into cilium:v1.10 Mar 3, 2022

Labels

kind/backports: This PR provides functionality previously merged into master.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.

6 participants