cilium-cni: implement cni CHECK support #20956
Conversation
|
@squeed Are we intending to change the cni version for aws-cni chaining to support newer versions of aws-vpc-cni? |
Good point, all the CNI versions should be bumped. |
|
OK, rebased, fixed lint error, and bumped all CNI configs to v0.4.0. We might have an issue with very old flannel installations; hopefully this won't be an issue. |
|
One last TODO: implement this for the other "chaining" modes. |
|
@squeed do you want to implement the chaining as part of this PR or in a follow up? |
I'll take care of it in this PR; should be ready soon. |
|
@aanm implemented chained check too. It was easy enough. This is ready to go. Next TODO: get CHECK into containerd :-p |
|
@squeed it looks like the CI is complaining |
plugins/cilium-cni/main.go
Question: Under what circumstance is this empty?
Great question. The CNI Result type allows you to add non-sandbox interfaces if they are relevant (e.g. the bridge on the host). The way to signify this is that the Sandbox field is empty.
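To illustrate the convention: here is a minimal Go sketch that mirrors the shape of the Interface entry in the CNI Result (the real type lives in github.com/containernetworking/cni/pkg/types; this stub avoids the dependency, and the interface names are just examples). An empty Sandbox field marks a host-side (non-sandbox) interface:

```go
package main

import "fmt"

// Interface mirrors the shape of a CNI Result "interfaces" entry.
// Per the CNI spec, an interface that does not live inside the
// container's network namespace (e.g. a bridge on the host) is
// reported with an empty Sandbox field.
type Interface struct {
	Name    string
	Sandbox string // empty => host-side interface; otherwise the netns path
}

// isHostInterface reports whether the entry describes a non-sandbox
// (host) interface, following the empty-Sandbox convention.
func isHostInterface(i Interface) bool {
	return i.Sandbox == ""
}

func main() {
	ifaces := []Interface{
		{Name: "cni0", Sandbox: ""},                       // host bridge
		{Name: "eth0", Sandbox: "/var/run/netns/abc123"},  // container side of the veth
	}
	for _, i := range ifaces {
		if isHostInterface(i) {
			fmt.Printf("%s: host interface\n", i.Name)
		} else {
			fmt.Printf("%s: in sandbox %s\n", i.Name, i.Sandbox)
		}
	}
}
```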
|
Cool stuff, looks good to me aside from CI issues. @squeed Are there going to be any immediate effects from this aside from CNI compatibility? I.e., does Kubelet perform CHECKs periodically? |
Right now, runtime use of CHECK is somewhat limited. containerd doesn't call it yet (but I hope to change that soon). Likewise, cri-o only calls it when cri-o restarts. So, right now, there should be essentially no impact (sadly). But I am actively working on changing that. |
The cni CHECK action asks the plugin to ensure that the container's networking is configured as desired. Fortunately, the agent already exposes a "healthz"-style api; all we need to do is call it. Also, verify that the veth interface exists and is configured correctly. Fixes: cilium#17251 Signed-off-by: Casey Callendrello <cdc@isovalent.com>
CNI v0.4.0 introduces CHECK, which we support. CNI v1.0.0 no longer supports single-plugin configs, so let's switch to the list now. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
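For reference, the switch from a single-plugin config to a plugin list looks roughly like this (a hedged illustration of the CNI network-configuration-list format; the field values here are examples, not Cilium's actual shipped config):

```json
{
  "cniVersion": "0.4.0",
  "name": "cilium",
  "plugins": [
    {
      "type": "cilium-cni"
    }
  ]
}
```

A single-plugin `.conf` file has `type` at the top level with no `plugins` array; CNI v1.0.0 accepts only the list form, so moving to `.conflist` now avoids a second migration later.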
Whoops, fixed. |
|
/test Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed: If it is a flake and a GitHub issue doesn't already exist to track it, comment |
|
Interesting; that test failure is with the portmap plugin, which I theoretically touched (but didn't really). I'll look more into it. |
|
Given that the test passes on other kernel versions, and that I haven't touched anything interesting w.r.t. portmap, I'm judging the failure to be a flake. /test |
Please note that the Portmap test only runs on that 4.9 CI job; it is skipped in other CI jobs. So if the test ends up being broken, this pull request is likely the cause. If it ends up being more flaky in the coming days, this pull request may be the cause. |
|
Given that the PR was merged on Sep 1, that the test also failed on this PR, and Paul's comment above, I'd assume this PR to be the likely culprit. |
Just a heads up, I would expect that every flake should have a corresponding github issue. This way, other contributors can find those flakes and corroborate that they are unrelated to your changes. If you are unable to find an issue corresponding to the actual failure, then that typically suggests the issue may be related to your PR. In that case, I would encourage raising the failure for discussion in the OSS Slack #testing or #development channels. (That said, obviously the flake must be first observed at some point so we do need to use our judgement to choose when to file the flake in the first place; for that, it may be useful to search the title of the test in the issues & PRs to see if you can locate the same failure elsewhere). |
|
@joestringer Note it's not a flake but a complete breakage. For those, I would expect us to fix in a few hours and thus not need an issue (as long as we warn on Slack to rebase after the fix is merged). That said, my expectation doesn't often meet reality 😞 |
|
For linkage purposes - the offending commit has been reverted with #21207. |
I agree. My point was that prior to merging, if you believe something to be a flake, then you can search the issues to find out. It appears that we merged this PR under the assumption that the issue was a flake, but it was not actually a flake. I'm not fussed about whether we file an issue for a reliable failure; filing an issue is a low-effort way to document the failure and centralize the tracking for it (and help others search for it). If we identify the issue quickly and get the author/maintainer in the loop for fixing it, then we don't necessarily have to file an issue; we can just collaborate on a revert or fix and then announce on Slack when the issue is fixed. |
Fixes: #17251
The cni CHECK action asks the plugin to ensure that the container's networking is configured as desired.
Fortunately, the agent already exposes a "healthz"-style api; all we need to do is call it. Also, verify that the veth interface exists and is configured correctly.
Note that, right now, very few runtimes actually call CNI CHECK. I tested this with a hacked version of containerd that called CNI CHECK periodically. Some of the other CNI maintainers are working on rolling out check support universally.
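The two verifications described in the commit message can be sketched in Go using only the standard library. This is an illustration, not the PR's actual implementation: it uses net.InterfaceByName rather than netlink, runs in the current netns rather than entering the container's, and the healthz URL and port are hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// checkEndpoint sketches a CHECK handler's two steps: verify the
// container-side interface exists, then ask the agent's healthz-style
// API whether the endpoint is healthy. A real plugin would perform the
// interface lookup inside the container's network namespace.
func checkEndpoint(ifName, healthzURL string) error {
	if _, err := net.InterfaceByName(ifName); err != nil {
		return fmt.Errorf("interface %s not found: %w", ifName, err)
	}
	resp, err := http.Get(healthzURL) // hypothetical agent endpoint
	if err != nil {
		return fmt.Errorf("agent healthz unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("endpoint unhealthy: status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// "lo" stands in for the veth name a runtime would pass via
	// prevResult; the URL below is an assumed placeholder.
	if err := checkEndpoint("lo", "http://localhost:9879/healthz"); err != nil {
		fmt.Println("CHECK failed:", err)
	} else {
		fmt.Println("CHECK ok")
	}
}
```

On CHECK failure a plugin returns an error, which tells the runtime the container's networking no longer matches the desired state.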