Skip to content

netkit: Use NETKIT_SCRUB_NONE on primary device#35306

Merged
borkmann merged 1 commit intocilium:mainfrom
jrife:jrife/netkit-scrub-netlink
Oct 22, 2024
Merged

netkit: Use NETKIT_SCRUB_NONE on primary device#35306
borkmann merged 1 commit intocilium:mainfrom
jrife:jrife/netkit-scrub-netlink

Conversation

@jrife
Copy link
Copy Markdown
Contributor

@jrife jrife commented Oct 9, 2024

When using endpoint routes with veth, cil_to_container runs before the veth driver scrubs the packet (i.e. clears skb->mark). In netkit mode, cil_to_container is executed after scrubbing the packet. This causes traffic from originating from the host to be miscategorized leading to cases like GH-34042 where enabling netkit causes liveness+readiness probes to fail.

To address this, the IFLA_NETKIT_SCRUB and IFLA_NETKIT_PEER_SCRUB flags were recently introduced to configure whether or not netkit scrubs the packet when xmiting to the primary or peer devices, respectively. Set IFLA_NETKIT_SCRUB to NETKIT_SCRUB_NONE to ensure that any mark set in the host namespace is passed to cil_to_container when using netkit with endpoint routes. Set IFLA_NETKIT_PEER_SCRUB to NETKIT_SCRUB_DEFAULT to ensure that the mark is always cleared before running cil_from_container and passing the packet on to the host namespace.

This PR relies on this change which adds support for these new flags to github.com/vishvananda/netlink.

Fixes: #35060
Fixes: #34042
Fixes: #33875
Fixes: 6895341 ("cilium, connector: Add netkit connector")

netkit: Fix issue where traffic originating from the host namespace fails to reach the pod when using endpoint routes and network policies.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 9, 2024
@jrife jrife changed the title netkit: Use NETKIT_SCRUB_NONE where possible netkit: Use NETKIT_SCRUB_NONE on primary device Oct 9, 2024
@jrife
Copy link
Copy Markdown
Contributor Author

jrife commented Oct 9, 2024

cc @borkmann . Tested this locally with a fresh kernel build and this change to github.com/vishvananda/netlink. The issues described in #34042 and #33875 seem to be fixed with this combination. I'm going to wait for that PR to get merged so that we can bump the module version in go.mod in this PR.

@jrife jrife force-pushed the jrife/netkit-scrub-netlink branch from eeb78f7 to 9afae90 Compare October 9, 2024 00:47
@borkmann
Copy link
Copy Markdown
Member

Looks great to me, thanks! Once the vishvananda/netlink#1022 PR gets merged, we need to pull in the go mod dependencies and then we can get this merged asap.

@borkmann borkmann added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Oct 15, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 15, 2024
@borkmann borkmann added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. needs-backport/1.16 labels Oct 15, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 15, 2024
@borkmann borkmann added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. feature/netkit labels Oct 15, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 15, 2024
@jrife jrife force-pushed the jrife/netkit-scrub-netlink branch 4 times, most recently from 48074fd to b6fc704 Compare October 22, 2024 03:58
When using endpoint routes with veth, cil_to_container runs before the
veth driver scrubs the packet (i.e. clears skb->mark). In netkit mode,
cil_to_container is executed after scrubbing the packet. This causes
traffic from originating from the host to be miscategorized leading to
cases like ciliumGH-34042 where enabling netkit causes liveness+readiness
probes to fail.

To address this, the IFLA_NETKIT_SCRUB and IFLA_NETKIT_PEER_SCRUB flags
were recently introduced to configure whether or not netkit scrubs the
packet when xmiting to the primary or peer devices, respectively. Set
IFLA_NETKIT_SCRUB to NETKIT_SCRUB_NONE to ensure that any mark set in
the host namespace is passed to cil_to_container when using netkit with
endpoint routes. Set IFLA_NETKIT_PEER_SCRUB to NETKIT_SCRUB_DEFAULT to
ensure that the mark is always cleared before running cil_from_container
and passing the packet on to the host namespace.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net/T/#u
Fixes: cilium#35060
Fixes: cilium#34042
Fixes: cilium#33875
Fixes: 6895341 ("cilium, connector: Add netkit connector")
@jrife jrife force-pushed the jrife/netkit-scrub-netlink branch from b6fc704 to e1f80c9 Compare October 22, 2024 04:06
@jrife
Copy link
Copy Markdown
Contributor Author

jrife commented Oct 22, 2024

vishvananda/netlink#1022 was merged and I updated this PR. I ended up leaving out the new test case from .github/workflows/test-e2e-upgrade.yaml, as I'll need to follow up and make sure there is a test image with the required netkit changes before enabling a test case with netkit+endpoint routes.

@jrife jrife marked this pull request as ready for review October 22, 2024 05:53
@jrife jrife requested review from a team as code owners October 22, 2024 05:53
@borkmann
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vendor changes LGTM

@borkmann
Copy link
Copy Markdown
Member

I ended up leaving out the new test case from .github/workflows/test-e2e-upgrade.yaml, as I'll need to follow up and make sure there is a test image with the required netkit changes before enabling a test case with netkit+endpoint routes.

Sounds good, the repo for CI builds is here https://github.com/cilium/little-vm-helper-images/commits/main/ , and I think cilium/little-vm-helper-images@ef9d087 should already have it included (the bpf tree one).

One more thing to follow-up apart from the e2e test addition is that the BPF program attached to the netkit primary device should also at the end clear prio+mark to 0, so that this doesn't go all the way into the Pod. Thanks @YutaroHayakawa for spotting!

@borkmann borkmann merged commit 1de792c into cilium:main Oct 22, 2024
borkmann added a commit that referenced this pull request Oct 25, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@tklauser tklauser mentioned this pull request Oct 25, 2024
7 tasks
borkmann added a commit that referenced this pull request Oct 25, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann added a commit that referenced this pull request Oct 25, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 labels Oct 25, 2024
borkmann added a commit that referenced this pull request Oct 29, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann added a commit that referenced this pull request Nov 4, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann added a commit that referenced this pull request Nov 4, 2024
Jordan fixed Cilium with netkit and per-endpoint-routes in #35306. Given
we have a more recent bpf image now, lets add it also to CI to regularly
test for regressions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Nov 4, 2024
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available when
applying network policy if per-endpoint routes are enabled. This was
fixed with a patch to the kernel driver [0] and an update to the Cilium
datapath connector [1].

It's still possible to run Cilium with netkit on a kernel that doesn't
support the new scrub attributes, which is confusing [2] and may still
result in strange behavior.

This commit introduces a runtime probe that will actively check for the
presence of scrub attribute support if running with per-endpoint routes.
If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.
- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available when
applying network policy if per-endpoint routes are enabled. This was
fixed with a patch to the kernel driver [0] and an update to the Cilium
datapath connector [1].

It's still possible to run Cilium with netkit on a kernel that doesn't
support the new scrub attributes, which is confusing [2] and may still
result in strange behavior.

This commit introduces a runtime probe that will actively check for the
presence of scrub attribute support if running with per-endpoint routes.
If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.
- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] that adds a new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of the new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support the
new scrub attributes. This is confusing [2] and may still result in
strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] that adds a new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of the new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support the
new scrub attributes. This is confusing [2] and may still result in
strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
ajmmm added a commit to ajmmm/cilium that referenced this pull request Mar 23, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] cilium#35306
[2] cilium#40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2026
Early versions of the netkit driver scrubbed skb metadata before BPF
programs are run, meaning identity metadata is not available in BPF.
This can cause network policy mis-classification if per-endpoint routes
are enabled.

A kernel patch landed in the driver [0] which adds new scrub attributes
to manage this, and a subsequent change was made in the Cilium datapath
connector [1] to make use of these new attributes. However, it's still
possible to run Cilium with netkit on a kernel that doesn't support
them, which is confusing [2] and may still result in strange behavior.

This commit introduces a runtime probe that will actively probe the
underlying host for netkit scrub attributes, if running with per-endpoint
routes. If the probe fails, we error out.

- With bpf.datapathMode={netkit,netkit-l2} this will error and the agent
  will fail to start.

- with bpf.datapathMode=auto, this will log a warning and fall-back to
  veth connector.

This commit also introduces additional test permutations to cover this.

[0] https://lore.kernel.org/bpf/20241004101335.117711-1-daniel@iogearbox.net
[1] #35306
[2] #40021

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. feature/netkit release-note/bug This PR fixes an issue in a previous release of Cilium. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.

Projects

None yet

4 participants