Skip to content

bpf: clear mark content before storing the cluster ID#43159

Merged
giorio94 merged 1 commit intocilium:mainfrom
giorio94:mio/bpf-cluster-id-mark
Dec 8, 2025
Merged

bpf: clear mark content before storing the cluster ID#43159
giorio94 merged 1 commit intocilium:mainfrom
giorio94:mio/bpf-cluster-id-mark

Conversation

@giorio94
Copy link
Copy Markdown
Member

@giorio94 giorio94 commented Dec 5, 2025

Currently, the ctx_set_cluster_id_mark helper does not clear the mark before storing the cluster ID. However, the resulting value is not correct in case the same portions of the mark did already contain some value. For instance, this can happen if set_identity_mark got called before, which is now the case since 2660242 ("bpf: lxc: always set identity mark on forwarded egressing traffic").

Let's get this fixed by explicitly masking the mark before storing the cluster ID. Rather than wiping out the entire content, we preserve the "magic" part, which is not expected to interfere.

Set the backport/affect labels coherently with #42551.

@giorio94 giorio94 requested a review from brb December 5, 2025 13:33
@giorio94 giorio94 added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. affects/v1.16 This issue affects v1.16 branch affects/v1.17 This issue affects v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Dec 5, 2025
@giorio94 giorio94 force-pushed the mio/bpf-cluster-id-mark branch from 647bd35 to a0502cb Compare December 5, 2025 13:34
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Dec 5, 2025

/test

@brb
Copy link
Copy Markdown
Member

brb commented Dec 5, 2025

MBOI @julianwiedmann

@brb
Copy link
Copy Markdown
Member

brb commented Dec 5, 2025

Changes LGTM. Could you apply the following:

diff --git a/bpf/tests/inter_cluster_snat_clusterip_client_lxc.c b/bpf/tests/inter_cluster_snat_clusterip_client_lxc.c
index 378cbb4dc0..846071d1b5 100644
--- a/bpf/tests/inter_cluster_snat_clusterip_client_lxc.c
+++ b/bpf/tests/inter_cluster_snat_clusterip_client_lxc.c
@@ -60,6 +60,8 @@
 /* Set the LXC source address to be the address of pod one */
 ASSIGN_CONFIG(union v4addr, endpoint_ipv4, { .be32 = CLIENT_IP})

+ASSIGN_CONFIG(__u32, security_label, 0x10042)
+
 #include "lib/ipcache.h"
 #include "lib/lb.h"
 #include "lib/policy.h"
@@ -158,6 +160,7 @@ int lxc_to_overlay_syn_check(struct __ctx_buff *ctx)
        struct iphdr *l3;
        struct ipv4_ct_tuple tuple;
        struct ct_entry *entry;
+       __u32 cluster_id;

        test_init();

@@ -232,6 +235,11 @@ int lxc_to_overlay_syn_check(struct __ctx_buff *ctx)
        if (!entry)
                test_fatal("couldn't find egress conntrack entry");

+       cluster_id = ctx_get_cluster_id_mark(ctx);
+       if (cluster_id != BACKEND_CLUSTER_ID)
+               test_fatal("ctx->mark cluster_id should be %u, got %u",
+                          BACKEND_CLUSTER_ID, cluster_id);
+
        test_finish();

Currently, it fails when running on the main branch with make run_bpf_tests BPF_TEST="inter_cluster_snat_clusterip_client_lxc":


--- FAIL: TestBPF/inter_cluster_snat_clusterip_client_lxc.o/01_lxc_to_overlay_syn (0.00s)

    bpf_test.go:558: inter_cluster_snat_clusterip_client_lxc.c:241: ctx->mark cluster_id should be 2, got 3

Currently, the ctx_set_cluster_id_mark helper does not clear the mark
before storing the cluster ID. However, the resulting value is not
correct in case the same portions of the mark did already contain
some value. For instance, this can happen if set_identity_mark got
called before, which is now the case since 2660242 ("bpf: lxc:
always set identity mark on forwarded egressing traffic").

Let's get this fixed by explicitly masking the mark before storing
the cluster ID. Rather than wiping out the entire content, we preserve
the "magic" part, which is not expected to interfere.

Additionally, let's extend the related BPF test to assert that we
correctly propagate the destination clusterID to bpf overlay (thanks
Martynas).

Co-authored-by: Martynas Pumputis <martynas@isovalent.com>
Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/bpf-cluster-id-mark branch from a0502cb to 8c9738c Compare December 5, 2025 15:50
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Dec 5, 2025

Could you apply the following:

Added, thanks a lot!

@giorio94 giorio94 marked this pull request as ready for review December 5, 2025 15:50
@giorio94 giorio94 requested a review from a team as a code owner December 5, 2025 15:50
@giorio94
Copy link
Copy Markdown
Member Author

giorio94 commented Dec 5, 2025

/test

@giorio94 giorio94 enabled auto-merge December 5, 2025 18:00
Copy link
Copy Markdown
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@giorio94 giorio94 added this pull request to the merge queue Dec 8, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 8, 2025
Merged via the queue into cilium:main with commit 5313b0d Dec 8, 2025
77 checks passed
@giorio94 giorio94 deleted the mio/bpf-cluster-id-mark branch December 8, 2025 08:42
@julianwiedmann
Copy link
Copy Markdown
Member

Rather than wiping out the entire content, we preserve the "magic" part, which is not expected to interfere.

@giorio94
I think that's incorrect - you should be absolutely making sure that the magic part is set to exactly the desired value. And that the rest of the skb->mark is consistent with the semantics of that magic value.

Looking at ctx_get_cluster_id_mark(), this actually feels like the root cause of the problem - this should be using MARK_MAGIC_HOST_MASK as mask, to avoid aliasing with bit-wise overlapping magic values. Right now it's extracting a cluster_id from MARK_MAGIC_IDENTITY marks (and many other magic values), when it should bail out instead.

@giorio94
Copy link
Copy Markdown
Member Author

Rather than wiping out the entire content, we preserve the "magic" part, which is not expected to interfere.

@giorio94 I think that's incorrect - you should be absolutely making sure that the magic part is set to exactly the desired value. And that the rest of the skb->mark is consistent with the semantics of that magic value.

Looking at ctx_get_cluster_id_mark(), this actually feels like the root cause of the problem - this should be using MARK_MAGIC_HOST_MASK as mask, to avoid aliasing with bit-wise overlapping magic values. Right now it's extracting a cluster_id from MARK_MAGIC_IDENTITY marks (and many other magic values), when it should bail out instead.

Thanks a lot for the feedback. I've took a stab at fixing this in #43258. Let me know if that makes sense to you (I don't have much context in this area).

@julianwiedmann julianwiedmann added the area/clustermesh Relates to multi-cluster routing functionality in Cilium. label Dec 11, 2025
@kaworu kaworu mentioned this pull request Dec 12, 2025
7 tasks
@kaworu kaworu added the backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. label Dec 12, 2025
@kaworu kaworu removed the needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch label Dec 12, 2025
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. labels Dec 17, 2025
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects/v1.16 This issue affects v1.16 branch affects/v1.17 This issue affects v1.17 branch area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

5 participants