Skip to content

bpf: lxc: also handle non-DSR Nodeport services in per-packet LB path#44507

Merged
julianwiedmann merged 1 commit intomainfrom
pr/jwi/main/lxc-nodeport
Feb 26, 2026
Merged

bpf: lxc: also handle non-DSR Nodeport services in per-packet LB path#44507
julianwiedmann merged 1 commit intomainfrom
pr/jwi/main/lxc-nodeport

Conversation

@julianwiedmann
Copy link
Copy Markdown
Member

#41963 added support for wildcard matching of Nodeport services in bpf_lxc's per-packet LB path. When accessing a DSR Nodeport service on a remote node, the LB now happens at the source. This addressed the scenario where client and backend sit on the same node, and therefore reply traffic would never pass through a native interface hook for RevDNAT.

Extend this logic to also cover non-DSR Nodeport services. They are not affected by the mentioned bug (since replies would first go back to the intermediate LB node) - but doing the LB at the source still provides benefits. It allows for precise network policy, and saves resources on the intermediate node.

In detail the Network policy implications are:

  1. client egress policy needs to allows access to the service's backends. But no longer needs to allow access to remote nodes.
  2. backend ingress policy needs to allow access by client pods. But no longer needs to allow WORLD for serving in-cluster clients.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 24, 2026
@julianwiedmann julianwiedmann added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/loadbalancing Impacts load-balancing and Kubernetes service implementations area/kpr Anything related to our kube-proxy replacement. labels Feb 24, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 24, 2026
@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann julianwiedmann added the dont-merge/preview-only Only for preview or testing, don't merge it. label Feb 24, 2026
@julianwiedmann julianwiedmann marked this pull request as ready for review February 24, 2026 14:08
@julianwiedmann julianwiedmann requested a review from a team as a code owner February 24, 2026 14:08
@julianwiedmann
Copy link
Copy Markdown
Member Author

This will need some trivial changes to the upgrade notes (maybe docs too), but the code changes should be good for review already.

Copy link
Copy Markdown
Member

@gyutaeb gyutaeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me! The integration of the test case is well done. Also, thank you for explaining the Network policy implications; I learned a lot from it. I'm glad to see the structure improving.

@julianwiedmann
Copy link
Copy Markdown
Member Author

Everything looks good to me! The integration of the test case is well done. Also, thank you for explaining the Network policy implications; I learned a lot from it. I'm glad to see the structure improving.

FWIW, I don't see the Network policy aspects as important. Folks would already have the "correct" policy in place to support access via ClusterIP - it's just that now they can remove the slightly awkward tolerations for egressing to REMOTE_NODE and ingressing from WORLD (and WORLD would still be required to support proper N/S access to the Nodeport service).

#41963 added support for wildcard
matching of Nodeport services in bpf_lxc's per-packet LB path. When
accessing a DSR Nodeport service on a remote node, the LB now happens at
the source. This addressed the scenario where client and backend sit on the
same node, and therefore reply traffic would never pass through a native
interface hook for RevDNAT.

Extend this logic to also cover non-DSR Nodeport services. They are not
affected by the mentioned bug (since replies would first go back to the
intermediate LB node) - but doing the LB at the source still provides
benefits. It allows for precise network policy, and saves resources on the
intermediate node.

In detail the Network policy implications are:
1. client egress policy needs to allows access to the service's backends.
   But no longer needs to allow access to remote nodes.
2. backend ingress policy needs to allow access by client pods. But no
   longer needs to allow WORLD for serving in-cluster clients.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann julianwiedmann force-pushed the pr/jwi/main/lxc-nodeport branch from 8bcffd2 to 7fd30ed Compare February 25, 2026 14:44
@julianwiedmann julianwiedmann requested a review from a team as a code owner February 25, 2026 14:44
@julianwiedmann julianwiedmann removed the dont-merge/preview-only Only for preview or testing, don't merge it. label Feb 25, 2026
@julianwiedmann
Copy link
Copy Markdown
Member Author

now also added docs @gyutaeb I believe this roughly also applies to the DSR scenario, so I kept it generic for all NodePort-type services.

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@qmonnet qmonnet added the upgrade-impact This PR has potential upgrade or downgrade impact. label Feb 25, 2026
Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack for doc change, I haven't looked at the datapath change in details though.

Copy link
Copy Markdown
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Cleans things up well.

@gyutaeb
Copy link
Copy Markdown
Member

gyutaeb commented Feb 26, 2026

now also added docs @gyutaeb I believe this roughly also applies to the DSR scenario, so I kept it generic for all NodePort-type services.

Thanks for adding the docs! I agree that making it generic for all NodePort-type services is the right approach. LGTM.

@julianwiedmann julianwiedmann added this pull request to the merge queue Feb 26, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 26, 2026
Merged via the queue into main with commit ad3b432 Feb 26, 2026
817 of 845 checks passed
@julianwiedmann julianwiedmann deleted the pr/jwi/main/lxc-nodeport branch February 26, 2026 07:22
julianwiedmann added a commit that referenced this pull request Mar 13, 2026
This removes the test introduced by
a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part
of #36929.

The test targeted the scenario of an in-cluster client which accesses a
NodePort service on a remote node. But with the changes from
#44507, this type of access no longer
exists, as the traffic will always get DNATed at the source node.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 13, 2026
This removes the test introduced by
a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part
of #36929.

The test targeted the scenario of an in-cluster client which accesses a
NodePort service on a remote node. But with the changes from
#44507, this type of access no longer
exists, as the traffic will always get DNATed at the source node.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
javiercardona-work pushed a commit to javiercardona-work/cilium that referenced this pull request Mar 18, 2026
This removes the test introduced by
a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part
of cilium#36929.

The test targeted the scenario of an in-cluster client which accesses a
NodePort service on a remote node. But with the changes from
cilium#44507, this type of access no longer
exists, as the traffic will always get DNATed at the source node.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
gyutaeb pushed a commit to gyutaeb/cilium that referenced this pull request Mar 25, 2026
[ upstream commit c3238e8 ]

[ backporter's notes: added lb4/6_svc_is_nodeport() check before the
  existing nodeport_uses_dsr4/6() check. On main, nodeport_uses_dsr4/6()
  was removed by PR cilium#44507, but that PR is not backported here. ]

After the wildcard NodePort lookup in the per-packet LB path, verify
that the returned service actually has the SVC_FLAG_NODEPORT flag set.
Without this check, a non-NodePort service entry (e.g. HostPort) that
happens to match the wildcard port could be incorrectly treated as a
NodePort service.

This matches SocketLB's sock4_wildcard_lookup_full() /
sock6_wildcard_lookup_full() behavior, which guards the wildcard result
with lb4_svc_is_nodeport() / lb6_svc_is_nodeport().

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
gyutaeb pushed a commit to gyutaeb/cilium that referenced this pull request Mar 25, 2026
[ upstream commit 3e363a8 ]

[ backporter's notes: applied to tc_lxc_lb{4,6}_dsr_nodeport.c and
  tc_lxc_lb{4,6}_hybrid_dsr_nodeport.c instead of tc_lxc_lb{4,6}_nodeport.c,
  as PR cilium#44507 which renamed these files is not backported. ]

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
gyutaeb pushed a commit to gyutaeb/cilium that referenced this pull request Mar 26, 2026
[ upstream commit 3e363a8 ]

[ backporter's notes: applied to tc_lxc_lb{4,6}_dsr_nodeport.c and
  tc_lxc_lb{4,6}_hybrid_dsr_nodeport.c instead of tc_lxc_lb{4,6}_nodeport.c,
  as PR cilium#44507 which renamed these files is not backported. ]

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
gyutaeb pushed a commit to gyutaeb/cilium that referenced this pull request Mar 26, 2026
[ upstream commit c3238e8 ]

[ backporter's notes: added lb4/6_svc_is_nodeport() check before the
  existing nodeport_uses_dsr4/6() check. On main, nodeport_uses_dsr4/6()
  was removed by PR cilium#44507, but that PR is not backported here. ]

After the wildcard NodePort lookup in the per-packet LB path, verify
that the returned service actually has the SVC_FLAG_NODEPORT flag set.
Without this check, a non-NodePort service entry (e.g. HostPort) that
happens to match the wildcard port could be incorrectly treated as a
NodePort service.

This matches SocketLB's sock4_wildcard_lookup_full() /
sock6_wildcard_lookup_full() behavior, which guards the wildcard result
with lb4_svc_is_nodeport() / lb6_svc_is_nodeport().

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
[ upstream commit 3e363a8 ]

[ backporter's notes: applied to tc_lxc_lb{4,6}_dsr_nodeport.c and
  tc_lxc_lb{4,6}_hybrid_dsr_nodeport.c instead of tc_lxc_lb{4,6}_nodeport.c,
  as PR #44507 which renamed these files is not backported. ]

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
[ upstream commit c3238e8 ]

[ backporter's notes: added lb4/6_svc_is_nodeport() check before the
  existing nodeport_uses_dsr4/6() check. On main, nodeport_uses_dsr4/6()
  was removed by PR #44507, but that PR is not backported here. ]

After the wildcard NodePort lookup in the per-packet LB path, verify
that the returned service actually has the SVC_FLAG_NODEPORT flag set.
Without this check, a non-NodePort service entry (e.g. HostPort) that
happens to match the wildcard port could be incorrectly treated as a
NodePort service.

This matches SocketLB's sock4_wildcard_lookup_full() /
sock6_wildcard_lookup_full() behavior, which guards the wildcard result
with lb4_svc_is_nodeport() / lb6_svc_is_nodeport().

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/kpr Anything related to our kube-proxy replacement. area/loadbalancing Impacts load-balancing and Kubernetes service implementations ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. upgrade-impact This PR has potential upgrade or downgrade impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants