Skip to content

lb: Hybrid-DSR using the annotation-based dispatch mode#42788

Merged
julianwiedmann merged 2 commits intocilium:mainfrom
saiaunghlyanhtet:pr/sai/hybrid-dsr-annotation-based
Dec 1, 2025
Merged

lb: Hybrid-DSR using the annotation-based dispatch mode#42788
julianwiedmann merged 2 commits intocilium:mainfrom
saiaunghlyanhtet:pr/sai/hybrid-dsr-annotation-based

Conversation

@saiaunghlyanhtet
Copy link
Copy Markdown
Member

(see the commit desc)

Fixes: #41690

Hybrid-DSR using the annotation-based dispatch mode

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 15, 2025
@saiaunghlyanhtet saiaunghlyanhtet marked this pull request as ready for review November 15, 2025 07:03
@saiaunghlyanhtet saiaunghlyanhtet requested review from a team as code owners November 15, 2025 07:03
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Thank you! I added some minor comments below.

For testing, I think we want both controlplane tests (see pkg/loadbalancer/tests) to demonstrate that we're producing the expected frontend entries. And BPF-level tests (see bpf/tests) which call nodeport_uses_dsr4() / nodeport_uses_dsr6() with a fake SVC entry.

@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from 192a56b to 96584c8 Compare November 17, 2025 12:03
@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from 96584c8 to e017301 Compare November 19, 2025 01:25
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, thank you! One consideration would be whether we want a full-blown controlplane test that validates the expected BPF map entries (examples in pkg/loadbalancer/tests/testdata/) when hybrid mode is enabled.

We'll also need a quick rebase to latest main, sorry.

@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

saiaunghlyanhtet commented Nov 26, 2025

One consideration would be whether we want a full-blown controlplane test that validates the expected BPF map entries (examples in pkg/loadbalancer/tests/testdata/) when hybrid mode is enabled.

As I am not that familiar with it, would the following be ok if we add a full-blown test? @julianwiedmann

#! --lb-test-fault-probability=0.0 --bpf-lb-mode=hybrid

# Start the test application
hive start

# Add TCP and UDP services
k8s/add service-tcp.yaml endpointslice-tcp.yaml
k8s/add service-udp.yaml endpointslice-udp.yaml
db/cmp frontends frontends.table
db/cmp services services.table

# Check maps - TCP should have DSR flag, UDP should not
lb/maps-dump maps.actual
* cmp maps.actual maps.expected

#####

-- services.table --
Name          Source   TrafficPolicy   Flags
test/echo-tcp k8s      Cluster
test/echo-udp k8s      Cluster

-- frontends.table --
Address               Type         ServiceName    PortName   Status  Backends
0.0.0.0:30781/TCP     NodePort     test/echo-tcp  http       Done    10.244.1.1:80/TCP
0.0.0.0:30782/UDP     NodePort     test/echo-udp  dns        Done    10.244.1.2:53/UDP
10.96.50.104:80/TCP   ClusterIP    test/echo-tcp  http       Done    10.244.1.1:80/TCP
10.96.50.105:53/UDP   ClusterIP    test/echo-udp  dns        Done    10.244.1.2:53/UDP

-- service-tcp.yaml --
apiVersion: v1
kind: Service
metadata:
  name: echo-tcp
  namespace: test
spec:
  clusterIP: 10.96.50.104
  clusterIPs:
  - 10.96.50.104
  ports:
  - name: http
    nodePort: 30781
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    name: echo-tcp
  sessionAffinity: None
  type: NodePort

-- endpointslice-tcp.yaml --
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: echo-tcp
  name: echo-tcp-kvlm2
  namespace: test
addressType: IPv4
endpoints:
- addresses:
  - 10.244.1.1
  nodeName: nodeport-worker
ports:
- name: http
  port: 80
  protocol: TCP

-- service-udp.yaml --
apiVersion: v1
kind: Service
metadata:
  name: echo-udp
  namespace: test
spec:
  clusterIP: 10.96.50.105
  clusterIPs:
  - 10.96.50.105
  ports:
  - name: dns
    nodePort: 30782
    port: 53
    protocol: UDP
    targetPort: 53
  selector:
    name: echo-udp
  sessionAffinity: None
  type: NodePort

-- endpointslice-udp.yaml --
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  labels:
    kubernetes.io/service-name: echo-udp
  name: echo-udp-abc123
  namespace: test
addressType: IPv4
endpoints:
- addresses:
  - 10.244.1.2
  nodeName: nodeport-worker
ports:
- name: dns
  port: 53
  protocol: UDP

-- maps.expected --
BE: ID=1 ADDR=10.244.1.1:80/TCP STATE=active
BE: ID=2 ADDR=10.244.1.2:53/UDP STATE=active
REV: ID=1 ADDR=0.0.0.0:30781
REV: ID=2 ADDR=10.96.50.104:80
REV: ID=3 ADDR=0.0.0.0:30782
REV: ID=4 ADDR=10.96.50.105:53
SVC: ID=0 ADDR=10.96.50.104:0/ANY SLOT=0 LBALG=undef AFFTimeout=0 COUNT=0 QCOUNT=0 FLAGS=ClusterIP+non-routable
SVC: ID=0 ADDR=10.96.50.105:0/ANY SLOT=0 LBALG=undef AFFTimeout=0 COUNT=0 QCOUNT=0 FLAGS=ClusterIP+non-routable
SVC: ID=1 ADDR=0.0.0.0:30781/TCP SLOT=0 LBALG=undef AFFTimeout=0 COUNT=1 QCOUNT=0 FLAGS=NodePort+non-routable+dsr
SVC: ID=1 ADDR=0.0.0.0:30781/TCP SLOT=1 BEID=1 COUNT=0 QCOUNT=0 FLAGS=NodePort+non-routable+dsr
SVC: ID=2 ADDR=10.96.50.104:80/TCP SLOT=0 LBALG=undef AFFTimeout=0 COUNT=1 QCOUNT=0 FLAGS=ClusterIP+non-routable+dsr
SVC: ID=2 ADDR=10.96.50.104:80/TCP SLOT=1 BEID=1 COUNT=0 QCOUNT=0 FLAGS=ClusterIP+non-routable+dsr
SVC: ID=3 ADDR=0.0.0.0:30782/UDP SLOT=0 LBALG=undef AFFTimeout=0 COUNT=1 QCOUNT=0 FLAGS=NodePort+non-routable
SVC: ID=3 ADDR=0.0.0.0:30782/UDP SLOT=1 BEID=2 COUNT=0 QCOUNT=0 FLAGS=NodePort+non-routable
SVC: ID=4 ADDR=10.96.50.105:53/UDP SLOT=0 LBALG=undef AFFTimeout=0 COUNT=1 QCOUNT=0 FLAGS=ClusterIP+non-routable
SVC: ID=4 ADDR=10.96.50.105:53/UDP SLOT=1 BEID=2 COUNT=0 QCOUNT=0 FLAGS=ClusterIP+non-routable

Re-implement Hybrid-DSR mode to use the same flag-based mechanism
as annotation mode, eliminating duplicate datapath logic.

Changes:
- Update ToSVCForwardingMode() to accept optional protocol parameter
- For LBModeHybrid, return DSR mode for TCP (proto 6), SNAT otherwise
- Pass protocol to ToSVCForwardingMode() in BPF reconciler
- Add tests to demonstrate that we're producing the expected frontend entries.

This allows the control plane to set SVC_FLAG_FWD_MODE_DSR on a
per-frontend basis for TCP services in Hybrid mode, while the
datapath uses the same flag-checking logic for all selective DSR
modes.

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
@julianwiedmann julianwiedmann added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. area/loadbalancing Impacts load-balancing and Kubernetes service implementations labels Nov 26, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 26, 2025
@julianwiedmann julianwiedmann added feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. area/kpr Anything related to our kube-proxy replacement. labels Nov 26, 2025
@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from e017301 to 3ecd7b6 Compare November 26, 2025 13:22
@julianwiedmann
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thank you!

@julianwiedmann
Copy link
Copy Markdown
Member

note that when grep'ing for ENABLE_DSR_HYBRID there's a bunch more hits in the datapath, build & complexity configs.

The build & complexity tests we can clean up separately to avoid tons of little conflicts, but the remaining datapath parts need fixing.

@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from 3ecd7b6 to f60f836 Compare November 27, 2025 11:41
@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

@julianwiedmann

the remaining datapath parts need fixing.

I have fixed the remaining parts in bpf_xdp.c and nodeport_egress.h. To make sure I do not miss anything, can you have a look at it again?

@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

/test

@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

Is failing test needed to retrigger as it only fails on arm? @julianwiedmann

@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from f60f836 to ce6d71d Compare November 27, 2025 13:29
Remove the protocol-based DSR decision logic from the datapath.
With Hybrid-DSR now using the annotation-based infrastructure,
the datapath only needs to check the SVC_FLAG_FWD_MODE_DSR flag
instead of examining both the protocol type and mode.

Changes:
- Remove ENABLE_DSR_HYBRID conditional from nodeport_uses_dsr()
- Replace ENABLE_DSR_HYBRID with ENABLE_DSR_BYUSER in conditional
  compilation directives for SNAT fallback paths
- GENEVE decapsulation for DSR now only happens when DSR is
  in by-user (flag-based) mode
- SNAT tail call now happens when DSR is disabled OR
  when DSR is not in by-user mode (i.e., in hybrid mode)
- Update SVC_FLAG_FWD_MODE_DSR comment to reflect general usage
- Set ENABLE_DSR_BYUSER for both Hybrid and annotation modes
- Add hybrid_dsr_test

This consolidates selective DSR modes (Hybrid and annotation-based)
to use a single code path in the datapath.

Signed-off-by: saiaunghlyanhtet <saiaunghlyanhtet2003@gmail.com>
@saiaunghlyanhtet saiaunghlyanhtet force-pushed the pr/sai/hybrid-dsr-annotation-based branch from ce6d71d to ce30277 Compare November 28, 2025 00:29
@julianwiedmann
Copy link
Copy Markdown
Member

/test

@julianwiedmann julianwiedmann added this pull request to the merge queue Dec 1, 2025
Merged via the queue into cilium:main with commit d83509a Dec 1, 2025
74 of 75 checks passed
@julianwiedmann
Copy link
Copy Markdown
Member

@saiaunghlyanhtet thank you for the contribution! When you have a minute, could please switch over all the remaining ENABLE_DSR_HYBRID bits in the tests to say ENABLE_DSR_BYUSER instead?

@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

@saiaunghlyanhtet thank you for the contribution! When you have a minute, could please switch over all the remaining ENABLE_DSR_HYBRID bits in the tests to say ENABLE_DSR_BYUSER instead?

Yes, I will do it.

@saiaunghlyanhtet
Copy link
Copy Markdown
Member Author

I raised the pr @julianwiedmann .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/kpr Anything related to our kube-proxy replacement. area/loadbalancing Impacts load-balancing and Kubernetes service implementations feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. release-note/misc This PR makes changes that have no direct user impact.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

loadbalancer: implement Hybrid-DSR using the annotation-based dispatch mode

3 participants