bpf: egw: fix over-eager matching of EGW policy against reply traffic#36929
Merged
julianwiedmann merged 2 commits intocilium:mainfrom Jan 14, 2025
Merged
bpf: egw: fix over-eager matching of EGW policy against reply traffic#36929julianwiedmann merged 2 commits intocilium:mainfrom
julianwiedmann merged 2 commits intocilium:mainfrom
Conversation
8bffa19 to
9a1a2e2
Compare
Member
Author
|
/test |
Member
Author
|
(I'll also want to cook a BPF-level test for this problem) |
9a1a2e2 to
d243f65
Compare
The code to handle EGW reply traffic lives in the nodeport ingress path, after a packet has been RevSNATed. In contrast to the outbound path, it doesn't consider whether the packet's source is an in-cluster entity. This becomes problematic when the EGW policy specifies a broad-range `destinationCIDR`, such as 0.0.0.0/0, and accidentally matches in-cluster service traffic. More precisely: * a client pod matches an EGW policy of the form "from $client_pod to 0.0.0.0/0, exit via gateway nodeB" * client_pod accesses a nodeport service on gateway nodeB, where the request is DNATed to a remote backend, and SNATed to nodeB's IP. * once the backend's reply reaches nodeB, it is revSNATed to client_pod. At this point the packet matches the EGW policy entry, and is thus redirected into the overlay network and forwarded to client_pod. * As the RevDNAT step has been skipped, client_pod rejects this packet from an unknown source. Fix this by applying the RevDNAT logic *prior* to looking for EGW policy matches. Double-checking whether the packet's origin is an in-cluster entity might still make sense, but no longer strictly necessary. From a tcpdump on a kind cluster (with EGW, KPR without SocketLB and native routing): ``` # client accesses nodeport 31936 on remote LB node IP 10.244.1.236.35556 > 172.18.0.2.31936: Flags [S] # request is DNATed + SNATed, and forwarded to remote backend IP 172.18.0.2.35556 > 10.244.3.121.80: Flags [S] # backend's reply reaches the LB node IP 10.244.3.121.80 > 172.18.0.2.35556: Flags [S.] # LB node applies revSNAT, adds VXLAN encapsulation and forwards to client IP 172.18.0.2.54624 > 172.18.0.4.8472: OTV, flags [I] (0x08), overlay 0, instance 6 IP 10.244.3.121.80 > 10.244.1.236.35556: Flags [S.] # client's RST to the backend IP 10.244.1.236.35556 > 10.244.3.121.80: Flags [R] ``` Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Reply traffic for an SNATed in-cluster nodeport connection should not be captured by a catch-all EGW policy. Otherwise the packet reaches the client without the source being RevDNATed back to the nodeport, and thus resulting in a RST by the client. This adds testing for the scenario that was fixed in the preceding commit. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
d243f65 to
97d8d85
Compare
Member
Author
|
/test |
45 tasks
19 tasks
julianwiedmann
added a commit
that referenced
this pull request
Mar 13, 2026
This removes the test introduced by a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part of #36929. The test targeted the scenario of an in-cluster client which accesses a NodePort service on a remote node. But with the changes from #44507, this type of access no longer exists, as the traffic will always get DNATed at the source node. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Mar 13, 2026
This removes the test introduced by a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part of #36929. The test targeted the scenario of an in-cluster client which accesses a NodePort service on a remote node. But with the changes from #44507, this type of access no longer exists, as the traffic will always get DNATed at the source node. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
javiercardona-work
pushed a commit
to javiercardona-work/cilium
that referenced
this pull request
Mar 18, 2026
This removes the test introduced by a899c4a ("bpf: test: cover NAT LB with catch-all EGW policy"), as part of cilium#36929. The test targeted the scenario of an in-cluster client which accesses a NodePort service on a remote node. But with the changes from cilium#44507, this type of access no longer exists, as the traffic will always get DNATed at the source node. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The code to handle EGW reply traffic lives in the nodeport ingress path, after a packet has been RevSNATed. In contrast to the outbound path, it doesn't consider whether the packet's source is an in-cluster entity.
This becomes problematic when the EGW policy specifies a broad-range
destinationCIDR, such as 0.0.0.0/0, and accidentally matches in-cluster service traffic. More precisely:Fix this by applying the RevDNAT logic prior to looking for EGW policy matches. Double-checking whether the packet's origin is an in-cluster entity might still make sense, but no longer strictly necessary.
From a tcpdump on a kind cluster (with EGW, KPR without SocketLB and native routing):