CFP: Allow opting CiliumPodIPPools out from SNAT / masquerading #40131
Description
Cilium Feature Proposal
Is your proposed feature related to a problem?
Hello friends,
We're running the 1.18.0-pre.3 release and trying to combine tunnel mode with multi-pool IPAM.
We use the BGP Control Plane to advertise routes to pods from a specific pool. This pool is globally routable; we use it to give certain workloads predictable, controlled IPs.
Pods from the default pool, on the other hand, are not routable and should still be SNATted to the host IP and tunneled as usual.
Pods from both pools may reach the same destination CIDRs, and ipv4-native-routing-cidr only matches on destination IPs, so we are left choosing between:
- default-pool pods not working, because they aren't masqueraded
- routable-pool pods being masqueraded, which loses the very functionality we wanted from multiple pools in the first place
We can reach the pods on their BGP-advertised IPs, but their egress traffic uses the host IP instead.
Describe the feature you'd like
This could be solved in one of two ways:
- an additional flag, similar to ipv4-native-routing-cidr, that defines the source CIDRs that are not SNATted
- an annotation or label on CiliumPodIPPool resources that opts all CIDRs associated with that pool out of SNAT
Either way, we could control which pods are SNATted/masqueraded based on the pod (source) IP rather than the destination IP.
The second solution would be more complex to implement but offers more flexibility and is easier to use.
(Optional) Describe your proposed solution
Option 1:
Create a new variable IPV4_SNAT_EXCLUSION_SRC_CIDR, similar to IPV4_SNAT_EXCLUSION_DST_CIDR, which is set via the Cilium config.
Add similar handling in nat.h based on SRC_CIDR: if the source address is in this CIDR, return NAT_PUNT_TO_STACK.
This could also be implemented slightly differently from DST_CIDR, using an LPM trie map to support multiple prefixes.
Option 2:
Watch CiliumPodIPPools and populate a trie map with the CIDRs of pools that carry a certain annotation (e.g. snat.cilium.io/exclude). Handle changes to pools, or to annotations on pools, and update the map accordingly.
The BPF code would differ slightly from option 1, as it would be based on trie map lookups.
I'm handwaving the IPv6 and iptables implementations for now...
Looking forward to hearing your feedback, thanks.