
CFP: Allow opting CiliumPodIPPools out from SNAT / masquerading #40131

@alimehrabikoshki

Cilium Feature Proposal

Is your proposed feature related to a problem?

Hello friends,

We're running the 1.18.0-pre.3 release, and trying to combine tunnel mode with multi-pool IPAM.

We use the BGP control plane to advertise routes to pods from a certain pool, which is globally routable. The reason for having this pool is to give certain workloads predictable, controlled IPs.

On the other hand, pods from the default pool are not routable and should still be SNATted to the host IP and tunneled like normal.

Pods from both pools may reach the same destination CIDRs, and ipv4-native-routing-cidr only matches on destination IPs, so we are left choosing between:

  1. default pool pods not working because they aren't masqueraded
  2. routable pool pods being masqueraded, which loses the functionality we wanted from having multiple pools in the first place

We can reach the pods on their BGP-advertised IPs, but their egress traffic leaves with the host IP as its source instead.

Describe the feature you'd like

This can be solved in one of two ways:

  1. an additional flag, analogous to ipv4-native-routing-cidr, that defines the source IPs that are not SNATted
  2. an annotation or label on CiliumPodIPPool resources that opts all CIDRs associated with that pool out of SNAT

This way we can control which pods have SNAT/masquerade enabled based on the pod IP rather than the destination IP.

The second solution would be more complex, but would allow more flexibility and ease of use.
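To make the intended behaviour concrete, here is a minimal userspace sketch in Go of the decision both options boil down to. It is not Cilium code: `snatConfig`, `shouldMasquerade`, `exampleConfig`, and all addresses are hypothetical and chosen only for illustration (10.0.0.0/16 stands in for the default pool, 198.51.100.0/24 for the routable pool).

```go
package main

import (
	"fmt"
	"net/netip"
)

// snatConfig is a hypothetical stand-in for the relevant agent settings.
type snatConfig struct {
	nativeRoutingCIDR netip.Prefix // existing destination-based exclusion
	srcExclusionCIDR  netip.Prefix // proposed source-based exclusion
}

// shouldMasquerade reports whether traffic from src to dst would be SNATted.
func (c snatConfig) shouldMasquerade(src, dst netip.Addr) bool {
	if c.nativeRoutingCIDR.Contains(dst) {
		return false // destination is natively routable: current behaviour
	}
	if c.srcExclusionCIDR.Contains(src) {
		return false // source belongs to the routable pool: proposed behaviour
	}
	return true
}

// exampleConfig returns made-up values used only in this sketch.
func exampleConfig() snatConfig {
	return snatConfig{
		nativeRoutingCIDR: netip.MustParsePrefix("10.0.0.0/8"),
		srcExclusionCIDR:  netip.MustParsePrefix("198.51.100.0/24"),
	}
}

// ip is a short alias that keeps the examples readable.
func ip(s string) netip.Addr { return netip.MustParseAddr(s) }

func main() {
	cfg := exampleConfig()
	external := ip("203.0.113.10")
	fmt.Printf("default pool pod masqueraded:  %v\n", cfg.shouldMasquerade(ip("10.0.1.5"), external))
	fmt.Printf("routable pool pod masqueraded: %v\n", cfg.shouldMasquerade(ip("198.51.100.7"), external))
}
```

With only the destination check, both pods would get the same answer for the same external destination, which is the problem described above.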

(Optional) Describe your proposed solution

Option 1:
Create a new variable IPV4_SNAT_EXCLUSION_SRC_CIDR, similar to IPV4_SNAT_EXCLUSION_DST_CIDR, which is set via the Cilium config.

Add similar handling in nat.h based on SRC_CIDR: if the source address is in this CIDR, NAT_PUNT_TO_STACK.

This could also be implemented slightly differently from the DST_CIDR handling, by using an LPM trie map to support multiple prefixes.
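As a rough illustration of the lookup semantics an LPM trie map would give, here is a small Go model that picks the longest configured prefix containing a source address. It uses a linear scan over a slice rather than an actual trie or BPF map, and `longestMatch`, `examplePrefixes`, and the prefixes themselves are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// longestMatch returns the most specific prefix containing src, modelling
// the result of an LPM trie lookup. found is false when nothing matches.
func longestMatch(prefixes []netip.Prefix, src netip.Addr) (best netip.Prefix, found bool) {
	for _, p := range prefixes {
		if p.Contains(src) && (!found || p.Bits() > best.Bits()) {
			best, found = p, true
		}
	}
	return best, found
}

// examplePrefixes returns made-up SNAT exclusion prefixes for this sketch.
func examplePrefixes() []netip.Prefix {
	return []netip.Prefix{
		netip.MustParsePrefix("198.51.100.0/24"),
		netip.MustParsePrefix("198.51.100.128/25"),
		netip.MustParsePrefix("2001:db8:1::/48"),
	}
}

// ip is a short alias that keeps the examples readable.
func ip(s string) netip.Addr { return netip.MustParseAddr(s) }

func main() {
	prefixes := examplePrefixes()
	for _, s := range []string{"198.51.100.7", "198.51.100.200", "10.0.1.5"} {
		if p, ok := longestMatch(prefixes, ip(s)); ok {
			fmt.Printf("%s matches %s: skip SNAT\n", s, p)
		} else {
			fmt.Printf("%s has no match: masquerade as usual\n", s)
		}
	}
}
```

In this model any hit means the packet is exempt from SNAT; a real map could also carry a value per prefix, but that goes beyond what is proposed here.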

Option 2:

Watch CiliumPodIPPools and populate a trie map with CIDRs from pools that have a certain annotation (like snat.cilium.io/exclude). Handle changes to pools, or to annotations on pools, and update the map accordingly.

BPF code would differ slightly from option 1 as it would be based on trie map lookups.
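The agent-side bookkeeping for option 2 could look roughly like the Go sketch below: compute the set of CIDRs from annotated pools, then diff it against what is currently in the map. The `pool` struct is a simplified stand-in for a CiliumPodIPPool object rather than the real type, and the annotation value `"true"`, `newPool`, `desiredExclusions`, and `diff` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// excludeAnnotation is the example key from the proposal; the "true" value is assumed.
const excludeAnnotation = "snat.cilium.io/exclude"

// pool is a simplified, hypothetical stand-in for a CiliumPodIPPool.
type pool struct {
	name        string
	annotations map[string]string
	cidrs       []netip.Prefix
}

// newPool builds a pool from string CIDRs; it panics on invalid input.
func newPool(name string, exclude bool, cidrs ...string) pool {
	p := pool{name: name, annotations: map[string]string{}}
	if exclude {
		p.annotations[excludeAnnotation] = "true"
	}
	for _, c := range cidrs {
		p.cidrs = append(p.cidrs, netip.MustParsePrefix(c))
	}
	return p
}

// desiredExclusions collects the CIDRs of every pool carrying the annotation.
func desiredExclusions(pools []pool) map[netip.Prefix]struct{} {
	out := make(map[netip.Prefix]struct{})
	for _, p := range pools {
		if p.annotations[excludeAnnotation] != "true" {
			continue
		}
		for _, c := range p.cidrs {
			out[c.Masked()] = struct{}{}
		}
	}
	return out
}

// sorted orders prefixes by their string form for deterministic output.
func sorted(ps []netip.Prefix) []netip.Prefix {
	sort.Slice(ps, func(i, j int) bool { return ps[i].String() < ps[j].String() })
	return ps
}

// diff returns the prefixes to add to and delete from the map so that
// current converges on desired.
func diff(current, desired map[netip.Prefix]struct{}) (add, del []netip.Prefix) {
	for p := range desired {
		if _, ok := current[p]; !ok {
			add = append(add, p)
		}
	}
	for p := range current {
		if _, ok := desired[p]; !ok {
			del = append(del, p)
		}
	}
	return sorted(add), sorted(del)
}

func main() {
	before := desiredExclusions([]pool{
		newPool("default", false, "10.0.0.0/16"),
		newPool("routable", true, "198.51.100.0/24"),
	})
	// Simulate the annotation being removed from the routable pool.
	after := desiredExclusions([]pool{
		newPool("default", false, "10.0.0.0/16"),
		newPool("routable", false, "198.51.100.0/24"),
	})
	add, del := diff(before, after)
	fmt.Println("add:", add, "delete:", del)
}
```

Running the same diff on every pool add, update, or delete event would cover both pool changes and annotation changes with one code path.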

I'm somewhat handwaving the IPv6 and iptables implementations for now...

Looking forward to hearing your feedback, thanks.

Labels: kind/cfp (Cilium Feature Proposal), kind/feature (This introduces new functionality.), stale
