PurgeOrphanNATEntries removed valid NAT entries #44727

@brb

Description

TL;DR: PurgeOrphanNATEntries removed NAT entries that still had a corresponding CT entry.

I applied the following patch to log each orphan NAT entry removal and to trigger the purge on every GC cycle:

```diff
diff --git a/pkg/maps/ctmap/ctmap.go b/pkg/maps/ctmap/ctmap.go
index 3ff4a33a39a..98abe5f20f9 100644
--- a/pkg/maps/ctmap/ctmap.go
+++ b/pkg/maps/ctmap/ctmap.go
@@ -672,11 +672,15 @@ func PurgeOrphanNATEntries(ctMapTCP, ctMapAny *Map) *NatGCStats {
                natMap.Logger.Error("NATmap dump failed during GC", logfields.Error, err)
        } else {
                for _, key := range egressEntriesToDelete {
+                       natMap.Logger.Debug("Deleting orphan egress NAT entry",
+                               "key", key.ToHost())
                        if deleted, _ := natMap.Delete(key); deleted {
                                stats.EgressDeleted++
                        }
                }
                for _, key := range ingressEntriesToDelete {
+                       natMap.Logger.Debug("Deleting orphan ingress NAT entry",
+                               "key", key.ToHost())
                        if deleted, _ := natMap.Delete(key); deleted {
                                stats.IngressDeleted++
                        }
diff --git a/pkg/maps/ctmap/gc/gc.go b/pkg/maps/ctmap/gc/gc.go
index 74488c992e5..501928a6916 100644
--- a/pkg/maps/ctmap/gc/gc.go
+++ b/pkg/maps/ctmap/gc/gc.go
@@ -471,6 +471,8 @@ func (gc *GC) runGC(ipv4, ipv6, triggeredBySignal bool, filter ctmap.GCFilter) (
                }
        }

+       triggeredBySignal = true
+
        if triggeredBySignal {
                // This works under the assumption that [maps] contains consecutive pairs
                // of CT maps, respectively of TCP and ANY type, which is enforced for
```
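The second hunk hardcodes `triggeredBySignal = true` so the orphan-NAT purge runs on every interval GC cycle instead of only on signal-triggered runs. A simplified sketch of that gating (hypothetical, not the real `runGC` signature):

```go
package main

import "fmt"

// runGC is a stripped-down model of the gating in pkg/maps/ctmap/gc/gc.go:
// the orphan-NAT purge normally runs only when GC was triggered by a
// datapath signal. forcePurge models what the debug patch hardcodes.
func runGC(triggeredBySignal, forcePurge bool) (purged bool) {
	if forcePurge {
		triggeredBySignal = true // the patched line
	}
	if triggeredBySignal {
		purged = true // PurgeOrphanNATEntries would run here
	}
	return purged
}

func main() {
	fmt.Println(runGC(false, false)) // interval GC, unpatched: no purge
	fmt.Println(runGC(false, true))  // interval GC, patched: purge every cycle
}
```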

Then I set the following values in Cilium's ConfigMap:

```yaml
bpf-ct-timeout-regular-tcp-fin: 3600s # default is 60s; raised to avoid CT GC removing the entry
conntrack-gc-interval: 60s
```
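The point of these two values is that the CT entry's FIN timeout (3600s) far outlives the GC interval (60s), so the CT entry is guaranteed to survive many GC cycles. A trivial sketch of that arithmetic:

```go
package main

import (
	"fmt"
	"time"
)

// Values from the ConfigMap above. Because the FIN timeout spans many GC
// intervals, a NAT entry that disappears within a few cycles cannot have
// lost its CT counterpart to CT expiry; the orphan purge removed it.
var (
	finTimeout = 3600 * time.Second // bpf-ct-timeout-regular-tcp-fin
	gcInterval = 60 * time.Second   // conntrack-gc-interval
)

func main() {
	fmt.Println(int(finTimeout / gcInterval)) // GC cycles the CT entry survives
}
```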

Then I observed the following entries on the node (local node -> remote pod):

```
root@cluster-worker2:/home/cilium# cilium bpf ct list global | grep 51064
TCP OUT 172.128.0.6:51064 -> 10.196.0.36:4240 expires=105280 Packets=0 Bytes=0 RxFlagsSeen=0x1a LastRxReport=97280 TxFlagsSeen=0x1a LastTxReport=97280 Flags=0x0010 [ SeenNonSyn ] RevNAT=0 SourceSecurityID=0 BackendID=0 NatPort=0
root@cluster-worker2:/home/cilium# cilium bpf nat list | grep 10.196.0
TCP IN 10.196.0.36:4240 -> 172.128.0.6:51064 XLATE_DST 172.128.0.6:51064 Created=35sec ago NeedsCT=1
TCP OUT 172.128.0.6:51064 -> 10.196.0.36:4240 XLATE_SRC 172.128.0.6:51064 Created=35sec ago NeedsCT=1
```

The latter NAT entries were gone after a few seconds. Meanwhile, the agent logs showed:

```
source=cilium/pkg/maps/ctmap/ctmap.go:675 msg="Deleting orphan egress NAT entry" bpfMapPath=cilium_snat_v4_external bpfMapName=cilium_snat_v4_external key="172.128.0.6:51064 --> 10.196.0.36:4240, 6, 0"
source=cilium/pkg/maps/ctmap/ctmap.go:682 msg="Deleting orphan ingress NAT entry" bpfMapPath=cilium_snat_v4_external bpfMapName=cilium_snat_v4_external key="10.196.0.36:4240 --> 172.128.0.6:51064, 6, 1"
```
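To make the bug concrete: the deleted egress NAT entry and the live CT entry describe the exact same 4-tuple, so the NAT entry should not have been classified as orphaned. A simplified model of that correspondence (string tuples standing in for the real binary BPF map keys, and a plain set standing in for the CT map lookup, which in Cilium also involves direction flags and protocol):

```go
package main

import "fmt"

// tuple is a simplified flow 4-tuple; the real ctmap/natmap keys are
// binary BPF map keys with protocol and flag fields.
type tuple struct {
	srcIP, dstIP     string
	srcPort, dstPort int
}

// hasCTEntry reports whether the CT "map" contains the tuple that an
// egress NAT entry corresponds to. An entry with a match here is, by
// definition, not an orphan and must be kept.
func hasCTEntry(ct map[tuple]bool, natEgress tuple) bool {
	return ct[natEgress]
}

func main() {
	// The flow from the report above: CT "TCP OUT 172.128.0.6:51064 ->
	// 10.196.0.36:4240" and the egress NAT entry share this tuple.
	flow := tuple{"172.128.0.6", "10.196.0.36", 51064, 4240}
	ct := map[tuple]bool{flow: true}
	fmt.Println(hasCTEntry(ct, flow)) // true: the purge should have kept it
}
```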

I haven't checked whether the issue exists in all stable versions. Also, I haven't checked whether flows other than local node -> remote pod are affected.

Labels

- area/datapath (Impacts bpf/ or low-level forwarding details, including map management and monitor messages.)
- feature/conntrack
- kind/bug (This is a bug in the Cilium logic.)
