
Inconsistent behavior related to VRRP packet drops #18347

@ooraini

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I'm running keepalived on two of my hosts, and both are reporting a timeout, with the result that each node claims the VIP. The VIP is 192.168.33.200. Running ip ad | grep 200 on the two nodes shows it assigned on both:

node3 | CHANGED | rc=0 >>
    inet 192.168.33.200/32 scope global eth1
node2 | CHANGED | rc=0 >>
    inet 192.168.33.200/32 scope global eth1
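
For context, the keepalived setup on both nodes is along the lines of the following sketch. The interface and VIP come from the output above; the router ID, priority, and advertisement interval are illustrative assumptions, not values taken from this report:

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51   # assumed for illustration
    priority 100           # the peer would use a different priority
    advert_int 1           # one advertisement per second
    virtual_ipaddress {
        192.168.33.200/32
    }
}

With advertisements flowing, only one node should hold the VIP; both holding it at once means each keepalived instance has stopped seeing its peer's VRRP packets.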

Looking at hubble observe -t drop, I see the following:

Dec 30 21:01:55.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:55.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.415: fe:54:00:7f:be:9e <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.472: fe:54:00:34:d6:c7 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.702: fe:54:00:e3:4a:21 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.829: fe:54:00:4b:40:45 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.861: fe:54:00:1e:82:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:57.140: fe:54:00:2c:b7:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:57.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:57.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:58.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:58.456: fe:54:00:34:d6:c7 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:58.686: fe:54:00:e3:4a:21 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:58.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:59.124: fe:54:00:2c:b7:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
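
The IPv4 drops recur roughly once per second between the two node IPs, which is consistent with VRRP advertisements: VRRP is IP protocol 112, which the connection tracker reports as an unknown L4 protocol. The L2 drops are frames addressed to 01:80:c2:00:00:00, the IEEE 802.1D bridge group address used by STP. As a verification step (not part of the original capture; interface name assumed), the traffic can be confirmed on the wire with:

tcpdump -ni eth1 'ip proto 112 or ether dst 01:80:c2:00:00:00'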

JSON output:

{"time":"2021-12-30T21:06:17.103685380Z","verdict":"DROPPED","drop_reason":166,"ethernet":{"source":"fe:54:00:2c:b7:f6","destination":"01:80:c2:00:00:00"},"source":{"identity":1,"labels":["reserved:host"]},"destination":{"identity":2,"labels":["reserved:world"]},"Type":"L3_L4","node_name":"node2","event_type":{"type":1,"sub_type":166},"traffic_direction":"INGRESS","drop_reason_desc":"UNSUPPORTED_L2_PROTOCOL","Summary":"Ethernet"}
{"time":"2021-12-30T21:06:17.210109754Z","verdict":"DROPPED","drop_reason":137,"ethernet":{"source":"52:54:00:2c:b7:f6","destination":"52:54:00:4b:40:45"},"IP":{"source":"192.168.33.20","destination":"192.168.33.30","ipVersion":"IPv4"},"source":{"identity":1,"labels":["reserved:host"]},"destination":{"identity":6,"labels":["reserved:remote-node"]},"Type":"L3_L4","node_name":"node2","event_type":{"type":1,"sub_type":137},"traffic_direction":"INGRESS","drop_reason_desc":"CT_UNKNOWN_L4_PROTOCOL","Summary":"IPv4"}

The two nodes' IPs are 192.168.33.20 (node2) and 192.168.33.30 (node3).

I've tested with both 1.11 and 1.10 on otherwise identical machines (the only difference being the Cilium version), and the issue only happens on 1.11.

Cilium Version

Defaulted container "cilium-agent" out of: cilium-agent, clean-cilium-state (init)
Client: 1.11.0 27e0848 2021-12-05T15:34:41-08:00 go version go1.17.3 linux/amd64
Daemon: 1.11.0 27e0848 2021-12-05T15:34:41-08:00 go version go1.17.3 linux/amd64

Kernel Version

Linux node1 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Tue Dec 21 19:02:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:09:57Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

https://file.io/Aai7Vju4o9I3

Relevant log output

No response

Anything else?

cilium status --verbose:

KVStore:                Ok   Disabled
Kubernetes:             Ok   1.23 (v1.23.0) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict    [eth1 192.168.33.20 (Direct Routing)]
Host firewall:          Enabled   [eth1]
Cilium:                 Ok   1.11.0 (v1.11.0-27e0848)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok   
IPAM:                   IPv4: 3/254 allocated from 172.16.1.0/24, 
Allocated addresses:
  172.16.1.134 (router)
  172.16.1.196 (ingress-nginx/ingress-nginx-controller-njn5d)
  172.16.1.26 (health)
BandwidthManager:       Disabled
Host Routing:           Legacy
Masquerading:           IPTables [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF:   ktime
Controller Status:      24/24 healthy
  Name                                  Last success   Last error     Count   Message
  cilium-health-ep                      16s ago        never          0       no error   
  dns-garbage-collector-job             27s ago        never          0       no error   
  endpoint-2534-regeneration-recovery   never          never          0       no error   
  endpoint-3923-regeneration-recovery   never          never          0       no error   
  endpoint-4044-regeneration-recovery   never          never          0       no error   
  endpoint-gc                           27s ago        never          0       no error   
  ipcache-inject-labels                 2h35m8s ago    2h35m25s ago   0       no error   
  k8s-heartbeat                         26s ago        never          0       no error   
  mark-k8s-node-as-available            2h35m17s ago   never          0       no error   
  metricsmap-bpf-prom-sync              5s ago         never          0       no error   
  neighbor-table-refresh                17s ago        never          0       no error   
  resolve-identity-2534                 16s ago        never          0       no error   
  resolve-identity-3923                 1m33s ago      never          0       no error   
  resolve-identity-4044                 17s ago        never          0       no error   
  sync-endpoints-and-host-ips           17s ago        never          0       no error   
  sync-lb-maps-with-k8s-services        2h35m17s ago   never          0       no error   
  sync-policymap-2534                   14s ago        never          0       no error   
  sync-policymap-3923                   14s ago        never          0       no error   
  sync-policymap-4044                   14s ago        never          0       no error   
  sync-to-k8s-ciliumendpoint (2534)     6s ago         never          0       no error   
  sync-to-k8s-ciliumendpoint (3923)     3s ago         never          0       no error   
  sync-to-k8s-ciliumendpoint (4044)     7s ago         never          0       no error   
  template-dir-watcher                  never          never          0       no error   
  update-k8s-node-annotations           2h35m25s ago   never          0       no error   
Proxy Status:   OK, ip 172.16.1.134, 0 redirects active on ports 10000-20000
Hubble:         Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 3.61   Metrics: Disabled
KubeProxyReplacement Details:
  Status:                 Strict
  Socket LB Protocols:    TCP, UDP
  Devices:                eth1 192.168.33.20 (Direct Routing)
  Mode:                   SNAT
  Backend Selection:      Random
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767) 
  - LoadBalancer:   Enabled 
  - externalIPs:    Enabled 
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  Events                        2
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        2
  Sockmap                       65535
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption:           Disabled
Cluster health:       6/6 reachable   (2021-12-30T21:15:14Z)
  Name                IP              Node        Endpoints
  node2 (localhost)   192.168.33.20   reachable   reachable
  node1               192.168.33.10   reachable   reachable
  node3               192.168.33.30   reachable   reachable
  node4               192.168.33.40   reachable   reachable
  node5               192.168.33.50   reachable   reachable
  node6               192.168.33.60   reachable   reachable

I'm running Rocky Linux 8.5.
My Helm overrides:

values:
  hostPort:
    enabled: true
  hostServices:
    enabled: true
  containerRuntime:
    integration: crio
  hostFirewall:
    enabled: true
  hubble:
    relay:
      enabled: true
    ui:
      enabled: true
      ingress:
        enabled: true
        className: "{{ ingress_class }}"
        hosts: ["{{ ingress_hosts['hubble'] }}"]

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/host-firewall: Impacts the host firewall or the host endpoint.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • pinned: These issues are not marked stale by our issue bot.
