Upgrade to systemd 243+ breaks pod networking with AWS CNI due to veth MAC Address getting overwritten #278
Closed
Description
Our Flatcar image was auto-updated from 2512.4.0 to 2605.5.0, which broke the node's ability to talk to pods running on it.
Impact
Pods on worker nodes cannot communicate with the API server pods on the master nodes.
Environment and steps to reproduce
- Set-up:
- Kubernetes Client Version: version.Info Major:"1", Minor:"19", GitVersion:"v1.19.1"
- Kubernetes Server Version: version.Info Major:"1", Minor:"16", GitVersion:"v1.16.13"
- Running on AWS instances using Flatcar 2605.5.0 (also tested with 2605.7.0)
- Cilium v1.7.5 (also tested with Cilium v1.8.5)
- AWS VPC CNI (v1.6.3)
- Task: Reach a pod running on the node
- Action(s):
a. Upgrading from Flatcar 2512.4.0 to 2605.5.0
- Error: The node cannot reach the pod running on it.
Node (ip-10-64-52-104.eu-west-1.compute.internal) to pod (10.64.36.243) on Master-newFC (ip-10-64-32-253.eu-west-1.compute.internal):
tracepath 10.64.36.243
1?: [LOCALHOST] pmtu 9001
1: ip-10-64-32-253.eu-west-1.compute.internal 0.503ms
1: ip-10-64-32-253.eu-west-1.compute.internal 0.464ms
2: no reply
3: no reply
4: no reply
5: no reply
6: no reply
...
30: no reply
Too many hops: pmtu 9001
Resume: pmtu 9001
Expected behavior
Node (ip-10-64-52-104.eu-west-1.compute.internal) to pod (10.64.33.129) on Master-oldFC (ip-10-64-34-191.eu-west-1.compute.internal):
tracepath 10.64.33.129
1?: [LOCALHOST] pmtu 9001
1: ip-10-64-34-191.eu-west-1.compute.internal 0.538ms
1: ip-10-64-34-191.eu-west-1.compute.internal 0.460ms
2: ip-10-64-33-129.eu-west-1.compute.internal 0.475ms reached
Resume: pmtu 9001 hops 2 back 2
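For anyone who wants to check several nodes quickly, the difference between the two tracepath runs above can be tested with a small script. This is only a rough sketch: the function name `summarize_tracepath` and the sample constants are made up for illustration, and the parsing assumes the output format shown in this report (abbreviated here).

```python
import re

# Matches hop lines such as "1?: [LOCALHOST] pmtu 9001" or "2: no reply".
HOP_RE = re.compile(r"^\s*(\d+)\??:\s+(.*)$")


def summarize_tracepath(output: str) -> dict:
    """Report whether the target was reached and how many hops gave no reply."""
    reached = False
    no_reply = 0
    last_hop = 0
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skips "Resume:", "Too many hops:" and "..." lines
        hop, rest = int(m.group(1)), m.group(2).strip()
        last_hop = max(last_hop, hop)
        if rest == "no reply":
            no_reply += 1
        elif rest.endswith("reached"):
            reached = True
    return {"reached": reached, "no_reply_hops": no_reply, "last_hop": last_hop}


# Abbreviated samples copied from the outputs in this report.
FAILING = """\
1?: [LOCALHOST] pmtu 9001
1: ip-10-64-32-253.eu-west-1.compute.internal 0.503ms
2: no reply
3: no reply
30: no reply
Too many hops: pmtu 9001
Resume: pmtu 9001
"""

WORKING = """\
1?: [LOCALHOST] pmtu 9001
1: ip-10-64-34-191.eu-west-1.compute.internal 0.538ms
2: ip-10-64-33-129.eu-west-1.compute.internal 0.475ms reached
Resume: pmtu 9001 hops 2 back 2
"""

if __name__ == "__main__":
    print("new Flatcar:", summarize_tracepath(FAILING))
    print("old Flatcar:", summarize_tracepath(WORKING))
```

On the new image the summary should show `reached` as false with only "no reply" hops after the first one; on the old image the pod is reached at hop 2.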
Additional information
Cilium monitor output while running tracepath on a node that has a pod running on it:
level=info msg="Initializing dissection cache..." subsys=monitor
-> endpoint 1077 flow 0xd4db6b68 identity 1->66927 state new ifindex 0 orig-ip 10.64.32.253: 10.64.32.253:36282 -> 10.64.39.43:44444 udp
-> stack flow 0xa466c6d3 identity 66927->1 state related ifindex 0 orig-ip 0.0.0.0: 10.64.39.43 -> 10.64.32.253 DestinationUnreachable(Port)
tcpdump output on the node while it tries to reach a pod running on it:
15:18:00.676152 IP ip-10-64-32-253.eu-west-1.compute.internal.58914 > ip-10-64-52-104.eu-west-1.compute.internal.4240: Flags [.], ack 548860955, win 491, options [nop,nop,TS val 3987550058 ecr 3030925508], length 0
15:18:00.676520 IP ip-10-64-52-104.eu-west-1.compute.internal.4240 > ip-10-64-32-253.eu-west-1.compute.internal.58914: Flags [.], ack 1, win 489, options [nop,nop,TS val 3030955756 ecr 3987534941], length 0
15:18:00.919448 IP ip-10-64-52-104.eu-west-1.compute.internal.4240 > ip-10-64-32-253.eu-west-1.compute.internal.58914: Flags [.], ack 1, win 489, options [nop,nop,TS val 3030955999 ecr 3987534941], length 0
15:18:00.919497 IP ip-10-64-32-253.eu-west-1.compute.internal.58914 > ip-10-64-52-104.eu-west-1.compute.internal.4240: Flags [.], ack 1, win 491, options [nop,nop,TS val 3987550301 ecr 3030955756], length 0
15:18:01.465589 IP ip-10-64-52-104.eu-west-1.compute.internal.34294 > ip-10-64-36-243.eu-west-1.compute.internal.44448: UDP, length 8973
15:18:01.465630 IP ip-10-64-52-104.eu-west-1.compute.internal.34294 > ip-10-64-36-243.eu-west-1.compute.internal.44448: UDP, length 8973
15:18:01.465647 IP ip-10-64-36-243.eu-west-1.compute.internal > ip-10-64-52-104.eu-west-1.compute.internal: ICMP ip-10-64-36-243.eu-west-1.compute.internal udp port 44448 unreachable, length 556
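The interesting line in the capture above is the ICMP "udp port unreachable" reply. A rough sketch for pulling such replies out of a longer capture follows; the helper name `find_port_unreachable` is hypothetical, and the regular expression assumes tcpdump's default text output as shown in this report.

```python
import re

# Matches lines like:
#   "<time> IP <sender> > <receiver>: ICMP <host> udp port <port> unreachable, length N"
ICMP_RE = re.compile(r"IP (\S+) > (\S+): ICMP (\S+) udp port (\d+) unreachable")


def find_port_unreachable(lines):
    """Return (sender, receiver, port) for each ICMP port-unreachable line."""
    hits = []
    for line in lines:
        m = ICMP_RE.search(line)
        if m:
            sender, receiver, _host, port = m.groups()
            hits.append((sender, receiver, int(port)))
    return hits


# Two lines copied from the capture in this report.
SAMPLE = [
    "15:18:01.465630 IP ip-10-64-52-104.eu-west-1.compute.internal.34294 > "
    "ip-10-64-36-243.eu-west-1.compute.internal.44448: UDP, length 8973",
    "15:18:01.465647 IP ip-10-64-36-243.eu-west-1.compute.internal > "
    "ip-10-64-52-104.eu-west-1.compute.internal: ICMP "
    "ip-10-64-36-243.eu-west-1.compute.internal udp port 44448 unreachable, length 556",
]

if __name__ == "__main__":
    for sender, receiver, port in find_port_unreachable(SAMPLE):
        print(f"{sender} told {receiver} that UDP port {port} is unreachable")
```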