Bug report
General Information
- Cilium version (run cilium version)
Client: 1.6.4 8048d320a 2019-11-27T17:00:12+01:00 go version go1.12.13 linux/amd64
Daemon: 1.6.4 8048d320a 2019-11-27T17:00:12+01:00 go version go1.12.13 linux/amd64
- Kernel version (run uname -a)
Linux tender-drake 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Orchestration system version in use (e.g. kubectl version, Mesos, ...)
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
- Link to relevant artifacts (policies, deployments scripts, ...)
- Upload a system dump (run curl -sLO releases.cilium.io/tools/cluster-diagnosis.zip && python cluster-diagnosis.zip sysdump and then attach the generated zip file)
(The sysdump was too big to attach. Please let me know if it's needed and I'll send it or make it available some other way.)
How to reproduce the issue
- Install Cilium 1.6.4 using the Helm chart from https://github.com/cilium/cilium/archive/v1.6.4.tar.gz with the following values:
---
global:
  tag: v1.6.4
  cluster:
    name: "k8slab"
    id: "1"
  cni:
    install: true
    chainingMode: portmap
  ipv4:
    enabled: true
  ipv6:
    enabled: false
  prometheus:
    enabled: false
- Deploy a container using hostPort
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 35000
- Deploy a debug container to connect to the hostPort service
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ubuntu
spec:
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
        command:
        - '/bin/bash'
        - '-c'
        - '--'
        args:
        - 'apt-get update -y; apt-get install -y netcat curl; while true; do sleep 30; done;'
- Select a hostPort pod and a debug pod on the same host (e.g. nginx-2cb8j and ubuntu-vf44z in the example below)
$ kubectl get po -owide
NAME           READY   STATUS    RESTARTS   AGE   IP             NODE            NOMINATED NODE   READINESS GATES
nginx-2cb8j    1/1     Running   0          24m   172.16.3.16    tender-drake    <none>           <none>
nginx-7b4wr    1/1     Running   0          24m   172.16.7.154   mutual-dodo     <none>           <none>
nginx-drtk8    1/1     Running   0          24m   172.16.8.63    full-evil       <none>           <none>
nginx-dzk7n    1/1     Running   0          24m   172.16.4.59    right-ferret    <none>           <none>
nginx-l89zr    1/1     Running   0          24m   172.16.6.17    loving-gopher   <none>           <none>
nginx-l9ntn    1/1     Running   0          24m   172.16.5.128   strong-caiman   <none>           <none>
ubuntu-bb44g   1/1     Running   0          20m   172.16.5.210   strong-caiman   <none>           <none>
ubuntu-hhpv7   1/1     Running   0          20m   172.16.7.223   mutual-dodo     <none>           <none>
ubuntu-n2p99   1/1     Running   0          20m   172.16.4.245   right-ferret    <none>           <none>
ubuntu-rlv5c   1/1     Running   0          20m   172.16.6.156   loving-gopher   <none>           <none>
ubuntu-tjr88   1/1     Running   0          20m   172.16.8.70    full-evil       <none>           <none>
ubuntu-vf44z   1/1     Running   0          20m   172.16.3.47    tender-drake    <none>           <none>
- Try to contact the hostPort pod via its hostIP
Using the above example:
$ kubectl get pod nginx-2cb8j -ojsonpath='{.status.hostIP}'
10.24.214.230
$ kubectl exec -ti ubuntu-vf44z bash
root@ubuntu-vf44z:/# curl 10.24.214.230:35000 -v
* Rebuilt URL to: 10.24.214.230:35000/
* Trying 10.24.214.230...
* TCP_NODELAY set
* connect to 10.24.214.230 port 35000 failed: Connection timed out
* Failed to connect to 10.24.214.230 port 35000: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to 10.24.214.230 port 35000: Connection timed out
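On a larger cluster, the step that selects a hostPort pod and a debug pod on the same host could be scripted. The following is a minimal sketch, assuming the pod names start with nginx- and ubuntu- as in the DaemonSets above and that NODE is the seventh whitespace-separated column of the kubectl get po -owide output (newer kubectl versions may print extra words in the RESTARTS column, which would shift it). The name same_node_pairs is a hypothetical helper, not an existing tool.

```shell
# Hypothetical helper: reads `kubectl get po -owide` output on stdin and
# prints "<hostPort pod> <debug pod> <node>" for each node running both.
same_node_pairs() {
  awk '
    $1 ~ /^nginx-/  { server[$7] = $1 }   # assumption: column 7 is NODE
    $1 ~ /^ubuntu-/ { client[$7] = $1 }
    END {
      for (node in server)
        if (node in client)
          print server[node], client[node], node
    }
  ' | sort
}
```

It would be used as kubectl get po -owide | same_node_pairs; any printed line names a pair that should exercise the failing same-host path.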
The cilium monitor output is as follows (endpoint ID 568 is the debug pod ubuntu-vf44z):
cilium monitor --related-to 568
Listening for events on 32 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> host from flow 0xab4466b0 identity 111636->1 state new ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> endpoint 568 flow 0x85c0bb7d identity 111706->111636 state new ifindex lxc2faa014a7904: 172.16.3.16:80 -> 172.16.3.47:49290 tcp SYN, ACK
-> host from flow 0x65945f77 identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0x405d06ef identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0xc974addb identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0x73876df9 identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
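In this trace the SYN is addressed to 10.24.214.230:35000, while the only packet delivered back to endpoint 568 is a SYN, ACK sourced from the pod address 172.16.3.16:80. That mismatch could be why the client keeps retransmitting the SYN, though this is an interpretation of the trace rather than a confirmed cause. A rough sketch for spotting such replies in saved monitor output, assuming the line format shown above; unexpected_reply_sources is a hypothetical helper, not part of the cilium CLI:

```shell
# Hypothetical helper: reads saved `cilium monitor` lines on stdin and prints
# every address:port that sent a packet to the client IP ($1) without the
# client ever having addressed a packet to that same address:port.
unexpected_reply_sources() {
  awk -v client="$1" '
    {
      # Start at field 2 so the leading "->" of each monitor line is skipped.
      for (i = 2; i < NF; i++) {
        if ($i != "->") continue
        src = $(i - 1); dst = $(i + 1)
        if (index(src, client ":") == 1) contacted[dst] = 1
        if (index(dst, client ":") == 1) replied[src] = 1
      }
    }
    END {
      for (src in replied)
        if (!(src in contacted)) print src
    }
  ' | sort -u
}
```

For example, after saving the monitor output to a file named monitor.log (an assumed name), unexpected_reply_sources 172.16.3.47 < monitor.log would list reply sources that do not match any destination the debug pod connected to.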
Contacting the hostPort pod directly via its pod IP and containerPort works:
root@ubuntu-vf44z:/# curl 172.16.3.16:80 -I
HTTP/1.1 200 OK
...
Contacting a pod running on another host via that host's hostIP and hostPort also works:
$ kubectl get pod nginx-7b4wr -ojsonpath='{.status.hostIP}'
10.24.214.232
root@ubuntu-vf44z:/# curl 10.24.214.232:35000 -I
HTTP/1.1 200 OK
...
The issue seems to arise only when the source and destination pods are on the same host and the destination is addressed via hostIP:hostPort.