Request time out from container to other container using hostIP:hostPort on same host with portmap CNI chained #9784

@splushii

Bug report

General Information

  • Cilium version (run cilium version)
Client: 1.6.4 8048d320a 2019-11-27T17:00:12+01:00 go version go1.12.13 linux/amd64
Daemon: 1.6.4 8048d320a 2019-11-27T17:00:12+01:00 go version go1.12.13 linux/amd64
  • Kernel version (run uname -a)
Linux tender-drake 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Orchestration system version in use (e.g. kubectl version, Mesos, ...)
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:09:21Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Link to relevant artifacts (policies, deployments scripts, ...)
  • Upload a system dump (run curl -sLO releases.cilium.io/tools/cluster-diagnosis.zip && python cluster-diagnosis.zip sysdump and then attach the generated zip file)

(The sysdump was too big to attach. Please let me know if it is needed and I will send it or make it available some other way.)

How to reproduce the issue

  1. Install Cilium 1.6.4 using the Helm chart from https://github.com/cilium/cilium/archive/v1.6.4.tar.gz with the following values:
---
global:
  tag: v1.6.4
  cluster:
    name: "k8slab"
    id: "1"
  cni:
    install: true
    chainingMode: portmap
  ipv4:
    enabled: true
  ipv6:
    enabled: false
  prometheus:
    enabled: false
  2. Deploy a container using hostPort:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
          - containerPort: 80
            hostPort: 35000
  3. Deploy a debug container that will connect to the hostPort service:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ubuntu
spec:
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu
        command:
          - '/bin/bash'
          - '-c'
          - '--'
        args:
          - 'apt-get update -y; apt-get install -y netcat curl; while true; do sleep 30; done;'
  4. Select a hostPort pod and a debug pod on the same host (e.g. nginx-2cb8j and ubuntu-vf44z in the example below):
$ kubectl get po -owide
NAME           READY   STATUS    RESTARTS   AGE   IP             NODE            NOMINATED NODE   READINESS GATES
nginx-2cb8j    1/1     Running   0          24m   172.16.3.16    tender-drake    <none>           <none>
nginx-7b4wr    1/1     Running   0          24m   172.16.7.154   mutual-dodo     <none>           <none>
nginx-drtk8    1/1     Running   0          24m   172.16.8.63    full-evil       <none>           <none>
nginx-dzk7n    1/1     Running   0          24m   172.16.4.59    right-ferret    <none>           <none>
nginx-l89zr    1/1     Running   0          24m   172.16.6.17    loving-gopher   <none>           <none>
nginx-l9ntn    1/1     Running   0          24m   172.16.5.128   strong-caiman   <none>           <none>
ubuntu-bb44g   1/1     Running   0          20m   172.16.5.210   strong-caiman   <none>           <none>
ubuntu-hhpv7   1/1     Running   0          20m   172.16.7.223   mutual-dodo     <none>           <none>
ubuntu-n2p99   1/1     Running   0          20m   172.16.4.245   right-ferret    <none>           <none>
ubuntu-rlv5c   1/1     Running   0          20m   172.16.6.156   loving-gopher   <none>           <none>
ubuntu-tjr88   1/1     Running   0          20m   172.16.8.70    full-evil       <none>           <none>
ubuntu-vf44z   1/1     Running   0          20m   172.16.3.47    tender-drake    <none>           <none>
  5. Try to contact the hostPort pod via its hostIP and hostPort.
    Using the above example:
$ kubectl get pod nginx-2cb8j -ojsonpath='{.status.hostIP}'
10.24.214.230
$ kubectl exec -ti ubuntu-vf44z bash
root@ubuntu-vf44z:/# curl 10.24.214.230:35000 -v
* Rebuilt URL to: 10.24.214.230:35000/
*   Trying 10.24.214.230...
* TCP_NODELAY set
* connect to 10.24.214.230 port 35000 failed: Connection timed out
* Failed to connect to 10.24.214.230 port 35000: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to 10.24.214.230 port 35000: Connection timed out
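
For step 4, picking a same-host pair by eye gets tedious on larger clusters. As a rough sketch (not part of the original report), the `kubectl get po -owide` output could be parsed to list one hostPort pod and one debug pod per node. The function name `same_node_pairs` and the `nginx-`/`ubuntu-` name prefixes are assumptions taken from the DaemonSet names above.

```python
from collections import defaultdict


def same_node_pairs(table, server_prefix="nginx-", client_prefix="ubuntu-"):
    """Return {node: (server_pod, client_pod)} for nodes hosting both kinds of pod.

    `table` is the text printed by `kubectl get po -owide`. Column positions
    are looked up from the header row. This assumes no cell before NODE
    contains whitespace, which holds for the output shown in this report.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    node_col = header.index("NODE")  # first match; "NOMINATED NODE" comes later

    by_node = defaultdict(dict)
    for row in lines[1:]:
        cols = row.split()
        name, node = cols[name_col], cols[node_col]
        if name.startswith(server_prefix):
            by_node[node].setdefault("server", name)
        elif name.startswith(client_prefix):
            by_node[node].setdefault("client", name)

    return {
        node: (pods["server"], pods["client"])
        for node, pods in by_node.items()
        if "server" in pods and "client" in pods
    }
```

Feeding it the pod listing from step 4 would pair nginx-2cb8j with ubuntu-vf44z on tender-drake, the pair used in step 5.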

The cilium monitor output follows (endpoint ID 568 is the debug pod ubuntu-vf44z):

cilium monitor --related-to 568
Listening for events on 32 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> host from flow 0xab4466b0 identity 111636->1 state new ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> endpoint 568 flow 0x85c0bb7d identity 111706->111636 state new ifindex lxc2faa014a7904: 172.16.3.16:80 -> 172.16.3.47:49290 tcp SYN, ACK
-> host from flow 0x65945f77 identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0x405d06ef identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0xc974addb identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
-> host from flow 0x73876df9 identity 111636->1 state established ifindex cilium_net: 172.16.3.47:49290 -> 10.24.214.230:35000 tcp SYN
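
One way to read this trace: the SYN is addressed to 10.24.214.230:35000, but the SYN, ACK delivered to endpoint 568 has source 172.16.3.16:80, the nginx pod's own address rather than the hostIP:hostPort the client dialed. A reply with that source would not match the client's pending connection, which would be consistent with the repeated SYNs and the eventual timeout. This is an interpretation of the output, not a confirmed root cause. The sketch below flags such replies; the helper names are hypothetical and the regex is tailored only to the line format shown above.

```python
import re

# Matches the "src:port -> dst:port tcp FLAGS" tail of a monitor trace line.
FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+):(\d+) -> (\d+\.\d+\.\d+\.\d+):(\d+) tcp (.+)$"
)


def parse_flow(line):
    """Return ((src_ip, src_port), (dst_ip, dst_port), [flags]) or None."""
    m = FLOW_RE.search(line.strip())
    if m is None:
        return None
    src_ip, src_port, dst_ip, dst_port, flags = m.groups()
    flag_list = [f.strip() for f in flags.split(",")]
    return (src_ip, int(src_port)), (dst_ip, int(dst_port)), flag_list


def mismatched_replies(lines):
    """Return SYN,ACK flows whose source differs from the address the client dialed."""
    dialed = {}  # client (ip, port) -> destination of its first SYN
    bad = []
    for line in lines:
        flow = parse_flow(line)
        if flow is None:
            continue  # banner or log lines without a TCP flow
        src, dst, flags = flow
        if flags == ["SYN"]:
            dialed.setdefault(src, dst)
        elif flags == ["SYN", "ACK"] and dst in dialed and dialed[dst] != src:
            bad.append({"client": dst, "dialed": dialed[dst], "reply_from": src})
    return bad
```

Run over the trace lines above, this would report a single mismatch: the client 172.16.3.47:49290 dialed 10.24.214.230:35000 but was answered from 172.16.3.16:80.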

Contacting the hostPort pod directly via its pod IP and container port works:

root@ubuntu-vf44z:/# curl 172.16.3.16:80 -I
HTTP/1.1 200 OK
...

Contacting a pod running on another host via that host's hostIP and hostPort also works:

$ kubectl get pod nginx-7b4wr -ojsonpath='{.status.hostIP}'
10.24.214.232
root@ubuntu-vf44z:/# curl 10.24.214.232:35000 -I
HTTP/1.1 200 OK
...

The issue seems to arise only when the source and destination pods are on the same host and hostIP:hostPort is used.

Labels

  • area/cni: Impacts the Container Networking Interface between Cilium and the orchestrator.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • priority/high: This is considered vital to an upcoming release.
