Skip to content

cilium: fix up source address selection for cluster ip#8141

Merged
tgraf merged 1 commit intomasterfrom
pr/cluster-ip-fix
May 29, 2019
Merged

cilium: fix up source address selection for cluster ip#8141
tgraf merged 1 commit intomasterfrom
pr/cluster-ip-fix

Conversation

@borkmann
Copy link
Copy Markdown
Member

@borkmann borkmann commented May 28, 2019

The CI test 'K8sServicesTest Checks ClusterIP Connectivity Checks service
on same node' failed recently due to a buggy L7 policy removal from prior
test; fixed via d8ff018 ("test: fix incorrect deletion statement for
policy"). This incorrect deletion uncovered another bug however triggered
by curl failing to connect to the endpoint backing the service:

[...] (see commit message)

Fixes: #7902


This change is Reviewable

@borkmann borkmann requested review from a team and brb May 28, 2019 20:10
@borkmann borkmann added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/daemon Impacts operation of the Cilium daemon. kind/bug This is a bug in the Cilium logic. needs-backport/1.5 labels May 28, 2019
@borkmann
Copy link
Copy Markdown
Member Author

test-me-please

The CI test 'K8sServicesTest Checks ClusterIP Connectivity Checks service
on same node' failed recently due to a buggy L7 policy removal from prior
test; fixed via d8ff018 ("test: fix incorrect deletion statement for
policy"). This incorrect deletion uncovered another bug however triggered
by curl failing to connect to the endpoint backing the service:

  # curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 8 http://10.110.178.169/ -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
time-> DNS: '0.000038()', Connect: '0.000000',Transfer '0.000000', total '3.203689'

  (Note, DNS has nothing to do with the failure.)

  # docker exec -ti 6d5465170bf4 bash
  # cilium bpf lb list
  SERVICE ADDRESS      BACKEND ADDRESS
  10.110.178.169:80    0.0.0.0:0 (6)        <--- service from above curl
                       10.10.0.183:80 (6)
                       10.10.0.62:80 (6)
  10.101.55.250:2379   10.10.1.67:2379 (3)
                       0.0.0.0:0 (3)
                       10.10.1.162:2379 (3)
                       10.10.0.24:2379 (3)
  10.96.0.10:53        0.0.0.0:0 (2)
                       10.10.1.185:53 (2)
  10.96.0.1:443        192.168.36.11:6443 (1)
                       0.0.0.0:0 (1)
  # exit

  # ip route get 10.110.178.169
  10.110.178.169 via 10.0.2.2 dev enp0s3 src 10.0.2.15 uid 0
      cache
  # ip route get 10.10.0.183
  10.10.0.183 via 10.10.0.160 dev cilium_host src 10.10.0.160 uid 0
      cache
  # ip route get 10.10.0.62
  10.10.0.62 via 10.10.0.160 dev cilium_host src 10.10.0.160 uid 0
      cache
  # ip r
  default via 10.0.2.2 dev enp0s3 proto dhcp src 10.0.2.15 metric 100
  10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15
  10.0.2.2 dev enp0s3 proto dhcp scope link src 10.0.2.15 metric 100
  10.10.0.0/24 via 10.10.0.160 dev cilium_host src 10.10.0.160
  10.10.0.160 dev cilium_host scope link
  10.10.1.0/24 via 10.10.0.160 dev cilium_host src 10.10.0.160
  172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
  172.28.128.0/24 dev enp0s9 proto kernel scope link src 172.28.128.3
  192.168.9.0/24 dev br-e6fcdf93b490 proto kernel scope link src 192.168.9.1
  192.168.36.0/24 dev enp0s8 proto kernel scope link src 192.168.36.11

The curl is executed out of hostns and the service IP is xlated
via Kubernetes cluster IP which hooks into iptables nat table in
post-routing. What could be seen in cilium monitor was that at the
cilium_host device (bpf_netdev on egress), the following mapping
was found in the ipcache ...

  Successfully mapped daddr=10.0.2.15 to identity=2

... and the packet dropped later on due to policy. The id mapping
of 2 (world) turns out to be the 0/0 catch-all:

  # cilium bpf ipcache get 10.0.2.15
  10.0.2.15 maps to identity 2 0 0.0.0.0

Aside form the mapping, the other part which is unexpected is the source
IP selection. Assumption would have been that given the backend is local,
we would have used 10.10.0.160 based on this route:

  10.10.0.0/24 via 10.10.0.160 dev cilium_host src 10.10.0.160

As can be seen from above `ip route get` however the source IP selection
is based upon the service IP. In retrospect this makes sense because the
Kubernetes iptables backend selection is at post-routing, so nothing in
terms of source IP will change at that point. Thus, add a SNAT rule that
comes after the KUBE-POSTROUTING chain such that we can fix the source
address to cilium_host's address instead where it will later also find
host identity in the ipcache.

Fixes: #7902
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann force-pushed the pr/cluster-ip-fix branch from 1817172 to 10f2ffc Compare May 28, 2019 20:21
@borkmann
Copy link
Copy Markdown
Member Author

test-me-please

@borkmann
Copy link
Copy Markdown
Member Author

test-missed-k8s

@coveralls
Copy link
Copy Markdown

coveralls commented May 28, 2019

Coverage Status

Coverage decreased (-0.03%) to 41.901% when pulling 10f2ffc on pr/cluster-ip-fix into e39da71 on master.

@borkmann
Copy link
Copy Markdown
Member Author

timeout, retrying

@borkmann
Copy link
Copy Markdown
Member Author

test-missed-k8s

@borkmann
Copy link
Copy Markdown
Member Author

test-me-please

@ianvernon
Copy link
Copy Markdown
Member

Test failures: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/12645/

Running again to see if they reproduce.

@ianvernon ianvernon closed this May 29, 2019
@ianvernon ianvernon reopened this May 29, 2019
@ianvernon
Copy link
Copy Markdown
Member

Sigh, I fat-fingered and closed the PR facepalm

@borkmann
Copy link
Copy Markdown
Member Author

Hit #7105

@borkmann
Copy link
Copy Markdown
Member Author

test-me-please

@tgraf
Copy link
Copy Markdown
Contributor

tgraf commented May 29, 2019

@borkmann Since we are only hitting known flakes, let's merge.

@tgraf tgraf merged commit 2bdca06 into master May 29, 2019
@tgraf tgraf deleted the pr/cluster-ip-fix branch May 29, 2019 08:49
@aanm aanm mentioned this pull request May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/daemon Impacts operation of the Cilium daemon. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/bug This is a bug in the Cilium logic.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[not a flake] K8sServicesTest Checks ClusterIP Connectivity Checks service on same node

5 participants