Skip to content

datapath,daemon: Fix initialization panics when IPv6 is enabled#12197

Merged
borkmann merged 2 commits intomasterfrom
pr/brb/ipv6-panic-omg
Jun 19, 2020
Merged

datapath,daemon: Fix initialization panics when IPv6 is enabled#12197
borkmann merged 2 commits intomasterfrom
pr/brb/ipv6-panic-omg

Conversation

@brb
Copy link
Copy Markdown
Member

@brb brb commented Jun 19, 2020

This PR fixes two panics mentioned in #12188 (see commit msgs).

Fix #12188

brb added 2 commits June 19, 2020 09:56
Previously, if IPv{4,6} global scope addrs could have not been derived
for BPF NodePort, the agent had logged an error and panicked.

In v1.8, we extended the device detection to include devices with k8s
InternalIP/ExternalIP addrs. The detection does not check a scope of
those addrs. So, it is possible that an upgrade to v1.8 might break
for users with --enable-ipv6=true and --kube-proxy-replacement=probe.

To avoid that, for now just disable BPF NodePort and friends if no
global scope addr can be detected. In the future, we should revisit
whether it makes sense to consider local scope addrs for IPv6 too.

Reported-by: Jed Salazar <jed@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Otherwise, cilium-agent will panic when --enable-ipv6=true with the
following:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b17ec7]

    [...]
    /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:344 +0x31d7

Fixes: a562b74 ("bpf: Check native-routing-cidr in BPF masquerade")
Reported-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb added kind/bug This is a bug in the Cilium logic. pending-review area/daemon Impacts operation of the Cilium daemon. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 19, 2020
@brb brb requested review from a team and borkmann June 19, 2020 08:18
@brb
Copy link
Copy Markdown
Member Author

brb commented Jun 19, 2020

test-me-please

Comment thread daemon/cmd/kube_proxy_replacement.go
@brb brb requested a review from aanm June 19, 2020 08:35
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.01%) to 37.11% when pulling 8d97887 on pr/brb/ipv6-panic-omg into 0e535cb on master.

@rolinh rolinh self-requested a review June 19, 2020 09:06
@borkmann
Copy link
Copy Markdown
Member

provision fail on net-next suite:

09:11:58  0%...10%...20%...30%...40%...
09:11:58  Progress state: VBOX_E_INVALID_OBJECT_STATE
09:11:58  VBoxManage: error: Machine delete failed
09:11:58  VBoxManage: error: Medium '/root/VirtualBox VMs/ubuntu-18.04.4-amd64-59_1592555301695_52087/ubuntu-18.04.4-amd64-59-disk001.vmdk' is locked for reading by another task
09:11:58  VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component MediumWrap, interface IMedium
09:11:58  VBoxManage: error: Context: "RTEXITCODE handleUnregisterVM(HandlerArg*)" at line 162 of file VBoxManageMisc.cpp
09:11:58  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:11:58  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:11:58  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:11:58  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:11:58  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:12:00  0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
09:12:00  Clean docker images from registry
09:12:01  registry

@borkmann
Copy link
Copy Markdown
Member

retest-net-next

Copy link
Copy Markdown
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR with also the fix in #12198 fix the panic I reported here.
However, I uncovered another bug that prevents an IPv6 only cluster to run (not a regression, already the case for v1.7.5): #12201.

I haven't tested a dual-stack cluster yet.

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 19, 2020
@borkmann borkmann merged commit 5462ced into master Jun 19, 2020
@borkmann borkmann deleted the pr/brb/ipv6-panic-omg branch June 19, 2020 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/daemon Impacts operation of the Cilium daemon. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1.8-rc3 crashloop with ipv6 enabled

5 participants