This repository was archived by the owner on Oct 16, 2020. It is now read-only.

Kernel panic with vxlan in openvswitch (via openshift) #2382

@squeed

Issue Report

With the OpenShift openvswitch-based network, a node kernel panics as soon as one of its pods receives a packet from an external node.

Bug

Container Linux Version

1688.3.0

Environment

libvirt+qemu

Reproduction Steps

This is a bit complicated. I have a hybrid OpenShift cluster on qemu, where some workers are CentOS and some are Container Linux (don't judge).

I've got a script and some bootstrapping instructions here: https://github.com/squeed/os-on-cl

Once you have a cluster running:

  1. Shut down all the other workers, so the pod gets scheduled on the node you want to test.
  2. Run a pod: kubectl run --rm -ti --image alpine test /bin/sh
  3. Get the pod's IP from inside the pod: ip addr
  4. On another node in the cluster, ping that IP. The node hosting the pod should kernel panic instantly.
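
Steps 3 and 4 can be roughly sketched as shell helpers. This is only an illustration, not part of the original report: the function names are made up, it assumes kubectl is configured for the cluster, and it assumes the pod from step 2 carries the label run=test (the label kubectl run normally applies for a pod named test).

```shell
#!/bin/sh
# Hypothetical helpers for steps 3 and 4; the names are illustrative only.

# extract_ipv4: read `ip addr` output on stdin and print the first
# non-loopback IPv4 address, without the prefix length.
extract_ipv4() {
    awk '$1 == "inet" && $2 !~ /^127\./ { split($2, a, "/"); print a[1]; exit }'
}

# pod_ip: ask the API server for the pod IP instead of reading it by hand.
# Assumption: the pod started in step 2 has the label run=test.
pod_ip() {
    kubectl get pods -l run=test -o jsonpath='{.items[0].status.podIP}'
}

# ping_pod: send a few pings to the given IP.
# Run this on a different node in the cluster than the one hosting the pod.
ping_pod() {
    ping -c 3 "$1"
}
```

Inside the pod, `ip addr | extract_ipv4` prints the address to use. On another node, `ping_pod <that address>` should be enough to trigger the panic if the setup matches the one described here.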

Other information

The traceback:

[ 3187.113634] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3187.127232] IP:           (null)
[ 3187.130072] PGD 0 P4D 0 
[ 3187.132790] Oops: 0010 [#1] SMP PTI
[ 3187.135579] Modules linked in: veth xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_mark xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c crc32c_generic overlay nls_ascii nls_cp437 vfat fat mousedev evdev virtio_balloon psmouse i2c_piix4 i2c_core button sch_fq_codel ext4 crc16 mbcache jbd2 fscrypto dm_verity dm_bufio virtio_console virtio_blk uhci_hcd crc32c_intel 8139too ata_piix aesni_intel aes_x86_64 crypto_simd libata ehci_pci cryptd ehci_hcd glue_helper scsi_mod virtio_pci virtio_ring virtio
[ 3187.152444]  usbcore usb_common 8139cp mii qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod dax
[ 3187.155193] CPU: 0 PID: 828 Comm: handler2 Not tainted 4.14.24-coreos #1
[ 3187.157113] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[ 3187.160340] task: ffff9bca82850000 task.stack: ffffa98080454000
[ 3187.162307] RIP: 0010:          (null)
[ 3187.163972] RSP: 0018:ffffa98080457738 EFLAGS: 00010286
[ 3187.166516] RAX: 0000000000000000 RBX: ffff9bca82ab73a8 RCX: 00000000000005aa
[ 3187.168583] RDX: ffff9bca82ab7700 RSI: 0000000000000000 RDI: ffff9bca82ab7300
[ 3187.170548] RBP: ffffa98080457820 R08: 0000000000000006 R09: ffff9bcaba1cf300
[ 3187.172544] R10: 0000000000000002 R11: 0000000000000000 R12: ffff9bca82ab7700
[ 3187.174510] R13: ffff9bcab6a32c00 R14: ffff9bcab6a32c00 R15: ffff9bcab80d2000
[ 3187.176484] FS:  00007f0b95c12700(0000) GS:ffff9bcabfc00000(0000) knlGS:0000000000000000
[ 3187.179044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3187.183692] CR2: 0000000000000000 CR3: 0000000075a3e003 CR4: 00000000003606f0
[ 3187.185690] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3187.187634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3187.189599] Call Trace:
[ 3187.190635]  ? vxlan_dev_create+0x9d0/0x2d2d [vxlan]
[ 3187.192165]  ? vxlan_dev_create+0x2164/0x2d2d [vxlan]
[ 3187.193774]  ? vxlan_dev_create+0x2164/0x2d2d [vxlan]
[ 3187.195253]  ? dev_hard_start_xmit+0xa1/0x200
[ 3187.197008]  ? vxlan_dev_create+0x1f60/0x2d2d [vxlan]
[ 3187.199381]  ? dev_hard_start_xmit+0xa1/0x200
[ 3187.201510]  ? __dev_queue_xmit+0x688/0x7c0
[ 3187.202913]  ? 0xffffffffc03c6d34
[ 3187.204154]  ? __dev_queue_xmit+0x7c0/0x7c0
[ 3187.205521]  ? 0xffffffffc03c6d34
[ 3187.206680]  ? ovs_match_init+0x82a/0xd10 [openvswitch]
[ 3187.208464]  ? __kmalloc+0x191/0x210
[ 3187.209941]  ? ovs_execute_actions+0x48/0x110 [openvswitch]
[ 3187.211757]  ? ovs_execute_actions+0x48/0x110 [openvswitch]
[ 3187.214478]  ? action_fifos_exit+0x2e9/0x34e0 [openvswitch]
[ 3187.215895]  ? genl_family_rcv_msg+0x1e4/0x390
[ 3187.217117]  ? genl_rcv_msg+0x47/0x90
[ 3187.218199]  ? __kmalloc_node_track_caller+0x222/0x2c0
[ 3187.219499]  ? genl_family_rcv_msg+0x390/0x390
[ 3187.220688]  ? netlink_rcv_skb+0x4d/0x130
[ 3187.222159]  ? genl_rcv+0x24/0x40
[ 3187.223190]  ? netlink_unicast+0x196/0x240
[ 3187.224333]  ? netlink_sendmsg+0x2b8/0x3b0
[ 3187.225448]  ? sock_sendmsg+0x36/0x40
[ 3187.226478]  ? ___sys_sendmsg+0x2a0/0x2f0
[ 3187.227523]  ? sock_poll+0x70/0x90
[ 3187.228522]  ? ep_send_events_proc+0x86/0x1a0
[ 3187.230168]  ? ep_ptable_queue_proc+0xa0/0xa0
[ 3187.231823]  ? ep_scan_ready_list.constprop.17+0x217/0x220
[ 3187.233234]  ? ep_poll+0x1e3/0x3a0
[ 3187.234271]  ? __sys_sendmsg+0x51/0x90
[ 3187.235343]  ? __sys_sendmsg+0x51/0x90
[ 3187.236470]  ? do_syscall_64+0x67/0x120
[ 3187.237579]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3187.239199] Code:  Bad RIP value.
[ 3187.240352] RIP:           (null) RSP: ffffa98080457738
[ 3187.241930] CR2: 0000000000000000
[ 3187.243113] ---[ end trace 5714c8771c746674 ]---
[ 3187.245624] Kernel panic - not syncing: Fatal exception in interrupt
[ 3187.248332] Kernel Offset: 0x2a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3187.251304] Rebooting in 10 seconds..
