
cilium-agent SIGSEGV in filterAndDestroyUDPSockets due to missing nil check on sockInfo #44768

@JonaLoefflerLeanix

Description

Bug Report

Summary

cilium-agent crashes with a SIGSEGV (nil pointer dereference) in filterAndDestroyUDPSockets when iterateNetlinkSockets encounters a netlink error during UDP socket cleanup. On these error paths iterateNetlinkSockets invokes the callback with a nil sockInfo, and the callback reads sockInfo.ID without checking whether sockInfo is nil.

Version

  • Cilium: v1.18.5
  • The bug is still present on main (where the function was renamed to filterAndDestroySockets).

How to reproduce

The crash is triggered when the load balancer reconciler runs terminateUDPConnectionsToBackend (e.g., after a Kubernetes node deletion event) and iterateNetlinkSockets encounters a netlink error (such as NLMSG_ERROR).

In our case, a node deletion event caused all cilium-agent DaemonSet pods on the affected node pool (3 nodes) to crash simultaneously, since they all attempted UDP socket cleanup at the same time and hit the same netlink error path.

Logs / Stack Trace

Immediately before the crash, a node deletion event is logged:

time=2026-03-12T13:00:26.133296772Z level=info msg="Node deleted" module=agent.controlplane.node-manager nodeName=<redacted>

Followed by the panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x34ede8a]

goroutine 54262 [running, locked to thread]:
github.com/cilium/cilium/pkg/datapath/sockets.Destroy.filterAndDestroyUDPSockets.func2(0xc0130dc7c0?, {0x506ffe0?, 0x517e470?})
	/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/termination.go:91 +0x56
github.com/cilium/cilium/pkg/datapath/sockets.iterateNetlinkSockets(0x11, 0x2, 0xffff, 0xc002cade00)
	/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:347 +0x4c4
github.com/cilium/cilium/pkg/datapath/sockets.filterAndDestroyUDPSockets(...)
	/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:154
	/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:155 +0x2a
github.com/cilium/cilium/pkg/datapath/sockets.Destroy(0xc00183d890, {{0xc02280f49c, 0x4, 0x4}, 0x35, 0x2, 0x11, 0xc012db7a50})
	/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:93 +0x1e5
github.com/cilium/cilium/pkg/loadbalancer/reconciler.terminateUDPConnectionsToBackend.terminateUDPConnectionsToBackend.func2.func3()
	/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/termination.go:206 +0xd7
github.com/cilium/cilium/pkg/netns.(*NetNS).Do.func1()
	/go/src/github.com/cilium/cilium/pkg/netns/netns_linux.go:175 +0x8e
github.com/cilium/cilium/pkg/loadbalancer/reconciler.socketDestroyer.Destroy({0x134?}, {{0xc02280f49c, 0x4, 0x4}, 0x35, 0x2, 0x11, 0xc012db7a50})
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x93
	/go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:93 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1276

Root Cause

In pkg/datapath/sockets/sockets.go, the callback that filterAndDestroyUDPSockets passes to iterateNetlinkSockets reads sockInfo.ID and forwards it to socketCB without checking sockInfo for nil:

// v1.18.5 - line 137
func filterAndDestroyUDPSockets(family uint8, socketCB func(socket netlink.SocketID, err error)) error {
	return iterateNetlinkSockets(unix.IPPROTO_UDP, family, 0xffff, func(sockInfo *Socket, err error) error {
		socketCB(sockInfo.ID, err)  // <-- sockInfo can be nil here
		return nil
	})
}
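The failure mode can be reproduced outside Cilium with a small sketch. SocketID, Socket, iterateFake and unguardedFilter below are hypothetical stand-ins for netlink.SocketID, the sockets package's Socket type, iterateNetlinkSockets and the v1.18.5 callback; the fake iterator only models an error path that hands the callback a nil *Socket.

```go
package main

import (
	"errors"
	"fmt"
)

// SocketID is a hypothetical stand-in for netlink.SocketID.
type SocketID struct {
	SourcePort uint16
}

// Socket is a hypothetical stand-in for the sockets package's Socket type.
type Socket struct {
	ID SocketID
}

// iterateFake mimics only the relevant behaviour of iterateNetlinkSockets:
// on a netlink error it invokes fn with a nil *Socket.
func iterateFake(fn func(*Socket, error) error) error {
	return fn(nil, errors.New("simulated NLMSG_ERROR"))
}

// unguardedFilter mirrors the v1.18.5 callback: it reads sockInfo.ID unconditionally.
func unguardedFilter(socketCB func(SocketID, error)) error {
	return iterateFake(func(sockInfo *Socket, err error) error {
		socketCB(sockInfo.ID, err) // nil pointer dereference when sockInfo == nil
		return nil
	})
}

// panics runs f and reports whether it panicked, with the recovered value as text.
func panics(f func()) (msg string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			msg, panicked = fmt.Sprint(r), true
		}
	}()
	f()
	return "", false
}

func main() {
	msg, panicked := panics(func() {
		_ = unguardedFilter(func(SocketID, error) {})
	})
	// prints: true runtime error: invalid memory address or nil pointer dereference
	fmt.Println(panicked, msg)
}
```

The recovered message matches the first line of the panic in the stack trace above; in cilium-agent nothing recovers it, so the process exits.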

However, iterateNetlinkSockets calls the callback with a nil sockInfo on several error paths (lines 301, 305, 309, 317):

// s.Receive() error
fn(nil, err)

// wrong sender PID
fn(nil, fmt.Errorf("Wrong sender portid %d, expected %d", from.Pid, nl.PidKernel))

// empty messages
fn(nil, errors.New("no message nor error from netlink"))

// NLMSG_ERROR
fn(nil, syscall.Errno(-error))

On main, the function was renamed to filterAndDestroySockets but the same missing nil check exists.

Suggested Fix

Add a nil guard in the callback:

func filterAndDestroyUDPSockets(family uint8, socketCB func(socket netlink.SocketID, err error)) error {
	return iterateNetlinkSockets(unix.IPPROTO_UDP, family, 0xffff, func(sockInfo *Socket, err error) error {
		// iterateNetlinkSockets reports netlink errors with a nil *Socket;
		// there is no socket ID to pass on, so return the error instead.
		if sockInfo == nil {
			return err
		}
		socketCB(sockInfo.ID, err)
		return nil
	})
}

Impact

  • All cilium-agent pods on affected nodes crash simultaneously when a node deletion triggers UDP socket cleanup and a netlink error occurs.
  • Pods enter a crash loop, because they hit the same condition again on restart for as long as the trigger persists.
  • Affects v1.18.5 (confirmed), v1.18.7 (confirmed by code review), and main (confirmed by code review).

Metadata

Labels: area/datapath, kind/bug, kind/community-report
