cilium-agent SIGSEGV in filterAndDestroyUDPSockets due to missing nil check on sockInfo #44768
Description
Bug Report
Summary
cilium-agent crashes with a SIGSEGV (nil pointer dereference) in filterAndDestroyUDPSockets when iterateNetlinkSockets encounters a netlink error during UDP socket cleanup. On those error paths, iterateNetlinkSockets invokes the callback with a nil sockInfo, and the callback reads sockInfo.ID without first checking whether sockInfo is nil.
Version
- Cilium: v1.18.5
- The bug is still present on main (where the function was renamed to filterAndDestroySockets).
How to reproduce
The crash is triggered when the load balancer reconciler runs terminateUDPConnectionsToBackend (e.g., after a Kubernetes node deletion event) and iterateNetlinkSockets encounters a netlink error (such as NLMSG_ERROR).
In our case, a node deletion event caused all cilium-agent DaemonSet pods on the affected node pool (3 nodes) to crash simultaneously, since they all attempted UDP socket cleanup at the same time and hit the same netlink error path.
Logs / Stack Trace
Immediately before the crash, a node deletion event is logged:
time=2026-03-12T13:00:26.133296772Z level=info msg="Node deleted" module=agent.controlplane.node-manager nodeName=<redacted>
Followed by the panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x34ede8a]
goroutine 54262 [running, locked to thread]:
github.com/cilium/cilium/pkg/datapath/sockets.Destroy.filterAndDestroyUDPSockets.func2(0xc0130dc7c0?, {0x506ffe0?, 0x517e470?})
/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/termination.go:91 +0x56
github.com/cilium/cilium/pkg/datapath/sockets.iterateNetlinkSockets(0x11, 0x2, 0xffff, 0xc002cade00)
/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:347 +0x4c4
github.com/cilium/cilium/pkg/datapath/sockets.filterAndDestroyUDPSockets(...)
/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:154
/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:155 +0x2a
github.com/cilium/cilium/pkg/datapath/sockets.Destroy(0xc00183d890, {{0xc02280f49c, 0x4, 0x4}, 0x35, 0x2, 0x11, 0xc012db7a50})
/go/src/github.com/cilium/cilium/pkg/datapath/sockets/sockets.go:93 +0x1e5
github.com/cilium/cilium/pkg/loadbalancer/reconciler.terminateUDPConnectionsToBackend.terminateUDPConnectionsToBackend.func2.func3()
/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/termination.go:206 +0xd7
github.com/cilium/cilium/pkg/netns.(*NetNS).Do.func1()
/go/src/github.com/cilium/cilium/pkg/netns/netns_linux.go:175 +0x8e
github.com/cilium/cilium/pkg/loadbalancer/reconciler.socketDestroyer.Destroy({0x134?}, {{0xc02280f49c, 0x4, 0x4}, 0x35, 0x2, 0x11, 0xc012db7a50})
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x93
/go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:93 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1276
Root Cause
In pkg/datapath/sockets/sockets.go, filterAndDestroyUDPSockets passes sockInfo.ID to the callback without checking for nil:
// v1.18.5 - line 137
func filterAndDestroyUDPSockets(family uint8, socketCB func(socket netlink.SocketID, err error)) error {
	return iterateNetlinkSockets(unix.IPPROTO_UDP, family, 0xffff, func(sockInfo *Socket, err error) error {
		socketCB(sockInfo.ID, err) // <-- sockInfo can be nil here
		return nil
	})
}
However, iterateNetlinkSockets passes nil for sockInfo in multiple error paths (lines 301, 305, 309, 317):
// s.Receive() error
fn(nil, err)
// wrong sender PID
fn(nil, fmt.Errorf("Wrong sender portid %d, expected %d", from.Pid, nl.PidKernel))
// empty messages
fn(nil, errors.New("no message nor error from netlink"))
// NLMSG_ERROR
fn(nil, syscall.Errno(-error))
On main, the function was renamed to filterAndDestroySockets, but the same missing nil check exists.
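The failure mode can be reproduced outside Cilium with a small, self-contained Go model. This is a sketch, not Cilium code: SocketID, Socket, iterateSockets, unguarded, and runCatchingPanic are hypothetical stand-ins, and iterateSockets only mimics the pattern described above (report a socket, then take an error path that calls fn(nil, err)). The callback has the same shape as the v1.18.5 one, so its second invocation panics with the same runtime error seen in the stack trace.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for netlink.SocketID and the sockets package's
// Socket type; only the ID field matters for this bug.
type SocketID struct{ SourcePort uint16 }
type Socket struct{ ID SocketID }

// iterateSockets is a hypothetical model of iterateNetlinkSockets: it reports
// one socket, then takes an error path that calls fn(nil, err), as the
// NLMSG_ERROR branch does.
func iterateSockets(fn func(*Socket, error) error) error {
	if err := fn(&Socket{ID: SocketID{SourcePort: 53}}, nil); err != nil {
		return err
	}
	return fn(nil, errors.New("netlink error"))
}

// unguarded has the same shape as the v1.18.5 callback: it reads
// sockInfo.ID without checking sockInfo for nil.
func unguarded(socketCB func(SocketID, error)) error {
	return iterateSockets(func(sockInfo *Socket, err error) error {
		socketCB(sockInfo.ID, err) // nil pointer dereference when sockInfo == nil
		return nil
	})
}

// runCatchingPanic runs f and reports whether it panicked, plus the panic message.
func runCatchingPanic(f func() error) (panicked bool, msg string) {
	defer func() {
		if r := recover(); r != nil {
			panicked, msg = true, fmt.Sprint(r)
		}
	}()
	_ = f()
	return false, ""
}

func main() {
	seen := 0
	panicked, msg := runCatchingPanic(func() error {
		return unguarded(func(SocketID, error) { seen++ })
	})
	fmt.Println(panicked, seen) // true 1
	fmt.Println(msg)            // runtime error: invalid memory address or nil pointer dereference
}
```

The first callback (a real socket) succeeds; the crash only happens once the iterator hits an error path, which matches the report that the agent runs normally until a netlink error occurs during cleanup.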
Suggested Fix
Add a nil guard in the callback:
func filterAndDestroyUDPSockets(family uint8, socketCB func(socket netlink.SocketID, err error)) error {
	return iterateNetlinkSockets(unix.IPPROTO_UDP, family, 0xffff, func(sockInfo *Socket, err error) error {
		if sockInfo == nil {
			return err
		}
		socketCB(sockInfo.ID, err)
		return nil
	})
}
Impact
- All cilium-agent pods on affected nodes crash simultaneously when a node deletion triggers UDP socket cleanup and a netlink error occurs.
- Pods enter a crash loop since the same condition is hit on restart if the trigger persists.
- Affects v1.18.5 (confirmed), v1.18.7 (confirmed by code review), and main (confirmed by code review).