Skip to content

hubble: fix nil pointer dereference in Peer.Equal() when one Address is nil#45021

Merged
tklauser merged 1 commit intocilium:mainfrom
pesarkhobeee:fix/hubble-relay-peer-nil-panic
Mar 31, 2026
Merged

hubble: fix nil pointer dereference in Peer.Equal() when one Address is nil#45021
tklauser merged 1 commit intocilium:mainfrom
pesarkhobeee:fix/hubble-relay-peer-nil-panic

Conversation

@pesarkhobeee
Copy link
Copy Markdown

@pesarkhobeee pesarkhobeee commented Mar 27, 2026

Summary

Peer.Equal() in pkg/hubble/peer/types/peer.go panics with a nil pointer dereference when one peer has a nil Address and the other does not. This causes hubble-relay to crash loop (exit code 2) in environments with dynamic node pools.

Root cause

The original expression:

addrEq := (p.Address == nil && o.Address == nil) ||
    (p.Address.String() == o.Address.String() && p.Address.Network() == o.Address.Network())

When only one Address is nil, the first condition evaluates to false, so Go evaluates the second branch and calls .String() / .Network() on the nil net.Addr interface — panic.

Fix

Replace the single boolean expression with a switch statement that explicitly handles three cases: both nil, one nil (mixed), and both non-nil.

Trigger scenario

In GKE with Node Auto-Provisioning (or any environment with node churn — autoscaling, spot/preemptible nodes), the hubble peer service fires multiple rapid PEER_ADDED notifications for the same peer. When the peer address is unresolvable (e.g. connection refused), FromChangeNotification() returns a Peer with Address: nil. This nil-address Peer is then passed to Peer.Equal() during PeerManager.upsert() in watchNotifications(), triggering the panic.

Panic stack trace

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1e7380e]

goroutine 37 [running]:
github.com/cilium/cilium/pkg/hubble/peer/types.Peer.Equal(
    {{0xc00005ae40, 0x33}, {0x0, 0x0}, 0x1, {0xc000dac060, 0x51}},
    {{0xc00005b540, 0x33}, {0x29b9368, ...}, ...}
    /go/src/github.com/cilium/cilium/pkg/hubble/peer/types/peer.go:105 +0x6e
github.com/cilium/cilium/pkg/hubble/relay/pool.(*PeerManager).upsert(...)
    /go/src/github.com/cilium/cilium/pkg/hubble/relay/pool/manager.go:281 +0x16a
github.com/cilium/cilium/pkg/hubble/relay/pool.(*PeerManager).watchNotifications(...)
    /go/src/github.com/cilium/cilium/pkg/hubble/relay/pool/manager.go:152 +0x935

Impact

  • hubble-relay enters a permanent crash loop (exit code 2)
  • All Hubble observability is lost
  • Any environment with node churn is affected
  • Bug is present in all released versions (verified up to v1.19.2)

Test plan

  • Added test case "one address nil other non-nil" to TestEqual — verifies no panic and correct false result
  • All 15 existing + new tests pass: go test ./pkg/hubble/peer/types/ -v
  • Verified the fix resolves the crash on a live GKE cluster with NAP (retina-hubble chart v1.1.0)
Fix panic in Hubble Relay when new peer address is unresolvable

@pesarkhobeee pesarkhobeee requested a review from a team as a code owner March 27, 2026 15:12
@pesarkhobeee pesarkhobeee requested a review from devodev March 27, 2026 15:12
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 27, 2026
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 27, 2026
@devodev devodev added kind/bug This is a bug in the Cilium logic. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 27, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 27, 2026
@devodev devodev added release-note/bug This PR fixes an issue in a previous release of Cilium. area/hubble Impacts hubble server or relay labels Mar 27, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Mar 27, 2026
@devodev devodev added this to the 1.20-feature-freeze milestone Mar 27, 2026
@devodev
Copy link
Copy Markdown
Contributor

devodev commented Mar 27, 2026

/test

Copy link
Copy Markdown
Contributor

@devodev devodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks great!

@devodev devodev added needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 27, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 28, 2026
…is nil

When a peer has a nil Address field and is compared to a peer with a
non-nil Address, Peer.Equal() panics with a nil pointer dereference.
The original boolean expression short-circuits only when both addresses
are nil; when just one is nil, it falls through to call .String() and
.Network() on the nil net.Addr interface value.

This causes hubble-relay to crash loop (exit code 2) in environments
with dynamic node pools (e.g. GKE NAP, spot/preemptible nodes) where
rapid PEER_ADDED notifications can produce Peer structs with nil
Address fields during the upsert path in PeerManager.watchNotifications().

Replace the single boolean expression with a switch statement that
explicitly handles three cases: both nil, one nil, and both non-nil.
Add a test case covering the mixed nil/non-nil scenario.

Signed-off-by: pesarkhobeee <ahmadian.farid.1988@gmail.com>
@pesarkhobeee pesarkhobeee force-pushed the fix/hubble-relay-peer-nil-panic branch from 981c36a to 4429b3c Compare March 28, 2026 07:04
@pesarkhobeee
Copy link
Copy Markdown
Author

Thanks @devodev for your attention and fast reply, I had to amend my commit to fix the Signed-off-by check failure
(https://github.com/cilium/cilium/actions/runs/23653244555/job/68969405897), my fork's author email didn't match the sign-off

@HadrienPatte
Copy link
Copy Markdown
Member

/test

@tklauser tklauser added this pull request to the merge queue Mar 31, 2026
Merged via the queue into cilium:main with commit e97dce9 Mar 31, 2026
78 of 79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/hubble Impacts hubble server or relay kind/bug This is a bug in the Cilium logic. kind/community-contribution This was a contribution made by a community member. needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants