
lb: BPF reconciler panics with nil pointer dereference on nil slot during ResetAndRestore #44896

@vipul-21

Description


Is there an existing issue for this?

  • I have searched the existing issues

Version

lower than v1.19.0

What happened?

The Cilium agent crashes with a nil pointer dereference in BPFOps.ResetAndRestore() when iterating over quarantined backend slots during BPF map restoration.

In pkg/loadbalancer/reconciler/bpf_reconciler.go, the loop that processes quarantined backend slots:

for _, slot := range slots[1+master.GetCount():] {
    if addr, found := backendIDToAddress[slot.GetBackendID()]; found {
        backends.Insert(addr)
    }
}

does not guard against nil entries. The crash occurs during agent startup or restore, making it especially disruptive since the agent cannot recover without clearing the BPF map.

How can we reproduce the issue?

I haven't been able to replicate the issue on an actual cluster, but I have a set of steps that I think leads to it. Essentially, the BPF map needs to reach a state where the slot IDs for the backends are published with a gap.

  1. Service has active + terminating backends. A service with 2 active and 2 terminating backends is reconciled. In the BPF map, slots 1 and 2 hold the active backends; slots 3 and 4 hold the quarantined (terminating) backends.
  2. Agent crashes and restarts. All in-memory state is lost, but the BPF map survives in the kernel. The agent reads the map and populates its quarantined-backend set with the two terminating backends from slots 3 and 4.
  3. Concurrently with step 2, the service scales up with 2 ready pods and 1 not-ready pod. New active backends and the not-ready pod (ready as false and terminating as false -> Maintenance state in the agent's in-memory view) are added. The slot 3 and slot 4 pods are still terminating.
  4. First reconciliation fires. With slot IDs allocated incrementally in v1.18 (i.e. i+1), the agent skips the Maintenance-state backend but i still increments, so the remaining backends are written with a gap in the slot IDs (i.e. 1, 2, 3, 4, 6, 7 instead of 1, 2, 3, 4, 5, 6), while the internal reference state still references the Maintenance backend.
  5. Second reconciliation now sees the difference between the map state and the agent's backend references and deletes the highest slot from the map.
  6. Now when the agent restarts, it tries to access slot 5 during the reconciliation loop but panics because that slot does not exist. And it never recovers.

Why does it not happen more often?
The check in the reconciler, len(slots) == 1 + count + qcount, is what saves most restarts. After the first buggy reconciliation (step 4), the map has an extra slot (7) beyond what the master stores as the total count, so the guard fails (8 ≠ 7) and the quarantine loop is safely skipped. It takes the second reconciliation (step 5) to delete that extra slot and make the math line up. So you need this specific sequence to hit the panic.

Cilium Version

v1.18.6

Kernel Version

Not able to get from the cluster

Kubernetes Version

v1.34

Regression

No response

Sysdump

No response

Relevant log output

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x34fed87]
 
goroutine 1 [running]:
github.com/cilium/cilium/pkg/loadbalancer/reconciler.(*BPFOps).ResetAndRestore(0xc000d7cd20)
	/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/bpf_reconciler.go:313 +0xde7
github.com/cilium/cilium/pkg/loadbalancer/reconciler.(*BPFOps).start(...)
	/go/src/github.com/cilium/cilium/pkg/loadbalancer/reconciler/bpf_reconciler.go:226
github.com/cilium/hive/cell.Hook.Start(...)
	/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/cell/lifecycle.go:43
github.com/cilium/hive/cell.(*DefaultLifecycle).Start(0xc0008751a0, 0xc000486ff0, {0x50f16e0?, 0xc001c3e000?})
	/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/cell/lifecycle.go:128 +0x2fd
github.com/cilium/hive.(*Hive).Start(0xc0009b1b30, 0xc000486ff0, {0x50f16e0, 0xc001c3e000})
	/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/hive.go:359 +0x131
github.com/cilium/hive.(*Hive).Run(0xc0009b1b30, 0xc000486ff0)
	/go/src/github.com/cilium/cilium/vendor/github.com/cilium/hive/hive.go:231 +0x85
github.com/cilium/cilium/daemon/cmd.NewAgentCmd.func1(0xc000304f08, {0x49c6cfe?, 0x4?, 0x49c6b9a?})
	/go/src/github.com/cilium/cilium/daemon/cmd/root.go:52 +0x1f9
github.com/spf13/cobra.(*Command).execute(0xc000304f08, {0xc0001be110, 0x1, 0x1})
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:1019 +0xa91
github.com/spf13/cobra.(*Command).ExecuteC(0xc000304f08)
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:1148 +0x46f
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:1071
github.com/cilium/cilium/daemon/cmd.Execute(0x4c29a38?)
	/go/src/github.com/cilium/cilium/daemon/cmd/root.go:90 +0x13
main.main()
	/go/src/github.com/cilium/cilium/daemon/main.go:15 +0x1f

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • affects/v1.18: This issue affects v1.18 branch
  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
