Loadbalancer NLB Target Group health checks failing since upgrade to v1.16.0 #34093

@okamosy

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

higher than v1.16.0 and lower than v1.17.0

What happened?

I am running a cluster in AWS EKS with three nodes, using the Cilium ingress controller to manage a network load balancer in AWS. After upgrading to v1.16.0, all health checks on the target groups began to fail. Strangely, replacing a node would allow the checks on port 80 for the new instance to return healthy, at least until the Cilium DaemonSet was restarted.

While troubleshooting the issue, I noticed that cilium status would periodically return the following error:

controller node-neighbor-link is failing since 4s (2x): unable to determine next hop IPv4 address for eth1 (<node_ip>): remote node IP is non-routable
unable to determine next hop IPv4 address for eth2 (<node_ip>): remote node IP is non-routable
unable to determine next hop IPv4 address for eth3 (<node_ip>): remote node IP is non-routable
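To illustrate what this error is about (a simplified sketch, not Cilium's actual code): the agent tries to resolve a next-hop neighbor for each remote node on each device, and if the remote node's IP is not covered by any prefix directly reachable (on-link) from that interface, it is reported as non-routable. The function and prefix values below are hypothetical:

```python
# Illustrative sketch only: models the idea that a next hop is
# "non-routable" from an interface when the remote IP falls outside
# every on-link prefix of that interface.
import ipaddress

def next_hop_routable(remote_ip: str, on_link_prefixes: list[str]) -> bool:
    """Return True if remote_ip is covered by one of the interface's
    directly connected (on-link) prefixes."""
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in ipaddress.ip_network(p) for p in on_link_prefixes)

# Example: if eth1 only has an on-link route for 10.0.1.0/24, a peer
# in 10.0.2.0/24 would be considered non-routable from that interface.
print(next_hop_routable("10.0.1.15", ["10.0.1.0/24"]))  # True
print(next_hop_routable("10.0.2.7",  ["10.0.1.0/24"]))  # False
```

In an ENI setup with multiple interfaces per node, this is why the error can appear for eth1/eth2/eth3 but not eth0: each ENI only has on-link routes for its own subnet.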

I exec'd into the pod reporting this and ran cilium-dbg status; the only thing of note was this line:

Modules Health: Stopped(0) Degraded(3) OK(148)

Running cilium-dbg status --verbose showed that node-manager was degraded with the message Failed node neighbor link update. I have not been able to determine whether the two issues are connected. I also stripped our values.yaml down to a bare minimum of settings to rule out our configuration as the cause, but that did not restore the health checks.
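For anyone retracing the inspection steps above, something like the following should work (pod and node names will differ per cluster; the k8s-app=cilium label selector assumed here is the one the standard chart applies to agent pods):

```shell
# List the Cilium agent pods (assumes the standard k8s-app=cilium label)
kubectl -n kube-system get pods -l k8s-app=cilium -o wide

# Check agent health from inside the pod reporting the error
kubectl -n kube-system exec <cilium-pod> -- cilium-dbg status --verbose

# On the affected node, inspect the kernel neighbor table that the
# node-neighbor-link controller maintains
ip neigh show
```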

How can we reproduce the issue?

  1. Install Cilium with the following values.yaml file:
egressMasqueradeInterfaces: eth0
eni:
  awsReleaseExcessIPs: true
  enabled: true
envoy:
  enabled: true
ingressController:
  enableProxyProtocol: false
  enabled: true
  loadbalancerMode: shared
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-alpn-policy: HTTP2Preferred
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <ssl-cert-arn>
      service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: ELBSecurityPolicy-TLS13-1-2-2001-06
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=false
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
ipam:
  mode: eni
ipv4NativeRoutingCIDR: <CIDR>
k8sServiceHost: <eks-endpoint>
k8sServicePort: 443
kubeProxyReplacement: true
routingMode: native
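Assuming the standard Cilium Helm chart, step 1 would look something like the following (repo URL and release name per Cilium's Helm install docs, version pinned to the release where the regression appeared):

```shell
# Sketch of the install used for reproduction; values.yaml is the
# file shown above with the placeholders filled in.
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  --version 1.16.0 \
  -f values.yaml
```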

Cilium Version

v1.16.0

Kernel Version

5.10.219-208.866.amzn2.x86_64

Kubernetes Version

v1.29.4-eks-036c24b

Regression

v1.15.6

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
    kind/bug: This is a bug in the Cilium logic.
    kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
    needs/triage: This issue requires triaging to establish severity and next steps.
    stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests