fix: reallocate envoy port on binding failure #42859
|
/test |
|
Commit 3e33e37 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
3e33e37 to 158060f
|
Oops, I missed the test code for the newly added function. I have re-verified the previously failing test suites:
Could you please re-run the tests to confirm everything is working correctly? @pchaigno |
|
/test |
|
@pchaigno Successful run: https://github.com/inerplat/cilium/actions/runs/19655926227
The error appears to be a transient failure. Could you please re-run the failed job?
Failed job: https://github.com/cilium/cilium/actions/runs/19645818650/job/56260701695
Thanks! |
|
@joamaki Could you take a look, especially at whether the Listener resource mutation (with the new proxy port) happens in the right place w.r.t. statedb read-only data? |
|
/test |
f279633 to de93437
|
@jrajahalme |
|
/test |
|
/test |
jrajahalme left a comment:
Looking good, only small nits left. After addressing these, please squash the commits together to keep a cleaner commit history when this is merged (we do not squash automatically on merge).
de93437 to bf88d8d
bf88d8d to 8194511
|
@jrajahalme Thanks for the review. I have addressed all the comments and squashed the commits as requested. Regarding the logging concern:
I verified the behavior and confirmed that the returned error is not logged to stdout/stderr by the caller. Instead, I observed that the error is persisted in
Here are the verification results from my local reproduction:
Case 1: Port conflict triggers reallocation (Success)
Case 2: Port range exhaustion (Error Persistence)
Agent Logs:
StateDB Verification (cilium-dbg):
$ cilium-dbg shell -- db/show envoy-resources
Name Listeners Endpoints References Status Since Error
# ...
cec:kube-system/cilium-ingress kube-system/cilium-ingress/listener Error 10s failed to reallocate ports after binding failure: failed to reallocate proxy port for listener kube-system/cilium-ingress/listener: no available proxy ports (original error: NACK received: Error adding/updating listener(s) kube-system/cilium-ingress/listener: cannot bind '127.0.0.1:10000': Address already in use)
$ cilium-dbg statedb | grep failed
{"ID":{"Module":["agent","controlplane","ciliumenvoyconfig"],"Component":["job-reconcile"]},"Level":"Degraded","Message":"1 error(s)","Error":"failed to reallocate ports after binding failure: failed to reallocate proxy port for listener kube-system/cilium-ingress/listener: no available proxy ports (original error: NACK received: Error adding/updating listener(s) kube-system/cilium-ingress/listener: cannot bind '127.0.0.1:10000': Address already in use\n)","LastOK":"0001-01-01T00:00:00Z","Updated":"2026-01-24T04:19:31.152330937Z","Stopped":"0001-01-01T00:00:00Z","Final":"","Count":165},
{"Name":{"Origin":"cec","Cluster":"","Namespace":"kube-system","Name":"cilium-ingress"},"Status":{"updated-at":"2026-01-24T04:19:31.150824728Z","error":"failed to reallocate ports after binding failure: failed to reallocate proxy port for listener kube-system/cilium-ingress/listener: no available proxy ports (original error: NACK received: Error adding/updating listener(s) kube-system/cilium-ingress/listener: cannot bind '127.0.0.1:10000': Address already in use\n)","id":543,"kind":"Error"}
# ... |
Signed-off-by: DH Kim <inerplat@gmail.com>
8194511 to eceb8de
|
/test |
Fixes: #42858
This PR adds:
- Port binding error detection (isPortBindingError): Detects port binding failures by checking error messages for common indicators like "cannot bind", "address already in use", and "eaddrinuse".
- Port reallocation function (AllocateCRDProxyPortWithReallocate): Forces reallocation of a new port by resetting both ProxyPort and rulesPort when forceReallocate is true, ensuring a truly new random port is allocated.
- Retry logic (retryWithNewPorts): When a port binding failure is detected:
- Integration (Update method): Integrates the retry logic into the Envoy reconciler's update flow.

Testing