egressgateway: fix initial reconciliation #18325
kkourt left a comment:
LGTM.
(Not a big fan of using Sleep, but I can't think of a better approach at the moment.)
    if manager.k8sCacheSyncedChecker.K8sCacheIsSynced() {
        break
    }
non-blocking comment: If there is a bug that does not allow the cache to be synced, we would get stuck in an infinite loop. Should we add an info message here? We can print it after N iterations if we think it would be too verbose.
Looks like we are already logging this info in the watchers package (cilium/pkg/k8s/watchers/watcher.go, lines 391 to 398 in 2581084), so there is probably no need to log it here as well.
Agree 👍 the alternative would be to use a |
f04be2b to 91879e2
When a new egress gateway manager is created, it waits for the k8s cache to be fully synced before running the first reconciliation.

Currently the logic is based on the WaitUntilK8sCacheIsSynced method of the Daemon object, which waits for the k8sCachesSynced channel to be closed (which indicates that the cache has indeed been synced).

The issue with this approach is that the Daemon object is passed to the NewEgressGatewayManager method _before_ its k8sCachesSynced channel is properly initialized. This in turn causes the WaitUntilK8sCacheIsSynced method to never return.

Since NewEgressGatewayManager must be called before that channel is initialized, we need to switch to a polling approach, where k8sCachesSynced is checked periodically.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
91879e2 to f9140cd
/test
l4lb is failing consistently (will be fixed with #18370); marking as ready to merge.