ads: ensure ads refuses connections when caches are not synced#20296
ads: ensure ads refuses connections when caches are not synced#20296istio-testing merged 3 commits intoistio:masterfrom
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test? Courtesy of your friendly test nag. |
howardjohn
left a comment
There was a problem hiding this comment.
shouldn't we mark ourselves as not ready rather than refuse connections?
This is just to reject local Envoy connection. We are not marked ready till cache sync is done so external Envoys can not connect. But by allowing local pilot to connect we are initializing push context with empty config. |
|
And Pilot's envoy does not care about readiness because it connects via localhost. |
|
But no one connects to the local proxy until it's ready? As far as I know the local proxy is 100% static so it's not connecting to pilot for xds, is that not true? |
|
I'm fine with having a second mechanism to catch weird cases but I think we need to enhance the readiness probe for the normal case, unless I misunderstood |
|
I am phone to verify what Istio’s master template does, but in our case local proxy definitely connects to pilot and as I mentioned in the issue it definitely connected to Pilot before caches synced up. |
|
I see, I didn't realize you had a custom setup.
to me it seems like this is the part we should be fixing. shouldn't it be creating a new context? I'm on a phone so this is from memory though, maybe that doesn't make sense |
|
/cc i will have a look |
pilot/pkg/bootstrap/server.go
Outdated
| } | ||
|
|
||
| // grpcServer is shared by Galley, CA, XDS - must Serve at the end, but before 'wait' | ||
| log.Infof("starting gRPC discovery service at %s", s.GRPCListener.Addr()) |
There was a problem hiding this comment.
move into the below go func?
There was a problem hiding this comment.
yeah. makes sense. Moved this and the below http log line in to go func
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
I don't think this was a good idea to merge. Should we fix the real problem instead of hacking around it? |
I could not think of better solution immediately than what is here in this PR and regarding your question about
This is a global context so I guess it is assumed to be created once. I do not know of the implications of creating a new context/updating it every time apart from obvious perf reason (which in it self may be good enough reason to not create a new context). But if you have any better ideas, please let me know - I am happy to implement that and revert this change. If you do not have any better ideas immediately, I am OK open a new issue that we can track to improve this if you think that is better to do. Please let me know. |
| // OnServerReady is called when caches have been synced so that it can process requests. | ||
| func (s *DiscoveryServer) OnServerReady() { | ||
| s.updateMutex.Lock() | ||
| s.serverReady = true |
There was a problem hiding this comment.
Should we just recreate the context here?
There was a problem hiding this comment.
Yeah, that seems to be an option - We can add RefreshContext method in PushContext that just reloads every thing without checking if init is done or not, and we can call it once caches have been synced and before we declare our self ready.
Only downside of it is pilot's local proxy would get incomplete config in the first call but would immediately get refreshed because of config change and since it is any way starting afresh, it should be OK.
If this is an option, I can make that change.
@hzxuzhonghu What do you think about this?
There was a problem hiding this comment.
Sorry, i could not catch up, what do you mean by context here?
There was a problem hiding this comment.
PushContext - recreate the PushContext once caches have been synced
There was a problem hiding this comment.
#20323 - This is what he is suggesting. PTAL if it is OK or it causes any issues
There was a problem hiding this comment.
Got it. Then we can do this: trigger full push to all connected clients here.
There was a problem hiding this comment.
But there is one concern: there may be race condition between the first xds request/push and the new push here.
There was a problem hiding this comment.
There wont be any new clients here except for local pilot proxy because Pilot is not ready it self here, it will only be ready after this line. Regarding pilot's proxy, I guess it would get updates because of regular configupdate push and since it is any way starting now, It should not be an issue.
Fixes #20295
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X ] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure