Description
I did a cursory walk through the code in charge of activating health checks (HCs), and I don't think it waits for SDS secrets to be ready.
Furthermore, `initial_fetch_timeout` broadly refers to the config gRPC stream becoming active, not necessarily to the actual payloads being received:
https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst
So for clusters where we don't set an `initial_jitter`, we hit `upstream_context_secrets_not_ready` during hot restarts, and we also see failed upstream health checks. I didn't test this for cold starts, though it might be an issue there too.
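To make the race concrete, here is a toy model (hypothetical, not Envoy code). It assumes the first probe fires at a random time in `[0, initial_jitter]` after the health checker is activated, and that the SDS secret becomes ready a fixed `secret_delay` seconds after activation. The function names and the uniform-delay assumption are illustrative only.

```python
import random


def first_probe_fails(secret_delay: float, initial_jitter: float, rng: random.Random) -> bool:
    """Toy model: True if the first HC probe runs before the secret is ready.

    Assumption (illustrative): the first probe is scheduled uniformly in
    [0, initial_jitter]; with initial_jitter == 0 it fires immediately.
    """
    first_probe_at = rng.uniform(0.0, initial_jitter)
    return first_probe_at < secret_delay


def failure_rate(secret_delay: float, initial_jitter: float,
                 trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated activations whose first probe hits a not-ready secret."""
    rng = random.Random(seed)
    failures = sum(first_probe_fails(secret_delay, initial_jitter, rng) for _ in range(trials))
    return failures / trials


if __name__ == "__main__":
    # No jitter: every first probe races ahead of a secret that arrives later.
    print(failure_rate(secret_delay=0.2, initial_jitter=0.0))
    # With jitter, failures become less likely but are not eliminated.
    print(failure_rate(secret_delay=0.2, initial_jitter=1.0))
```

Under these assumptions, `initial_jitter` only lowers the probability of the race (roughly `secret_delay / initial_jitter` when the jitter is larger); it does not remove it.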
I think we should either:
a) not activate HCs until the needed secrets are available
or
b) document the async nature of SDS and how setting `initial_jitter` might be necessary to work around a race between the first HCs and secrets becoming available (we could add this to https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst).
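Option (a) could look roughly like the sketch below: a gate that defers starting health checks until every required secret has reported ready. This is a minimal illustration under stated assumptions; `SecretGatedHealthChecker`, `on_secret_ready`, and the secret names are hypothetical and do not correspond to Envoy's actual classes or callbacks.

```python
from typing import Callable, Iterable


class SecretGatedHealthChecker:
    """Hypothetical sketch of option (a): start HCs only once all secrets are ready.

    Not Envoy's API; names are illustrative.
    """

    def __init__(self, required_secrets: Iterable[str],
                 start_health_checks: Callable[[], None]) -> None:
        self._pending = set(required_secrets)  # secrets still awaiting an SDS update
        self._start = start_health_checks      # callback that activates health checking
        self._started = False
        self._maybe_start()                    # no required secrets: start right away

    def on_secret_ready(self, name: str) -> None:
        """Called when a (hypothetical) SDS update delivers secret `name`."""
        self._pending.discard(name)
        self._maybe_start()

    def _maybe_start(self) -> None:
        # Start exactly once, and only after nothing is pending.
        if not self._pending and not self._started:
            self._started = True
            self._start()


if __name__ == "__main__":
    gate = SecretGatedHealthChecker(
        ["client_cert", "validation_context"],
        lambda: print("health checks started"),
    )
    gate.on_secret_ready("client_cert")         # still waiting on validation_context
    gate.on_secret_ready("validation_context")  # now health checks start
```

Compared with relying on `initial_jitter`, this makes the first probe causally depend on secret readiness rather than on timing, at the cost of needing a readiness signal from SDS.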
Thoughts?