Should HC activation be delayed until needed secrets are available? #12389

@rgs1

I did a cursory walk through the code that activates health checks (HCs), and I don't think it waits for secrets to be ready.

Furthermore, the initial_fetch_timeout broadly refers to the config gRPC stream becoming active (not necessarily to the actual payloads being received):

https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst

So for clusters where we don't set initial_jitter, we hit upstream_context_secrets_not_ready during hot restarts and also see failed upstream health checks. I didn't test this for cold starts, though it might be an issue there too.

I think we should either:

a) not activate HCs until the needed secrets are available

or

b) document the async nature of SDS and how setting initial_jitter might be necessary to work around a race between the first HCs and secrets becoming available (we could add this to https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst)
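
For reference, the workaround in (b) would look roughly like the fragment below. This is only a sketch: the cluster name, health check path, and all durations are placeholder assumptions, not values from our setup. Since initial_jitter delays the first health check by a random amount up to the configured value, it only makes the race less likely; it does not guarantee that the secret has arrived.

```yaml
# Sketch only: names, path, and durations are illustrative assumptions.
clusters:
- name: example_upstream            # hypothetical cluster name
  connect_timeout: 1s
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3
    healthy_threshold: 2
    # Start health checking after a random delay between 0 and initial_jitter,
    # giving SDS a chance to deliver the secret before the first probe.
    initial_jitter: 2s
    http_health_check:
      path: /healthz                # hypothetical health check path
```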

Thoughts?

cc: @mattklein123 @fishcakez
