Cluster initialization issue. Envoy start could be blocked forever #13874

@lambdai

Description

If a cluster is still warming during server start, a second update to that cluster can leave Envoy in a state where startup never completes.

Note that the second update must be "real": its config must not be detected as a duplicate of the existing config.

The full sequence is:

  1. cluster A is warming up
  2. CDS delivers a new config for cluster A
  3. Envoy removes the old warming cluster [1], but the counter of warming secondary clusters is not decremented [2]
  4. Envoy creates a new cluster A with the new config.
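
The four steps above can be replayed with a toy model of the bookkeeping. This is only a sketch to illustrate the counter mismatch, not Envoy's actual code: the class `ToyInitHelper`, its methods, and the `decrement_on_remove` switch are all hypothetical names made up for this example. The only thing taken from the report is the behavior: adding a cluster increments the secondary counter, and removing a warming cluster does not decrement it.

```python
class ToyInitHelper:
    """Toy model of the counter described in this issue (hypothetical, not Envoy code)."""

    def __init__(self):
        self.secondary_pending = 0  # plays the role of "secondary=" in the logs
        self.warming = {}           # cluster name -> config that is still warming

    def add_cluster(self, name, config):
        # Every add counts one more cluster that init has to wait for.
        self.warming[name] = config
        self.secondary_pending += 1

    def remove_warming_cluster(self, name, decrement):
        # decrement=False models the reported behavior: the old warming
        # cluster is dropped, but the counter keeps its contribution.
        self.warming.pop(name)
        if decrement:
            self.secondary_pending -= 1

    def on_cluster_ready(self, name):
        # A cluster finishing its warm-up decrements the counter once.
        if name in self.warming:
            del self.warming[name]
            self.secondary_pending -= 1

    def init_complete(self):
        return self.secondary_pending == 0


def replay(decrement_on_remove):
    """Replay steps 1-4, then let the new cluster A finish warming."""
    helper = ToyInitHelper()
    helper.add_cluster("A", "v1")                            # step 1: A is warming
    # step 2: CDS delivers a different config for A
    helper.remove_warming_cluster("A", decrement_on_remove)  # step 3
    helper.add_cluster("A", "v2")                            # step 4
    helper.on_cluster_ready("A")                             # new A becomes ready
    return helper.secondary_pending, helper.init_complete()
```

With `replay(False)` the counter ends at 1 and `init_complete()` stays false even though no cluster is warming any more, which matches the "increase twice, decrease once" pattern in [2]. With `replay(True)` the counter returns to 0.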

Because the counter is never decremented, Envoy initialization is blocked; see istio/istio#28500 (comment).

I am not fully sure how bad destroy-before-add is. In the LDS API this is explicitly forbidden, since destroying the last warming resource would signal the init manager too early that "all clusters are ready".

The full log file can be found in the linked Istio issue.

[1] init manager destroyed before init

2020-11-03T01:05:59.112926Z	debug	envoy upstream	cm init: adding: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=31
2020-11-03T01:05:59.112929Z	info	envoy upstream	cds: add/update cluster 'outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local'
2020-11-03T01:05:59.116583Z	debug	envoy upstream	initializing secondary cluster outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local
2020-11-03T01:05:59.124471Z	debug	envoy init	init manager Cluster outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local destroyed
2020-11-03T01:05:59.124483Z	debug	envoy upstream	add/update cluster outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local during init

[2] counter increased twice, decreased once

$ grep "secondary=" logs.txt |grep -B 1 "svc00-0-6-0.service-graph00"
2020-11-03T01:05:59.112393Z	debug	envoy upstream	cm init: adding: cluster=outbound|8080||svc00-0-5-0.service-graph00.svc.cluster.local primary=0 secondary=30
2020-11-03T01:05:59.112926Z	debug	envoy upstream	cm init: adding: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=31
--
2020-11-03T01:05:59.116213Z	debug	envoy upstream	cm init: adding: cluster=InboundPassthroughClusterIpv4 primary=0 secondary=34
2020-11-03T01:05:59.124538Z	debug	envoy upstream	cm init: adding: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=35
2020-11-03T01:05:59.126380Z	debug	envoy upstream	cm init: init complete: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=34
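
The same check can be done over the whole log rather than one cluster at a time. The sketch below assumes only the two message shapes visible in the excerpt above (`cm init: adding: cluster=...` and `cm init: init complete: cluster=...`); the function name `unbalanced_clusters` is made up for this example, and `SAMPLE` is three lines copied from the grep output.

```python
import re
from collections import Counter

# Message shapes taken from the log excerpt in this issue.
ADDING = re.compile(r"cm init: adding: cluster=(\S+)")
COMPLETE = re.compile(r"cm init: init complete: cluster=(\S+)")


def unbalanced_clusters(lines):
    """Return {cluster: adds - completes} for clusters where the two differ."""
    balance = Counter()
    for line in lines:
        match = ADDING.search(line)
        if match:
            balance[match.group(1)] += 1
            continue
        match = COMPLETE.search(line)
        if match:
            balance[match.group(1)] -= 1
    return {name: count for name, count in balance.items() if count != 0}


# Three lines for svc00-0-6-0 from the excerpt: two adds, one init complete.
SAMPLE = [
    "2020-11-03T01:05:59.112926Z\tdebug\tenvoy upstream\tcm init: adding: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=31",
    "2020-11-03T01:05:59.124538Z\tdebug\tenvoy upstream\tcm init: adding: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=35",
    "2020-11-03T01:05:59.126380Z\tdebug\tenvoy upstream\tcm init: init complete: cluster=outbound|8080||svc00-0-6-0.service-graph00.svc.cluster.local primary=0 secondary=34",
]
```

Calling `unbalanced_clusters(SAMPLE)` reports a surplus of one add for the `svc00-0-6-0` cluster. On a full log, for example `unbalanced_clusters(open("logs.txt"))`, clusters that were simply still warming when the log was captured will also show up, so the output is a list of candidates rather than proof of the bug.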
