Description
When creating clusters that reference SDS certificates, the warming behavior does not seem correct. My expectation is that until the secret is sent, the cluster will be marked as "warming" (up to the initial_fetch_timeout) and will block the rest of initialization from occurring.
What I am actually seeing is initialization is blocked, but there is nothing indicating the clusters are warming.
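To spell out the expectation, here is a toy sketch of the state machine I would expect (plain Python, not Envoy code; the class and method names are mine, for illustration only):

```python
# Toy model of the expected warming behavior: a cluster that references
# an SDS secret stays "warming" until either the secret arrives or
# initial_fetch_timeout fires, and only then counts as initialized.
class Cluster:
    def __init__(self, name, needs_sds_secret):
        self.name = name
        self.state = "warming" if needs_sds_secret else "active"

    def on_secret_received(self):
        # Secret delivered by the SDS server: warming completes.
        self.state = "active"

    def on_initial_fetch_timeout(self):
        # Timeout fired before the secret arrived: the cluster finishes
        # warming anyway, unblocking the rest of initialization.
        if self.state == "warming":
            self.state = "active"


c = Cluster("outbound_cluster_tls", needs_sds_secret=True)
print(c.state)  # warming -- but the stats below report 0 warming clusters
c.on_initial_fetch_timeout()
print(c.state)  # active
```

The bug report below is that the second half of this (initialization being blocked for 20s) happens, but the cluster is never reported as warming in the meantime.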
Using this config:

```shell
docker run -v $HOME/kube/local:/config -p 15000:15000 envoyproxy/envoy-dev -c /config/envoy-sds-lds.yaml --log-format-prefix-with-location 0 --reject-unknown-dynamic-fields
```
with envoy version: 49efb9841a58ebdc43a666f55c445911c8e4181c/1.15.0-dev/Clean/RELEASE/BoringSSL
and config files:
cds.yaml:

```yaml
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: outbound_cluster_tls
  connect_timeout: 5s
  max_requests_per_connection: 1
  load_assignment:
    cluster_name: xds-grpc
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 127.0.0.1
              port_value: 8080
  type: STATIC
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
      common_tls_context:
        tls_certificate_sds_secret_configs:
        - name: "default"
          sds_config:
            initial_fetch_timeout: 20s
            api_config_source:
              api_type: GRPC
              grpc_services:
              - envoy_grpc:
                  cluster_name: "sds-grpc"
              refresh_delay: 60s
        combined_validation_context:
          default_validation_context: {}
          validation_context_sds_secret_config:
            name: ROOTCA
            sds_config:
              initial_fetch_timeout: 20s
              api_config_source:
                api_type: GRPC
                grpc_services:
                - envoy_grpc:
                    cluster_name: sds-grpc
```

envoy-sds-lds.yaml:
```yaml
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 15000
node:
  id: id
  cluster: sdstest
dynamic_resources:
  lds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: lds
  cds_config:
    path: /config/cds.yaml
static_resources:
  clusters:
  - name: sds-grpc
    type: STATIC
    http2_protocol_options: {}
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
  - name: lds
    type: STATIC
    http2_protocol_options: {}
    connect_timeout: 5s
    lb_policy: ROUND_ROBIN
```

Basically, what should happen here is: we get a dynamic CDS cluster with SDS config. The SDS fetch fails, as the SDS server is not set up. We have initial_fetch_timeout set, so for 20s everything should be warming.
What we see instead:
- Stats are not showing warming:

```
cluster_manager.cds.init_fetch_timeout: 0
cluster_manager.cds.update_attempt: 1
cluster_manager.cds.update_failure: 0
cluster_manager.cds.update_rejected: 0
cluster_manager.cds.update_success: 1
cluster_manager.cds.update_time: 1588972075968
cluster_manager.cds.version: 17241709254077376921
cluster_manager.cluster_added: 3
cluster_manager.cluster_modified: 0
cluster_manager.cluster_removed: 0
cluster_manager.cluster_updated: 0
cluster_manager.cluster_updated_via_merge: 0
cluster_manager.update_merge_cancelled: 0
cluster_manager.update_out_of_merge_window: 0
cluster_manager.warming_clusters: 0
```
We also see cluster_manager.cds.init_fetch_timeout is 0; it does not change after 20s.
- LDS is not requested until 20s later, indicating the initial_fetch_timeout is respected. This can be seen in the logs (note: for simple testing I don't have a real LDS server, but we can see it's not even attempted until 20s in):

```
[2020-05-08 21:07:55.967][1][info][upstream] cds: add 1 cluster(s), remove 2 cluster(s)
[2020-05-08 21:07:55.968][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:55.968][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:55.968][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:55.968][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:55.968][1][info][upstream] cds: add/update cluster 'outbound_cluster_tls'
[2020-05-08 21:07:55.968][1][info][main] starting main dispatch loop
[2020-05-08 21:07:56.703][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:56.703][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:56.938][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:56.938][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:57.135][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:57.135][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:57.682][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:57.682][1][warning][config] Unable to establish new stream
[2020-05-08 21:07:58.671][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:07:58.671][1][warning][config] Unable to establish new stream
[2020-05-08 21:08:08.992][1][warning][config] StreamSecrets gRPC config stream closed: 14, no healthy upstream
[2020-05-08 21:08:08.992][1][warning][config] Unable to establish new stream
[2020-05-08 21:08:15.967][1][info][upstream] cm init: all clusters initialized
[2020-05-08 21:08:15.967][1][info][main] all clusters initialized. initializing init manager
[2020-05-08 21:08:15.967][1][warning][config] StreamListeners gRPC config stream closed: 14, no healthy upstream
```
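To make the 20s gap concrete, a small check (plain Python, timestamps copied from the log lines above) computing the delta between the CDS update that adds the cluster and the first StreamListeners (LDS) attempt:

```python
from datetime import datetime

# Timestamps copied from the log above.
fmt = "%Y-%m-%d %H:%M:%S.%f"
cds_update = datetime.strptime("2020-05-08 21:07:55.967", fmt)  # cds: add/update cluster
first_lds = datetime.strptime("2020-05-08 21:08:15.967", fmt)   # StreamListeners attempt

# The gap matches the configured initial_fetch_timeout of 20s exactly.
print((first_lds - cds_update).total_seconds())  # 20.0
```

So the timeout is clearly gating initialization, even though no stat or config_dump output reflects a warming cluster during that window.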
dynamic_active_clusters shows the cluster from cds.yaml; I would expect it to be listed as "warming" instead.
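For reference, a minimal sketch of what I'd expect the /config_dump output to look like versus what is observed. The JSON fragment below is hand-written to illustrate the shape, not captured output:

```python
import json

# Hand-written sample of a clusters config_dump fragment (assumed shape,
# not actual captured output). Observed: the SDS-blocked cluster appears
# under dynamic_active_clusters. Expected: it should sit under
# dynamic_warming_clusters until the secret arrives or the timeout fires.
config_dump = json.loads("""
{
  "dynamic_active_clusters": [
    {"cluster": {"name": "outbound_cluster_tls"}}
  ],
  "dynamic_warming_clusters": []
}
""")

active = [c["cluster"]["name"] for c in config_dump["dynamic_active_clusters"]]
warming = [c["cluster"]["name"] for c in config_dump.get("dynamic_warming_clusters", [])]
print(active)   # ['outbound_cluster_tls'] -- observed
print(warming)  # [] -- expected to contain the cluster instead
```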
The example above is meant to be a simplified reproduction; I originally saw this with a normal deployment using an ADS gRPC server (Istio), not just files.