Envoy is stuck in PRE_INITIALIZING state when management server doesn't respond #5862

@MarcinFalkowski

This is a follow-up to a previous issue: #5622
First of all, thanks for that fix. It helped in the case where the management server doesn't accept the TCP connection.

However, the problem still exists when the management server accepts the connection but doesn't send any response.
In that case Envoy stays in the PRE_INITIALIZING state forever.

We tried setting the timeout parameter (https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/grpc_service.proto#core-grpcservice), but it changed nothing.

The problem can be easily reproduced by replacing the management server with nc:

  1. Run nc in listening mode: nc -l 50000
  2. Run Envoy with the config presented at the end of this post.
  3. In the Envoy admin interface, you will see that Envoy stays in the PRE_INITIALIZING state forever.
  4. In the nc output, you will see that Envoy sent an xDS request.
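
If nc is not available, the same behaviour can be approximated with a small script. This is only a sketch of a "silent" server: it accepts TCP connections and reads whatever arrives, but never writes a byte back. The function name start_silent_server is made up for this example, and unlike a single-shot nc -l it keeps accepting new connections.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=50000):
    """Accept TCP connections, read incoming bytes, and never reply.

    Hypothetical helper standing in for `nc -l 50000` from the steps above.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()

    def drain(conn):
        # Consume the client's bytes (e.g. the xDS request) but send nothing.
        with conn:
            try:
                while conn.recv(4096):
                    pass
            except OSError:
                pass  # client reset or closed the connection

    def accept_loop():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listening socket is no longer usable
            threading.Thread(target=drain, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener


if __name__ == "__main__":
    # Port 50000 matches the xds cluster in the config at the end of the post.
    server = start_silent_server("0.0.0.0", 50000)
    print("silent server listening on", server.getsockname())
    threading.Event().wait()  # block until interrupted
```

Point the xds cluster address at the machine running the script, then follow steps 2 and 3 as before.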

The real-world scenario for this is a heavily loaded management server that manages to accept the TCP connection but fails to deliver a response in an acceptable time.

I think it would be nice to be able to configure Envoy so that it is immune to any problem with the management server. In that case it should keep working with the static config only.

Tested on commit f9107b2 with the following config:

node:
  id: "node-1"
  cluster: "nodes"
static_resources:
  listeners:
  - name: full_static_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 7000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: full_static_http
          route_config:
            name: full_static_routes
            virtual_hosts:
            - name: route
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: "static_service"
          http_filters:
          - name: envoy.router
  clusters:
  - name: static_service
    type: STATIC
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 8080
    connect_timeout: 1s
  - name: xds
    type: STATIC
    hosts:
    - socket_address:
        address: 192.168.65.2  # netcat listens on this address
        port_value: 50000
    connect_timeout: 1s
    http2_protocol_options: {}

dynamic_resources:
  cds_config:
    api_config_source:
      api_type: GRPC
      grpc_services:
      - envoy_grpc:
          cluster_name: xds
        timeout: 1s

admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 6000

Labels: enhancement, help wanted