
Envoy crash when testing HTTP/3 upgrade #18160

@YaoZengzeng

Description

I tried to load test the HTTP/3 upgrade between two Envoy proxies. The deployment model is:

fortio --http1--> client side envoy proxy --http3--> server side envoy proxy --http1--> envoy as application
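
To reproduce, each hop can be started on its own; a minimal sketch, assuming the two configs below are saved as client-envoy.yaml and server-envoy.yaml (placeholder names, not from the report):

# Start the server-side proxy (terminates HTTP/3, forwards HTTP/1 to the app).
envoy -c server-envoy.yaml --concurrency 1 &

# Start the client-side proxy (accepts HTTP/1, speaks HTTP/3 upstream).
envoy -c client-envoy.yaml --concurrency 1 &

# Drive load through the client-side proxy with fortio.
fortio load -qps -1 -c 10 -t 10s --timeout 120s http://127.0.0.1:10001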

The config of the client-side Envoy proxy is:

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9902
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10001
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite_literal: domain1.example.com
                  cluster: service_google
          http_filters:
          - name: envoy.filters.http.router
  clusters:
  - name: service_google
    connect_timeout: 30s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_google
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 10000
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http3_protocol_options: {}
        common_http_protocol_options:
          idle_timeout: 1s
    transport_socket:
      name: envoy.transport_sockets.quic
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.quic.v3.QuicUpstreamTransport
        upstream_tls_context:
          sni: proxy-postgres-backend.example.com
          common_tls_context:
            validation_context:
              match_subject_alt_names:
              - exact: proxy-postgres-backend.example.com
              trusted_ca:
                filename: certs/cacert.pem
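
Since the QUIC transport validates the upstream certificate against the configured SNI and match_subject_alt_names, it is worth confirming that the server certificate really carries proxy-postgres-backend.example.com as a SAN; a quick check (not part of the original report):

# List the SANs in the server certificate; the exact match configured
# above must appear here, or the QUIC handshake will fail validation.
openssl x509 -in certs/servercert.pem -noout -text | grep -A1 'Subject Alternative Name'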

The config of the server-side Envoy proxy is:

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
  - name: listener_tcp
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain:
                filename: certs/servercert.pem
              private_key:
                filename: certs/serverkey.pem
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: HTTP2
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              response_headers_to_add:
              - header:
                  key: alt-svc
                  value: h3=":10000"; ma=86400, h3-29=":10000"; ma=86400
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite_literal: www.envoyproxy.io
                  cluster: service_envoyproxy_io
          http3_protocol_options:
          http_filters:
          - name: envoy.filters.http.router

  - name: listener_udp
    address:
      socket_address:
        protocol: UDP
        address: 0.0.0.0
        port_value: 10000
    udp_listener_config:
      quic_options: {}
      downstream_socket_config:
        prefer_gro: true
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.quic
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.transport_sockets.quic.v3.QuicDownstreamTransport
          downstream_tls_context:
            common_tls_context:
              tls_certificates:
              - certificate_chain:
                  filename: certs/servercert.pem
                private_key:
                  filename: certs/serverkey.pem
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: HTTP3
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  #host_rewrite_literal: www.google.com
                  cluster: service_envoyproxy_io
          http3_protocol_options:
          http_filters:
          - name: envoy.filters.http.router
  clusters:
  - name: service_envoyproxy_io
    connect_timeout: 30s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_envoyproxy_io
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                #address: www.google.com
                address: 127.0.0.1
                port_value: 8080
    #transport_socket:
    #  name: envoy.transport_sockets.tls
    #  typed_config:
    #    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    #    sni: www.google.com
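
Note that the TCP and UDP listeners intentionally share port 10000: the alt-svc response header on the TCP listener is what advertises HTTP/3 on the same port. Whether that header actually comes back can be checked directly; a hedged example, not part of the report:

# Hit the TCP (HTTP/2 over TLS) listener and look for the alt-svc header;
# -k skips certificate verification since the cert is self-signed.
curl -sk --http2 -D - -o /dev/null https://127.0.0.1:10000/ | grep -i alt-svc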

The Envoy version is 1.20, running with --concurrency 1.

If I test with the following command (10 connections, no QPS limit):

fortio load -qps -1 -c 10 -t 10s --timeout 120s http://127.0.0.1:10001

Envoy doesn't crash, but the server-side proxy uses almost a full core while the client-side proxy uses only about 0.1 core, so the resulting QPS is poor.
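
For the 100-connection case described next, presumably only the -c flag changes:

fortio load -qps -1 -c 100 -t 10s --timeout 120s http://127.0.0.1:10001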

But when testing with 100 connections, the client-side Envoy crashes with the following log:

[2021-09-17 09:52:05.915][16036][info][main] [source/server/server.cc:803] all clusters initialized. initializing init manager
[2021-09-17 09:52:05.915][16036][info][config] [source/server/listener_manager_impl.cc:779] all dependencies initialized. starting workers
[2021-09-17 09:52:05.916][16036][info][main] [source/server/server.cc:822] starting main dispatch loop
[2021-09-17 09:52:11.291][16043][info][quic] [bazel-out/k8-opt/bin/external/com_github_google_quiche/quiche/quic/core/tls_client_handshaker.cc:463] Client: handshake finished
[2021-09-17 09:52:15.917][16036][info][main] [source/server/drain_manager_impl.cc:171] shutting down parent after drain
[2021-09-17 09:52:29.290][16043][info][quic] [bazel-out/k8-opt/bin/external/com_github_google_quiche/quiche/quic/core/tls_client_handshaker.cc:463] Client: handshake finished
[2021-09-17 09:53:59.468][16043][info][quic] [bazel-out/k8-opt/bin/external/com_github_google_quiche/quiche/quic/core/tls_client_handshaker.cc:463] Client: handshake finished
[2021-09-17 09:53:59.719][16043][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x0
[2021-09-17 09:53:59.719][16043][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2021-09-17 09:53:59.719][16043][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: cc1d41e7ee9fbfb7ee3c8f73724cdc41d7c6bbb0/1.20.0-dev/Clean/RELEASE/BoringSSL
[2021-09-17 09:53:59.720][16043][critical][backtrace] [./source/server/backtrace.h:96] #0: __restore_rt [0x7f7e471c6980]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x55c355aa1247]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #2: [0x55c355a445b3]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #3: [0x55c355a47fcc]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #4: [0x55c355a48e34]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #5: [0x55c355a442cf]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #6: [0x55c355c3b9d6]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #7: [0x55c355c41864]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #8: [0x55c355b8496e]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #9: [0x55c355b764a5]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #10: [0x55c355bba357]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #11: [0x55c355bb7af7]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #12: [0x55c355bc017f]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #13: [0x55c355eef897]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #14: [0x55c355bbfbaf]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #15: [0x55c355bb5f44]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #16: [0x55c355bb56df]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #17: [0x55c355bb6185]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #18: [0x55c355b7203c]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #19: [0x55c355e0b6ff]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #20: [0x55c355e04afa]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #21: [0x55c355e02649]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #22: [0x55c355df39b1]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #23: [0x55c355df4c5c]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #24: [0x55c355efbb78]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #25: [0x55c355efa571]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #26: [0x55c35581d362]
[2021-09-17 09:53:59.762][16043][critical][backtrace] [./source/server/backtrace.h:98] #27: [0x55c35613aa73]
[2021-09-17 09:53:59.763][16043][critical][backtrace] [./source/server/backtrace.h:96] #28: start_thread [0x7f7e471bb6db]
ActiveStream 0x5e05bf517e00, stream_id_: 9246261070474838081&filter_manager_: 
  FilterManager 0x5e05bf517e78, state_.has_continue_headers_: 0
  filter_manager_callbacks_.requestHeaders(): 
    ':authority', 'domain1.example.com'
    ':path', '/'
    ':method', 'GET'
    ':scheme', 'http'
    'user-agent', 'fortio.org/fortio-dev'
    'x-forwarded-proto', 'http'
    'x-request-id', '00ce776b-f1ba-43d4-88c8-e10942fd7e69'
    'x-envoy-expected-rq-timeout-ms', '15000'
  filter_manager_callbacks_.requestTrailers():   null
  filter_manager_callbacks_.responseHeaders():   null
  filter_manager_callbacks_.responseTrailers():   null
  &stream_info_: 
    StreamInfoImpl 0x5e05bf517f78, upstream_connection_id_: null, protocol_: 1, response_code_: null, response_code_details_: null, attempt_count_: 1, health_check_request_: 0, route_name_: 
    OverridableRemoteConnectionInfoSetterStreamInfo 0x5e05bf517f78, remoteAddress(): 127.0.0.1:37496, directRemoteAddress(): 127.0.0.1:37496, localAddress(): 127.0.0.1:10001
Http1::ConnectionImpl 0x5e05bf44d508, dispatching_: 1, dispatching_slice_already_drained_: 0, reset_stream_called_: 0, handling_upgrade_: 0, deferred_end_stream_headers_: 1, require_strict_1xx_and_204_headers_: 1, send_strict_1xx_and_204_headers_: 1, processing_trailers_: 0, no_chunked_encoding_header_for_304_: 1, buffered_body_.length(): 0, header_parsing_state_: Done, current_header_field_: , current_header_value_: 
active_request_: 
, request_url_: null, response_encoder_.local_end_stream_: 0
absl::get<RequestHeaderMapPtr>(headers_or_trailers_): null
current_dispatching_buffer_ front_slice length: 76 contents: "GET / HTTP/1.1\r\nHost: 127.0.0.1:10001\r\nUser-Agent: fortio.org/fortio-dev\r\n\r\n"
ConnectionImpl 0x5e05bf465140, connecting_: 0, bind_error_: 0, state(): Open, read_buffer_limit_: 1048576
socket_: 
  ListenSocketImpl 0x5e05bf7a8900, transport_protocol_: raw_buffer
  connection_info_provider_: 
    ConnectionInfoSetterImpl 0x5e05bf735b18, remote_address_: 127.0.0.1:37496, direct_remote_address_: 127.0.0.1:37496, local_address_: 127.0.0.1:10001, server_name_: 
Segmentation fault (core dumped)
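
As the backtrace itself hints, the raw frame addresses can be resolved to file/line numbers with tools/stack_decode.py from the Envoy source tree, given the matching unstripped binary. The invocation below is an assumption; check the script's help for the exact flags:

# Assumed usage: read the crash log from stdin and resolve addresses
# against the unstripped envoy binary via addr2line.
./tools/stack_decode.py -s bazel-bin/source/exe/envoy-static < crash.log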

Any ideas? cc @alyssawilk :)
