Skip to content

No observeability into failures at listener_filter level #12809

@howardjohn

Description

@howardjohn

I am trying to get some sort of observability into failures at the listener_filter level. Specifically, we see common issues where users have server first protocols, causing the inspectors to timeout (if timeout enabled) or hang (if no timeout set).

Config:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    listener_filters:
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    - name: envoy.filters.listener.http_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.http_inspector.v3.HttpInspector
    listener_filters_timeout: 0s
    access_log:
    - name: envoy.access_loggers.file
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
        path: "/dev/stdout"
        format: "[%START_TIME%] listener %RESPONSE_CODE% %RESPONSE_FLAGS% %DOWNSTREAM_LOCAL_ADDRESS%
          %DOWNSTREAM_REMOTE_ADDRESS%\n"
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: www.google.com
                  cluster: service_google
                typed_per_filter_config:
                  envoy.fault:
                    '@type': type.googleapis.com/envoy.config.filter.http.fault.v2.HTTPFault
                    abort:
                      http_status: 409
                      percentage:
                        numerator: 100
                        denominator: HUNDRED
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_google
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    dns_refresh_rate: 10000s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_google
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: www.google.com
                port_value: 443
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.api.v2.auth.UpstreamTlsContext
        sni: www.google.com

Ran with docker run -p 10000:10000 -v $HOME/kube/local:/config --entrypoint envoy envoyproxy/envoy-dev -c /config/envoy.yaml -l trace --log-format-prefix-with-location 0 then nc localhost 10000.

Trace logs, with listener filter timeout:

[debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:172] tls inspector: recv: -1
[debug][conn_handler] [source/server/connection_handler_impl.cc:227] listener filter times out after 1000 ms

Trace logs, with timeout on client side (closes connection):

[debug][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:78] tls inspector: new connection accepted
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:172] tls inspector: recv: -1
[trace][filter] [source/extensions/filters/listener/tls_inspector/tls_inspector.cc:193] tls inspector: done: false
[trace][main] [source/common/event/dispatcher_impl.cc:170] item added to deferred deletion list (size=1)
[trace][main] [source/common/event/dispatcher_impl.cc:78] clearing deferred deletion list (size=1)

In both cases, we can see there are no access logs emitted.\

cc @kyessenov

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions