
Envoy crash while using TCP socket tapping of upstream cluster #13608

@jphx


Because this problem involves a crash, I originally reported it to the envoy-security@googlegroups.com mailing list. However, since the traffic tapping capability is intended for debugging, I was asked to report it here instead.

I'm trying out Envoy's TCP socket traffic tapping capability. With a fairly simple configuration in an upstream cluster definition, Envoy crashes when I submit an HTTP request to the cluster. Here's the upstream cluster's tapping configuration:

"transport_socket": {
  "name": "envoy.transport_sockets.tap",
  "typed_config": {
    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tap.v3.Tap",
    "common_config": {
      "static_config": {
        "match_config": {
          "any_match": true
        },
        "output_config": {
          "sinks": [
            {
              "file_per_tap": {
                "path_prefix": "/tmp/mtu.hserver-multi-tenant-upstream.cerberus"
              }
            }
          ],
          "streaming": true
        }
      }
    },
    "transport_socket": {
      "name": "envoy.transport_sockets.tls"
    }
  }
},

If I remove streaming: true, tapping works as expected. With streaming: true, Envoy crashes whenever an HTTP request is submitted to the cluster. I've reproduced this crash with Envoy versions 1.15.0, 1.15.2, and 1.16.0.

I activated Envoy's trace-level logging just before submitting the request, and I'm attaching the part of the log that starts at the first trace-level message (file crash-log.txt). The log also includes some processing from a Kubernetes readiness probe that happened to run at the same time; that can be ignored. The request that triggered the crash is the GET to /http-serving/this/is/a/test. The stack trace in that log is not particularly helpful, though, due to this problem.
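For anyone trying to reproduce this, the two cases differ only in the streaming field of output_config. The following is a minimal sketch of a script that emits both variants of the configuration above for comparison. The helper name tap_transport_socket is hypothetical and not part of Envoy; the field names and path prefix are copied from my configuration.

```python
import json


def tap_transport_socket(path_prefix, streaming):
    """Build the tap transport_socket fragment shown above.

    Hypothetical helper for reproduction only; not an Envoy API.
    """
    output_config = {
        "sinks": [{"file_per_tap": {"path_prefix": path_prefix}}],
    }
    if streaming:
        # This is the only field that differs between the two cases.
        output_config["streaming"] = True
    return {
        "name": "envoy.transport_sockets.tap",
        "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tap.v3.Tap",
            "common_config": {
                "static_config": {
                    "match_config": {"any_match": True},
                    "output_config": output_config,
                }
            },
            "transport_socket": {"name": "envoy.transport_sockets.tls"},
        },
    }


PREFIX = "/tmp/mtu.hserver-multi-tenant-upstream.cerberus"
crashing = tap_transport_socket(PREFIX, streaming=True)   # crashes on request
working = tap_transport_socket(PREFIX, streaming=False)   # works as expected

if __name__ == "__main__":
    print(json.dumps({"transport_socket": crashing}, indent=2))
    print(json.dumps({"transport_socket": working}, indent=2))
```

Each printed fragment can be pasted into the upstream cluster definition in place of the existing transport_socket entry.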

To get a better stack trace of the failure, I built a debug version of Envoy 1.15.2 and put that executable into my Docker image. With that executable, Envoy produces a more useful stack trace when the failure occurs (file stack-trace.txt). I also managed to capture a core file from that failure, and with some effort I was able to get gdb to produce a stack trace that includes the values of local variables, in case that helps (file gdb-backtrace-with-locals.txt). All three files are in the attached archive.

debug-data.tar.gz
