Improve Envoy crash logging

While core dumps are often better for in-depth debugging, we've found that a high percentage of bugs can be debugged more quickly with a combination of a stack trace and information about the stream that caused the crash.

To start with, I'd love to replicate what we have for L7 debugging in-house, where we track and dump the active session. Essentially, each active stream in each worker thread registers itself with a scoped thread-local object on all dispatcher entry points, and the crash signal handler logs a bunch of information about the active stream on segfault. This is super useful for debugging, but for consistent state dumping every alarm and IO entry point has to create a scoped tracker for traces (worst case, you just miss out on debug info).
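To make the tracking idea concrete, here is a minimal sketch of a scoped thread-local tracker. Everything in it (`StreamInfo`, `ScopedStreamTracker`, `logActiveStream`, `onData`) is a hypothetical name invented for illustration, not existing Envoy code, and the logging function only stands in for what a crash handler might print. A real signal handler would also have to restrict itself to async-signal-safe operations, which `std::ostream` is not.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for per-stream state; not an Envoy type.
struct StreamInfo {
  uint64_t stream_id;
  std::string path;
};

// One slot per worker thread: the stream currently being serviced, if any.
thread_local const StreamInfo* active_stream = nullptr;

// RAII guard created at each entry point (IO callback, alarm, ...).
// It registers the stream on construction and restores the previous
// value on destruction, so nested entry points unwind correctly.
class ScopedStreamTracker {
public:
  explicit ScopedStreamTracker(const StreamInfo* stream) : previous_(active_stream) {
    active_stream = stream;
  }
  ~ScopedStreamTracker() { active_stream = previous_; }
  ScopedStreamTracker(const ScopedStreamTracker&) = delete;
  ScopedStreamTracker& operator=(const ScopedStreamTracker&) = delete;

private:
  const StreamInfo* previous_;
};

// What a crash handler could emit. If no entry point registered a
// stream, we simply miss out on the extra debug info.
void logActiveStream(std::ostream& os) {
  if (active_stream == nullptr) {
    os << "no active stream tracked\n";
    return;
  }
  os << "active stream id=" << active_stream->stream_id
     << " path=" << active_stream->path << "\n";
}

// Example entry point: the tracker is live for the whole callback.
std::string onData(const StreamInfo& stream) {
  ScopedStreamTracker tracker(&stream);
  std::ostringstream os;
  logActiveStream(os); // stands in for a crash while processing this stream
  return os.str();
}
```

Because the guard restores the previous pointer rather than clearing it, an entry point that forgets to create a tracker costs nothing but the missing debug output.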

It'd be a bunch of code churn, but I don't think it's terrible to have a Printable interface with a dumpState() function which the various stream / HCM / connection interfaces can implement, and to have L7 alarms and IO entry points latch a thread-local Printable pointer for the stream. (Interested folks could implement this for L4 as well, if inclined; we'd likely not need that for some time.) The dumpState() functions can also be quite helpful in debug/error logging, especially for ASSERT/RELEASE_ASSERT, as they generally have enough information about state to help assess what went wrong.
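As a rough sketch of what such an interface could look like, the snippet below has a stream delegate to its connection so one call dumps the whole chain. Only the names `Printable` and `dumpState` come from the proposal; the `indent_level` parameter, the `Connection` and `Stream` classes, their fields, and `describeFailure` are assumptions made up for illustration.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Proposed interface: anything that can describe its own state.
// The indent_level parameter is an assumption, added so nested
// objects produce readable output.
class Printable {
public:
  virtual ~Printable() = default;
  virtual void dumpState(std::ostream& os, int indent_level) const = 0;
};

// Hypothetical connection with a couple of illustrative fields.
class Connection : public Printable {
public:
  Connection(uint64_t id, bool half_closed) : id_(id), half_closed_(half_closed) {}
  void dumpState(std::ostream& os, int indent_level) const override {
    os << std::string(indent_level * 2, ' ') << "Connection id=" << id_
       << " half_closed=" << (half_closed_ ? "true" : "false") << "\n";
  }

private:
  uint64_t id_;
  bool half_closed_;
};

// Hypothetical stream that dumps itself, then its owning connection.
class Stream : public Printable {
public:
  Stream(uint64_t id, const Connection& connection) : id_(id), connection_(connection) {}
  void dumpState(std::ostream& os, int indent_level) const override {
    os << std::string(indent_level * 2, ' ') << "Stream id=" << id_ << "\n";
    connection_.dumpState(os, indent_level + 1);
  }

private:
  uint64_t id_;
  const Connection& connection_;
};

// Illustrative helper: the same dump can enrich an assertion or error log.
std::string describeFailure(const std::string& condition, const Printable& context) {
  std::ostringstream os;
  os << "assert failure: " << condition << "\n";
  context.dumpState(os, 1);
  return os.str();
}
```

The crash path and the assertion path can share the same `dumpState()` implementations, which is where most of the value for the plumbing would come from.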

Checking in with @envoyproxy/maintainers before I go off code spelunking, both to see if we're up for the extra APIage and plumbing, and to hear if you all have lower-hanging fruit which might make sense to tackle first.

Improve Envoy crash logging #7300
