annotate proxy errors with target metadata by hawkw · Pull Request #2154 · linkerd/linkerd2-proxy

hawkw · 2023-01-12T21:11:32Z

In order to improve the proxy's debuggability, it's desirable that the
errors logged by the proxy and/or returned to HTTP clients include
relevant information about where in the proxy that error occurred.
Currently, buffer layers are constructed with a string "name" parameter
that describes which part of the stack is behind that buffer, so that
the fail-fast errors returned by those buffers can contain the stack
layer's name. While this provides some useful context, it isn't a
complete solution, as this context is only added to fail-fast errors,
and not other errors such as I/O errors, Additionally, these names only
indicate where in the stack the error occurred, and don't provide
metadata describing which instance of that stack layer produced the
error --- e.g., there is no context describing the client address,
logical address, concrete address, or endpoint (as appropriate) that
produced the error.

Instead, we should annotate all errors with the following information:

The name of the stack layer where the error occurred
Target metadata including the logical, concrete, and endpoint
addresses (as appropriate)

This branch introduces a new AnnotateError middleware in
linkerd-stack. This middleware consists of a NewService which
extracts a Param from the target and formats it into a
reference-counted Arc<str>, and the Service produced by that
NewService, which wraps any errors returned by the inner Service
with that context. An ExtractParam implementation can also be provided
to configure how the context is extracted from the target. In addition,
the annotate_error modules contains helpers for formatting a Param
provided by the target plus a string name describing a stack layer.

Furthermore, this branch removes the existing name parameter from buffer
layers, and replaces them with the AnnotateError middleware, which is
used to add target metadata and a string name describing the stack
layer to all errors. Additionally, the various HttpRescue
implementations have been changed slightly, so that when a specific
cause is found for an error, the entire error chain is still formatted
in the HTTP response l5d-error header, rather than just the cause.
This ensures that all context added to an error is included in the
formatted message, rather than just the root cause (which lacks target
metadata).

This changes the `HttpRescue` impls to use the entire error value as the message, rather than just the cause value. This way, we will include all context generated by `AnnotateError` layers in the stack.

hawkw · 2023-01-12T21:12:16Z

linkerd/app/inbound/src/http/tests.rs

        .get(L5D_PROXY_ERROR)
        .expect("response did not contain L5D_PROXY_ERROR header");
-    assert_eq!(message, "service in fail-fast");
+    assert_eq!(message, "HTTP logical (127.0.0.1:80): service in fail-fast");


i'm open to bikeshedding this format, but it seemed reasonable to me....

olix0r

Is it possible to, instead of using this "naming" strategy, allow each stack to produce entirely different error type? This will make error handling/inspection simpler, I think. It also means we're not beholden to the error format of <name> (<context>).

What if the NewAnnotateError stack takes an ExtractParam that produces, something like Annotate<Fn(E) -> Error> (i.e. a marker type wrapping an error converter). Then we could perhaps even use MapErr directly (i.e. replacing the AnnotateError service).

Then, each stack could define its own error type and even do error inspection themselves.

This branch introduces a new `AnnotateError` middleware in `linkerd-stack`, for adding context to errors returned by a wrapped `Service`. The `annotate_error` module contains a new trait, `ErrorContext`, which is implemented by types that provide context to `Error`s by wrapping them with a new `Error` type, and the `AnnotateError` middleware itself. The middleware consists of a `NewService` which extracts a `Param` implementing `ErrorContext` and produces a `Service`, which wraps any errors returned by the inner `Service` using that `ErrorContext`. Additionally, `NewAnnotateError` may be constructed with an `ExtractParam` implementation that configures how an `ErrorContext` implementation is produced from a target type, when the `Param`s themselves do not implement `ErrorContext`. Finally, a pre-made implementation of `ErrorContext`, called `FromTarget`, that produces new errors when the error type implements `From<(&T, Error)>` is provided. In this case, the error type can implement `From` with the appropriate target type, and the `annotate_error::layer_from_target` constructor can be used, without requiring an `ExtractParam` implementation to be written for each stack. This change was extracted from PR #2154.

hawkw · 2023-01-13T23:39:41Z

Closing this branch out, as I'm working on a new version on top of #2158.

hawkw added 8 commits January 12, 2023 10:45

stack: add AnnotateError middleware

fcd952f

add new error annotations in outbound

5920164

de-boilerplate

6b2010f

replace uses of buffer names with new error layers

8efa6f9

prettier formatting?

8a0f09c

change error responders to use the whole error

7f03d21

This changes the `HttpRescue` impls to use the entire error value as the message, rather than just the cause value. This way, we will include all context generated by `AnnotateError` layers in the stack.

update assertions

40bcb65

docs

0bcd24a

hawkw requested a review from a team as a code owner January 12, 2023 21:11

hawkw commented Jan 12, 2023

View reviewed changes

update some assertions

20ec006

olix0r reviewed Jan 13, 2023

View reviewed changes

sketch out per route error types

bb7e807

hawkw mentioned this pull request Jan 13, 2023

stack: add AnnotateError middleware #2158

Merged

hawkw closed this Jan 13, 2023

olix0r deleted the eliza/error-context branch March 7, 2023 22:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annotate proxy errors with target metadata#2154

annotate proxy errors with target metadata#2154
hawkw wants to merge 10 commits intomainfrom
eliza/error-context

hawkw commented Jan 12, 2023

Uh oh!

hawkw Jan 12, 2023

Uh oh!

olix0r left a comment •

edited

Loading

Uh oh!

hawkw commented Jan 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hawkw commented Jan 12, 2023

Uh oh!

hawkw Jan 12, 2023

Choose a reason for hiding this comment

Uh oh!

olix0r left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hawkw commented Jan 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

olix0r left a comment •

edited

Loading