Skip to content

annotate proxy errors with target metadata#2154

Closed
hawkw wants to merge 10 commits intomainfrom
eliza/error-context
Closed

annotate proxy errors with target metadata#2154
hawkw wants to merge 10 commits intomainfrom
eliza/error-context

Conversation

@hawkw
Copy link
Contributor

@hawkw hawkw commented Jan 12, 2023

In order to improve the proxy's debuggability, it's desirable that the
errors logged by the proxy and/or returned to HTTP clients include
relevant information about where in the proxy that error occurred.
Currently, buffer layers are constructed with a string "name" parameter
that describes which part of the stack is behind that buffer, so that
the fail-fast errors returned by those buffers can contain the stack
layer's name. While this provides some useful context, it isn't a
complete solution, as this context is only added to fail-fast errors,
and not other errors such as I/O errors, Additionally, these names only
indicate where in the stack the error occurred, and don't provide
metadata describing which instance of that stack layer produced the
error --- e.g., there is no context describing the client address,
logical address, concrete address, or endpoint (as appropriate) that
produced the error.

Instead, we should annotate all errors with the following information:

  • The name of the stack layer where the error occurred
  • Target metadata including the logical, concrete, and endpoint
    addresses (as appropriate)

This branch introduces a new AnnotateError middleware in
linkerd-stack. This middleware consists of a NewService which
extracts a Param from the target and formats it into a
reference-counted Arc<str>, and the Service produced by that
NewService, which wraps any errors returned by the inner Service
with that context. An ExtractParam implementation can also be provided
to configure how the context is extracted from the target. In addition,
the annotate_error modules contains helpers for formatting a Param
provided by the target plus a string name describing a stack layer.

Furthermore, this branch removes the existing name parameter from buffer
layers, and replaces them with the AnnotateError middleware, which is
used to add target metadata and a string name describing the stack
layer to all errors. Additionally, the various HttpRescue
implementations have been changed slightly, so that when a specific
cause is found for an error, the entire error chain is still formatted
in the HTTP response l5d-error header, rather than just the cause.
This ensures that all context added to an error is included in the
formatted message, rather than just the root cause (which lacks target
metadata).

hawkw added 8 commits January 12, 2023 10:45
This changes the `HttpRescue` impls to use the entire error value
as the message, rather than just the cause value. This way, we will
include all context generated by `AnnotateError` layers in the stack.
@hawkw hawkw requested a review from a team as a code owner January 12, 2023 21:11
.get(L5D_PROXY_ERROR)
.expect("response did not contain L5D_PROXY_ERROR header");
assert_eq!(message, "service in fail-fast");
assert_eq!(message, "HTTP logical (127.0.0.1:80): service in fail-fast");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm open to bikeshedding this format, but it seemed reasonable to me....

Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to, instead of using this "naming" strategy, allow each stack to produce entirely different error type? This will make error handling/inspection simpler, I think. It also means we're not beholden to the error format of <name> (<context>).

What if the NewAnnotateError stack takes an ExtractParam that produces, something like Annotate<Fn(E) -> Error> (i.e. a marker type wrapping an error converter). Then we could perhaps even use MapErr directly (i.e. replacing the AnnotateError service).

Then, each stack could define its own error type and even do error inspection themselves.

hawkw added a commit that referenced this pull request Jan 13, 2023
This branch introduces a new `AnnotateError` middleware in
`linkerd-stack`, for adding context to errors returned by a wrapped
`Service`.

The `annotate_error` module contains a new trait, `ErrorContext`, which
is implemented by types that provide context to `Error`s by wrapping
them with a new `Error` type, and the `AnnotateError` middleware itself.
The middleware consists of a `NewService` which extracts a `Param`
implementing `ErrorContext` and produces a `Service`, which wraps any
errors returned by the inner `Service` using that `ErrorContext`.

Additionally, `NewAnnotateError` may be constructed with an
`ExtractParam` implementation that configures how an `ErrorContext`
implementation is produced from a target type, when the `Param`s
themselves do not implement `ErrorContext`. Finally, a pre-made
implementation of `ErrorContext`, called `FromTarget`, that produces new
errors when the error type implements `From<(&T, Error)>` is provided.
In this case, the error type can implement `From` with the appropriate
target type, and the `annotate_error::layer_from_target` constructor can
be used, without requiring an `ExtractParam` implementation to be
written for each stack.

This change was extracted from PR #2154.
@hawkw
Copy link
Contributor Author

hawkw commented Jan 13, 2023

Closing this branch out, as I'm working on a new version on top of #2158.

@hawkw hawkw closed this Jan 13, 2023
@olix0r olix0r deleted the eliza/error-context branch March 7, 2023 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants