fix(http/prom): record bodies when eos reached#3856
Merged
Conversation
this adds a development dependency, so we can use this mock body type in the outbound proxy's unit tests. Signed-off-by: katelyn martin <kate@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit fixes a bug discovered by @alpeb, which was introduced in proxy v2.288.0. > The associated metric is `outbound_http_route_request_statuses_total`: > > ``` > $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10 > ``` > > The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`. > Before that the metrics looked like: > > ``` > $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96 > outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96 > ``` > > So the difference is the non-empty value for `error=UNKNOWN` even > when `https_status` is 2xx, which `linkerd viz stat-outbound` > interprets as failed requests. in #3086 we introduced a suite of route- and backend-level metrics. that subsystem contains a body middleware that will report itself as having reached the end-of-stream by delegating directly down to its inner body's `is_end_stream()` hint. this is roughly correct, but is slightly distinct from the actual invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must call its `end_stream` helper to classify the outcome and increment the corresponding time series in the `outbound_http_route_request_statuses_total` metric family. in #3504 we upgraded our hyper dependency. while doing so, we neglected to include a call to `end_stream` if a data frame is yielded and the inner body reports itself as having reached the end-of-stream. this meant that instrumented bodies would be polled until the end is reached, but were being dropped before a `None` was encountered. this commit fixes this issue in two ways, to be defensive: * invoke `end_stream()` if a non-trailers frame is yielded, and the inner body now reports itself as having ended. this restores the behavior in place prior to #3504. see the relevant component of that diff, here: <https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274> * rather than delegating to the inner `<B as Body>::is_end_stream()` method, report the end-of-stream being reached by inspecting whether or not the inner response state has been taken. this is the state that directly indicates whether or not the `ResponseBody<B>` middleware is finished. X-ref: #3504 X-ref: #3086 X-ref: linkerd/linkerd2#8733 Signed-off-by: katelyn martin <kate@buoyant.io>
olix0r
approved these changes
Apr 9, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
this commit fixes a bug discovered by @alpeb, which was introduced in proxy
v2.288.0.
in #3086 we introduced a suite of route- and backend-level metrics. that
subsystem contains a body middleware that will report itself as having
reached the end-of-stream by delegating directly down to its inner
body's
is_end_stream()hint.this is roughly correct, but is slightly distinct from the actual invariant: a
linkerd_http_prom::record_response::ResponseBody<B>must call itsend_streamhelper to classify the outcome and increment the correspondingtime series in the
outbound_http_route_request_statuses_totalmetric family.in #3504 we upgraded our hyper dependency. while doing so, we neglected to
include a call to
end_streamif a data frame is yielded and the inner bodyreports itself as having reached the end-of-stream.
this meant that instrumented bodies would be polled until the end is reached,
but were being dropped before a
Nonewas encountered.this commit fixes this issue in two ways, to be defensive:
end_stream()if a non-trailers frame is yielded, and the inner bodynow reports itself as having ended. this restores the behavior in place prior
to chore(deps)!: upgrade to hyper 1.x #3504. see the relevant component of that diff, here:
https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274
<B as Body>::is_end_stream()method,report the end-of-stream being reached by inspecting whether or not the inner
response state has been taken. this is the state that directly indicates
whether or not the
ResponseBody<B>middleware is finished.this adds a
linkerd-mock-http-bodydevelopment dependency, so we can use thismock body type in the outbound proxy's unit tests. additional test coverage for
http and grpc routes' metrics is added to demonstrate that this fix works as
expected.
X-ref: #3504
X-ref: #3086
X-ref: linkerd/linkerd2#8733