internal: Split retry, http-classify, and http-metrics#409
Conversation
The retry middleware is coupled to both the HTTP metrics infrastructure as well as the various request extensions that are required to route a request (i.e. via `TryClone`). Because of this, retry metrics are present for all HTTP metric scopes, even where retries can't apply. This change extracts several components from the `proxy-http` conglomerate subcrate: * `http-classify` includes classification traits and a middleware layer; * `http-metrics` wraps `tower::retry` with a `MakeService` configuration layer; * `retry` wraps `tower::retry` with a `MakeService` configuration layer; * `app-core` now handles retry-configuration Additionally, the `metrics::Metric` type is now generic over the metric name type so that prefixed names do not need to be formatted eagerly. The `route_actual_retry_skipped_total` counter has been replaced with a `route_retryable_total` counter. This new value is only present when retries are enabled on a route; and a `skipped` labels is added when retryability is ignored (i.e. due to budget being exhausted). This metric is not consumed by the control plane, so it should be safe to change.
hawkw
left a comment
There was a problem hiding this comment.
All of the code that moved around in this PR makes it a little difficult to tell what has actually changed. That said, as far as I can tell, this seems good. I had a few minor comments.
linkerd/app/core/src/classify.rs
Outdated
| use super::{Class, SuccessOrFailure}; | ||
| use crate::proxy::http::metrics::classify::{ClassifyEos as _CE, ClassifyResponse as _CR}; | ||
| use http::{HeaderMap, Response, StatusCode}; | ||
| use linkerd2_http_classify::{ClassifyEos as _CE, ClassifyResponse as _CR}; |
There was a problem hiding this comment.
Out of curiosity (and I know this code was here previously) is there a reason we import these traits as _CE/_CR rather than just as _?
kleimkuhler
left a comment
There was a problem hiding this comment.
All the different pieces moving around seems good. It's a little hard to follow what exactly is new, but I understand the purpose of the change. Eliza touched on it above, but I think naming of tower pieces still goes back and forth in the code. For example the Service in http-metrics and http-classify... otherwise 👍
|
There are followup changes coming to these modules. Naming will change, etc. |
This release fixes a bug in the proxy's logging subsystem that could cause the proxy to consume memory until the process is OOMKilled, especially when the proxy was configured to log diagnostic information. The proxy also now properly emits `grpc-status` headers when signaling proxy errors to gRPC clients. This release upgrades the proxy's Rust version, the `http` crate dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the `prost` crate dependency has been patched to address RUSTSEC-2020-02. --- * internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408) * Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410) * Update http to v0.1.21 (linkerd/linkerd2-proxy#412) * internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409) * Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413) * patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414) * metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415) * Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416) * trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418) * discover: Warn on discovery error (linkerd/linkerd2-proxy#422) * router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421) * errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424) * app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425) * trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
This release fixes a bug in the proxy's logging subsystem that could cause the proxy to consume memory until the process is OOMKilled, especially when the proxy was configured to log diagnostic information. The proxy also now properly emits `grpc-status` headers when signaling proxy errors to gRPC clients. This release upgrades the proxy's Rust version, the `http` crate dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the `prost` crate dependency has been patched to address RUSTSEC-2020-02. --- * internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408) * Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410) * Update http to v0.1.21 (linkerd/linkerd2-proxy#412) * internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409) * Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413) * patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414) * metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415) * Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416) * trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418) * discover: Warn on discovery error (linkerd/linkerd2-proxy#422) * router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421) * errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424) * app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425) * trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
The retry middleware is coupled to both the HTTP metrics infrastructure
as well as the various request extensions that are required to route a
request (i.e. via
TryClone). Because of this, retry metrics arepresent for all HTTP metric scopes, even where retries can't apply.
This change extracts several components from the
proxy-httpconglomerate subcrate:
http-classifyincludes classification traits and a middleware layer;http-metricswrapstower::retrywith aMakeServiceconfiguration layer;retrywrapstower::retrywith aMakeServiceconfiguration layer;app-corenow handles retry-configurationAdditionally, the
metrics::Metrictype is now generic over the metricname type so that prefixed names do not need to be formatted eagerly.
The
route_actual_retry_skipped_totalcounter has been replaced with aroute_retryable_totalcounter. This new value is only present whenretries are enabled on a route; and a
skippedlabels is added whenretryability is ignored (i.e. due to budget being exhausted). This
metric is not consumed by the control plane, so it should be safe to
change.