Skip to content

internal: Split retry, http-classify, and http-metrics#409

Merged
olix0r merged 4 commits intomasterfrom
ver/retry-metrics-split
Jan 23, 2020
Merged

internal: Split retry, http-classify, and http-metrics#409
olix0r merged 4 commits intomasterfrom
ver/retry-metrics-split

Conversation

@olix0r
Copy link
Member

@olix0r olix0r commented Jan 21, 2020

The retry middleware is coupled to both the HTTP metrics infrastructure
as well as the various request extensions that are required to route a
request (i.e. via TryClone). Because of this, retry metrics are
present for all HTTP metric scopes, even where retries can't apply.

This change extracts several components from the proxy-http
conglomerate subcrate:

  • http-classify includes classification traits and a middleware layer;
  • http-metrics wraps tower::retry with a MakeService configuration layer;
  • retry wraps tower::retry with a MakeService configuration layer;
  • app-core now handles retry-configuration

Additionally, the metrics::Metric type is now generic over the metric
name type so that prefixed names do not need to be formatted eagerly.

The route_actual_retry_skipped_total counter has been replaced with a
route_retryable_total counter. This new value is only present when
retries are enabled on a route; and a skipped labels is added when
retryability is ignored (i.e. due to budget being exhausted). This
metric is not consumed by the control plane, so it should be safe to
change.

The retry middleware is coupled to both the HTTP metrics infrastructure
as well as the various request extensions that are required to route a
request (i.e. via `TryClone`). Because of this, retry metrics are
present for all HTTP metric scopes, even where retries can't apply.

This change extracts several components from the `proxy-http`
conglomerate subcrate:

* `http-classify` includes classification traits and a middleware layer;
* `http-metrics` wraps `tower::retry` with a `MakeService` configuration layer;
* `retry` wraps `tower::retry` with a `MakeService` configuration layer;
* `app-core` now handles retry-configuration

Additionally, the `metrics::Metric` type is now generic over the metric
name type so that prefixed names do not need to be formatted eagerly.

The `route_actual_retry_skipped_total` counter has been replaced with a
`route_retryable_total` counter. This new value is only present when
retries are enabled on a route; and a `skipped` labels is added when
retryability is ignored (i.e. due to budget being exhausted). This
metric is not consumed by the control plane, so it should be safe to
change.
@olix0r olix0r self-assigned this Jan 21, 2020
@olix0r olix0r requested a review from a team January 22, 2020 02:10
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the code that moved around in this PR makes it a little difficult to tell what has actually changed. That said, as far as I can tell, this seems good. I had a few minor comments.

use super::{Class, SuccessOrFailure};
use crate::proxy::http::metrics::classify::{ClassifyEos as _CE, ClassifyResponse as _CR};
use http::{HeaderMap, Response, StatusCode};
use linkerd2_http_classify::{ClassifyEos as _CE, ClassifyResponse as _CR};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity (and I know this code was here previously) is there a reason we import these traits as _CE/_CR rather than just as _?

Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the different pieces moving around seems good. It's a little hard to follow what exactly is new, but I understand the purpose of the change. Eliza touched on it above, but I think naming of tower pieces still goes back and forth in the code. For example the Service in http-metrics and http-classify... otherwise 👍

@olix0r
Copy link
Member Author

olix0r commented Jan 23, 2020

There are followup changes coming to these modules. Naming will change, etc.

@olix0r olix0r merged commit 5264573 into master Jan 23, 2020
@olix0r olix0r deleted the ver/retry-metrics-split branch January 23, 2020 18:59
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 4, 2020
This release fixes a bug in the proxy's logging subsystem that could
cause the proxy to consume memory until the process is OOMKilled,
especially when the proxy was configured to log diagnostic information.

The proxy also now properly emits `grpc-status` headers when signaling
proxy errors to gRPC clients.

This release upgrades the proxy's Rust version, the `http` crate
dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the
`prost` crate dependency has been patched to address RUSTSEC-2020-02.

---

* internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408)
* Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410)
* Update http to v0.1.21 (linkerd/linkerd2-proxy#412)
* internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409)
* Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413)
* patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414)
* metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415)
* Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416)
* trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418)
* discover: Warn on discovery error (linkerd/linkerd2-proxy#422)
* router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421)
* errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424)
* app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425)
* trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
adleong pushed a commit to linkerd/linkerd2 that referenced this pull request Feb 4, 2020
This release fixes a bug in the proxy's logging subsystem that could
cause the proxy to consume memory until the process is OOMKilled,
especially when the proxy was configured to log diagnostic information.

The proxy also now properly emits `grpc-status` headers when signaling
proxy errors to gRPC clients.

This release upgrades the proxy's Rust version, the `http` crate
dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the
`prost` crate dependency has been patched to address RUSTSEC-2020-02.

---

* internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408)
* Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410)
* Update http to v0.1.21 (linkerd/linkerd2-proxy#412)
* internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409)
* Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413)
* patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414)
* metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415)
* Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416)
* trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418)
* discover: Warn on discovery error (linkerd/linkerd2-proxy#422)
* router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421)
* errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424)
* app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425)
* trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants