outbound: Configure balancers with service metadata by olix0r · Pull Request #2374 · linkerd/linkerd2-proxy

olix0r · 2023-04-02T17:57:25Z

The OutboundPolicies API includes resource references with its
responses. These references allow us to accurately identify an arbitrary
(Kubernetes style) resource by its group, kind, namespace, and name.

To support new metrics that include these resource coordinates as
labels, we need access to the parent and backend references from the
balancer stack.

This change makes the following changes to accommodate this:

We introduce ParentRef, RouteRef, BackendRef, and
EndpointRef new-types to serve as explicit markers for
different types of metadata we may encounter.
The profile stack is updated to parse out metadata from service names
in the form ..svc.*.
We also support discovering pod metadata from the endpoint labels
dst_pod and dst_namespace.
When we can't infer any metadata from a profile response, we use an
"unknown" metadata.
When using policy, we use control-plane provided metadata.

In practice, pod metadata won't be surfaced by the existing code path;
but we have a relatively simple path forward to using it.

This change splits the balancer stack into its own layer so it's
distinct from the rest of the concrete stack configuration.

There are only minor functional changes in this branch:

The balance{addr=...} tracing context is replace by
service{ns=..,name=...} for clarity;
Errors returned from the balance stack are now rendered as:
Service {name}.{ns}: {error}

When performing policy-based routing, proxies may dispatch requests through per-route backend configurations. In order to illustrate how routing rules apply and how backend distributions are being honored, this change adds two new metrics: * `outbound_http_route_backend_requests_total`; and * `outbound_grpc_route_backend_requests_total` Each of these metrics includes labels that identify a routes parent (i.e. a Service), the route resource being used, and the backend resource being used. This implementation does NOT implement any form of metrics eviction for these new metrics. This is tolerable for the short term as the cardinality of services and routes is generally much less than the cardinality of individual endpoints (where we do require timeout/eviction for metrics).

There are no explicit indications of failure accrual related behavior. This change adds INFO-level logs when a failure accrual breaker is tripped or reeopened. These log messages are scoped within the 'endpoint' log context so that log lines include endpoint and balancer addresses. INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker tripped INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker reopened

The OutboundPolicies API includes resource references with its responses. These references allow us to accurately identify an arbitrary (Kubernetes style) resource by its group, kind, namespace, and name. To support new metrics that include these resource coordinates as labels, we need access to the parent and backend references from the balancer stack. This change makes the following changes to accomodate this: 1. We introduce `ParentRef`, `BackendRef`, and `EndpointRef` new-types to serve as explicit markers for different types of metadata we may encounter. 2. The profile stack is updated to parse out metadata from service names in the form <name>.<ns>.svc.*. 3. We also support discovering pod metadata from the endpoint labels `dst_pod` and `dst_namespace`. 4. When we can't infer any metadata from a profile response, we use an "unknown" metadata. 5. When using policy, we use control-plane provided metadata. In practice, pod metadata won't be surfaced by the existing code path; but we have a relatively simple path forward to using it. This change splits the balancer stack into its own layer so it's distinct from the rest of the concrete stack configuration. There are only minor functional changes in this branch: 1. The `balance{addr=...}` tracing context is replace by `service{ns=..,name=...}` for clarity; 2. Errors returned from the balance stack are now rendered as: "Service <name>.<ns>: <error>"

hawkw

Overall this looks good to me! I left a handful of fairly minor comments.

hawkw · 2023-04-04T18:07:00Z

linkerd/app/outbound/src/http/logical/policy/router.rs

-            policy::BackendDispatcher::Fail { ref message } => {
-                mk_concrete(concrete::Dispatch::Fail {
+            policy::BackendDispatcher::Fail { ref message } => mk_concrete(
+                BackendRef(policy::Meta::new_default("fail")),


not a blocker for now, but it occurs to me that the failing backends could potentially have non-default BackendRef metadata, since they're returned by the controller when a backend service doesn't exist...so they should have the metadata of the referenced (non-existent) service.

hawkw · 2023-04-04T18:09:46Z

linkerd/app/outbound/src/lib.rs

+}
+
+impl EndpointRef {
+    fn new(md: &Metadata, port: NonZeroU16) -> Self {


nit, take it or leave it (definitely not important): since this always returns metadata for a Pod, should this maybe be

Suggested change

fn new(md: &Metadata, port: NonZeroU16) -> Self {

fn new_pod(md: &Metadata, port: NonZeroU16) -> Self {

or something?

hawkw · 2023-04-04T18:10:49Z

linkerd/app/outbound/src/lib.rs

+        };
+        let name = match md.labels().get("dst_pod") {
+            Some(pod) => pod.clone(),
+            None => return Self(UNKNOWN_META.clone()),


not that this will ever happen in practice, since we expect the controller to populate both the namespace and pod labels, but would it make sense to return a metadata that has the namespace field populated but the name field set to <unknown>, rather than throwing out the information that we do have (the ns and the kind)?

Yes probably but way out of scope for now

linkerd/proxy/client-policy/src/lib.rs

linkerd/app/outbound/src/http/concrete.rs

This proxy release adds new `outbound_http_route_backend_requests_total` and `outbound_grpc_route_backend_requests_total` metrics, which can be used to track how routing rules and backend distributions apply to requests. These metrics contain labels describing the route's parent (i.e. a Service), the route resource being used, and the backend resource being used by each request. In addition, this proxy release adds new `INFO`-level logs for failure accrual, and updates logging contexts with Kubernetes resource metadata. --- * outbound: Emit INFO-level logs on failure accrual changes (linkerd/linkerd2-proxy#2373) * app: Rename Metrics types as InboundMetrics and OutboundMetrics (linkerd/linkerd2-proxy#2376) * outbound: Configure balancers with service metadata (linkerd/linkerd2-proxy#2374) * outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2372) * outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2377)

Looks like we accidentally merged PR #2375 without a CI build against the latest state of `main`. In the meantime since #2375 was last built on CI, PR #2374 added an additional metadata field to `policy::HttpParams`, which the `HttpParams` constructed in the test added from #2375 doesn't populate. Therefore, merging this PR broke the build. Whoops! This commit populates the `meta` field, fixing it.

olix0r added 7 commits April 1, 2023 19:58

Move endpoint instrumentation to wrap the failure accrual breaker

ea631f9

Add INFO-level logging on failure accrual breaker state change

d0450e9

wip

11ff1c6

Merge branch 'ver/route-reqs-mtx' into ver/failure-accrual-logs

9bb1783

olix0r changed the title ~~Ver/service meta~~ outbound: Configure balancers with service metadata Apr 2, 2023

This comment was marked as outdated.

Sign in to view

olix0r added 11 commits April 4, 2023 00:41

Merge branch 'main' into ver/service-meta

3d6cdd8

Add ports

bdcbbf1

Use port metadata

6d891ce

undo HTTPRoute metadata/metrics for now

93ee4e1

Merge branch 'main' into ver/service-meta

b282187

back out cargo changes

98a14ea

rm metrics

2a4a629

Hold references on Concrete target type

e2018f8

revert

e854bc5

routeref

aade46f

revert

f9e50cc

olix0r marked this pull request as ready for review April 4, 2023 14:34

olix0r requested a review from a team as a code owner April 4, 2023 14:34

olix0r marked this pull request as draft April 4, 2023 15:33

handle missing port (for tests, really)

cc42f9c

olix0r marked this pull request as ready for review April 4, 2023 16:32

hawkw approved these changes Apr 4, 2023

View reviewed changes

olix0r merged commit 7859b9a into main Apr 4, 2023

olix0r deleted the ver/service-meta branch April 4, 2023 21:04

hawkw mentioned this pull request Apr 4, 2023

proxy: v2.196.0 linkerd/linkerd2#10685

Merged

hawkw mentioned this pull request Apr 24, 2023

outbound: add missing meta field in test policy #2400

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

outbound: Configure balancers with service metadata #2374

outbound: Configure balancers with service metadata #2374
olix0r merged 19 commits intomainfrom
ver/service-meta

olix0r commented Apr 2, 2023 •

edited

Loading

Uh oh!

This comment was marked as outdated.

hawkw left a comment

Uh oh!

hawkw Apr 4, 2023

Uh oh!

hawkw Apr 4, 2023

Uh oh!

hawkw Apr 4, 2023

Uh oh!

olix0r Apr 4, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	fn new(md: &Metadata, port: NonZeroU16) -> Self {
	fn new_pod(md: &Metadata, port: NonZeroU16) -> Self {

Conversation

olix0r commented Apr 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

hawkw left a comment

Choose a reason for hiding this comment

Uh oh!

hawkw Apr 4, 2023

Choose a reason for hiding this comment

Uh oh!

hawkw Apr 4, 2023

Choose a reason for hiding this comment

Uh oh!

hawkw Apr 4, 2023

Choose a reason for hiding this comment

Uh oh!

olix0r Apr 4, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

olix0r commented Apr 2, 2023 •

edited

Loading