Skip to content

outbound: Configure balancers with service metadata #2374

Merged
olix0r merged 19 commits intomainfrom
ver/service-meta
Apr 4, 2023
Merged

outbound: Configure balancers with service metadata #2374
olix0r merged 19 commits intomainfrom
ver/service-meta

Conversation

@olix0r
Copy link
Member

@olix0r olix0r commented Apr 2, 2023

The OutboundPolicies API includes resource references with its
responses. These references allow us to accurately identify an arbitrary
(Kubernetes style) resource by its group, kind, namespace, and name.

To support new metrics that include these resource coordinates as
labels, we need access to the parent and backend references from the
balancer stack.

This change makes the following changes to accommodate this:

  1. We introduce ParentRef, RouteRef, BackendRef, and
    EndpointRef new-types to serve as explicit markers for
    different types of metadata we may encounter.
  2. The profile stack is updated to parse out metadata from service names
    in the form ..svc.*.
  3. We also support discovering pod metadata from the endpoint labels
    dst_pod and dst_namespace.
  4. When we can't infer any metadata from a profile response, we use an
    "unknown" metadata.
  5. When using policy, we use control-plane provided metadata.

In practice, pod metadata won't be surfaced by the existing code path;
but we have a relatively simple path forward to using it.

This change splits the balancer stack into its own layer so it's
distinct from the rest of the concrete stack configuration.

There are only minor functional changes in this branch:

  1. The balance{addr=...} tracing context is replace by
    service{ns=..,name=...} for clarity;
  2. Errors returned from the balance stack are now rendered as:
    Service {name}.{ns}: {error}

olix0r added 7 commits April 1, 2023 19:58
When performing policy-based routing, proxies may dispatch requests
through per-route backend configurations. In order to illustrate how
routing rules apply and how backend distributions are being honored,
this change adds two new metrics:

* `outbound_http_route_backend_requests_total`; and
* `outbound_grpc_route_backend_requests_total`

Each of these metrics includes labels that identify a routes parent
(i.e. a Service), the route resource being used, and the backend
resource being used.

This implementation does NOT implement any form of metrics eviction for
these new metrics. This is tolerable for the short term as the
cardinality of services and routes is generally much less than the
cardinality of individual endpoints (where we do require
timeout/eviction for metrics).
There are no explicit indications of failure accrual related behavior.

This change adds INFO-level logs when a failure accrual breaker is
tripped or reeopened. These log messages are scoped within the
'endpoint' log context so that log lines include endpoint and balancer
addresses.

    INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker tripped
    INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker reopened
The OutboundPolicies API includes resource references with its
responses. These references allow us to accurately identify an arbitrary
(Kubernetes style) resource by its group, kind, namespace, and name.

To support new metrics that include these resource coordinates as
labels, we need access to the parent and backend references from the
balancer stack.

This change makes the following changes to accomodate this:

1. We introduce `ParentRef`, `BackendRef`, and `EndpointRef` new-types
   to serve as explicit markers for different types of metadata we may
   encounter.
2. The profile stack is updated to parse out metadata from service names
   in the form <name>.<ns>.svc.*.
3. We also support discovering pod metadata from the endpoint labels
   `dst_pod` and `dst_namespace`.
4. When we can't infer any metadata from a profile response, we use an
   "unknown" metadata.
5. When using policy, we use control-plane provided metadata.

In practice, pod metadata won't be surfaced by the existing code path;
but we have a relatively simple path forward to using it.

This change splits the balancer stack into its own layer so it's
distinct from the rest of the concrete stack configuration.

There are only minor functional changes in this branch:

1. The `balance{addr=...}` tracing context is replace by
   `service{ns=..,name=...}` for clarity;
2. Errors returned from the balance stack are now rendered as:
   "Service <name>.<ns>: <error>"
@olix0r olix0r changed the title Ver/service meta outbound: Configure balancers with service metadata Apr 2, 2023
@olix0r

This comment was marked as outdated.

@olix0r olix0r marked this pull request as ready for review April 4, 2023 14:34
@olix0r olix0r requested a review from a team as a code owner April 4, 2023 14:34
@olix0r olix0r marked this pull request as draft April 4, 2023 15:33
@olix0r olix0r marked this pull request as ready for review April 4, 2023 16:32
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me! I left a handful of fairly minor comments.

policy::BackendDispatcher::Fail { ref message } => {
mk_concrete(concrete::Dispatch::Fail {
policy::BackendDispatcher::Fail { ref message } => mk_concrete(
BackendRef(policy::Meta::new_default("fail")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker for now, but it occurs to me that the failing backends could potentially have non-default BackendRef metadata, since they're returned by the controller when a backend service doesn't exist...so they should have the metadata of the referenced (non-existent) service.

}

impl EndpointRef {
fn new(md: &Metadata, port: NonZeroU16) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it (definitely not important): since this always returns metadata for a Pod, should this maybe be

Suggested change
fn new(md: &Metadata, port: NonZeroU16) -> Self {
fn new_pod(md: &Metadata, port: NonZeroU16) -> Self {

or something?

};
let name = match md.labels().get("dst_pod") {
Some(pod) => pod.clone(),
None => return Self(UNKNOWN_META.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that this will ever happen in practice, since we expect the controller to populate both the namespace and pod labels, but would it make sense to return a metadata that has the namespace field populated but the name field set to <unknown>, rather than throwing out the information that we do have (the ns and the kind)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes probably but way out of scope for now

@olix0r olix0r merged commit 7859b9a into main Apr 4, 2023
@olix0r olix0r deleted the ver/service-meta branch April 4, 2023 21:04
hawkw added a commit to linkerd/linkerd2 that referenced this pull request Apr 4, 2023
This proxy release adds new `outbound_http_route_backend_requests_total`
and `outbound_grpc_route_backend_requests_total` metrics, which can be
used to track how routing rules and backend distributions apply to
requests. These metrics contain labels describing the route's parent
(i.e. a Service), the route resource being used, and the backend
resource being used by each request. In addition, this proxy release
adds new `INFO`-level logs for failure accrual, and updates logging
contexts with Kubernetes resource metadata.

---

* outbound: Emit INFO-level logs on failure accrual changes (linkerd/linkerd2-proxy#2373)
* app: Rename Metrics types as InboundMetrics and OutboundMetrics (linkerd/linkerd2-proxy#2376)
* outbound: Configure balancers with service metadata (linkerd/linkerd2-proxy#2374)
* outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2372)
* outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2377)
hawkw added a commit to linkerd/linkerd2 that referenced this pull request Apr 5, 2023
This proxy release adds new `outbound_http_route_backend_requests_total`
and `outbound_grpc_route_backend_requests_total` metrics, which can be
used to track how routing rules and backend distributions apply to
requests. These metrics contain labels describing the route's parent
(i.e. a Service), the route resource being used, and the backend
resource being used by each request. In addition, this proxy release
adds new `INFO`-level logs for failure accrual, and updates logging
contexts with Kubernetes resource metadata.

---

* outbound: Emit INFO-level logs on failure accrual changes (linkerd/linkerd2-proxy#2373)
* app: Rename Metrics types as InboundMetrics and OutboundMetrics (linkerd/linkerd2-proxy#2376)
* outbound: Configure balancers with service metadata (linkerd/linkerd2-proxy#2374)
* outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2372)
* outbound: Report per-route-backend request count metrics (linkerd/linkerd2-proxy#2377)
hawkw added a commit that referenced this pull request Apr 24, 2023
Looks like we accidentally merged PR #2375 without a CI build against
the latest state of `main`. In the meantime since #2375 was last built
on CI, PR #2374 added an additional metadata field to
`policy::HttpParams`, which the `HttpParams` constructed in the test
added from #2375 doesn't populate. Therefore, merging this PR broke the
build. Whoops!

This commit populates the `meta` field, fixing it.
hawkw added a commit that referenced this pull request Apr 24, 2023
Looks like we accidentally merged PR #2375 without a CI build against
the latest state of `main`. In the meantime since #2375 was last built
on CI, PR #2374 added an additional metadata field to
`policy::HttpParams`, which the `HttpParams` constructed in the test
added from #2375 doesn't populate. Therefore, merging this PR broke the
build. Whoops!

This commit populates the `meta` field, fixing it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants