Add dst and route labels to request and response metrics#296
Add dst and route labels to request and response metrics#296
Conversation
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
|
Excited to check this out. What's the best way to test this out and look at the Prometheus stats? This repo's README recommends using Cargo and |
|
The easiest way would be to use the proxy image I've already pushed: |
|
Can you confirm that I'm testing this correctly? I have a trafficsplit setup in booksapp; requests are being split between I installed: Then ran Then I port forwarded Prometheus: When I went to localhost:9090, this request: gave me the results : I was expecting one of the result rows to be going from authority |
|
@scottcarol your setup is correct except that you should be looking for a result row that goes from |
|
👍 Got it, just needed to delete the pods in my namespace so they could restart. The correct data is showing up! |
|
Nice, and I'm seeing |
|
|
||
| impl FmtLabels for dst::Route { | ||
| fn fmt_labels(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
| let labels = prefix_labels("rt", self.labels().as_ref().into_iter()); |
There was a problem hiding this comment.
What was your thinking behind the label name rt_route? Is the rt for route as well? target_route seems easier to understand to me, would that be technically correct?
There was a problem hiding this comment.
The rt_route label already exists on the route (route_response_total et al.) metrics. This change just puts that label on the endpoint (response_total et al.) metrics as well.
The rt prefix indicates that this is a route label and route is the name of the route label. The naming is not terribly obvious, but since the metrics are only user-facing for advanced users who are interacting directly with prometheus, the standards for clarity can be somewhat relaxed.
There was a problem hiding this comment.
I see, thanks for the explanation. Do we have a doc anywhere where we list the available user-facing labels, and what they mean? Happy to do this if there isn't one.
| impl Default for Labels { | ||
| fn default() -> Self { | ||
| let mut labels = IndexMap::with_capacity(1); | ||
| labels.insert("route".to_string(), "".to_string()); |
There was a problem hiding this comment.
Without more context, this is super weird. Why is there a magic string in here? Why is this necessary at all?
There was a problem hiding this comment.
Specifically, why is it necessary to hydrate an empty label?
There was a problem hiding this comment.
The problem is that if a route has no labels, its FmtLabels implementation will format empty which will cause the total label string to include ,, which breaks Prometheus. Adding an empty label is hack to work around this.
| last_update: Instant, | ||
| total: Counter, | ||
| by_retry_skipped: IndexMap<RetrySkipped, Counter>, | ||
| by_dst: IndexMap<Option<Addr>, DstMetrics<R, C>>, |
There was a problem hiding this comment.
I don't really understand why the metrics registry has to be aware of any of this... It seems like a it should all be a detail of the stack targets...
Do we not have both the logical and concrete dsts in the endpoint stack target? If not, can we?
There was a problem hiding this comment.
We don't have the logical and concrete dsts in the endpoint stack target, just the concrete. We could have the logical dst in there, but this would make it so that concrete dst load balancers would not be shared between logical dsts.
This is why we propagate the logical dst in the request extensions. Having the logical dst in the target of the endpoint stack would certainly simplify this a lot. Not sure how to evaluate the tradeoff, though.
There was a problem hiding this comment.
Per offline discussion: because a concrete lb is not shared across multiple logical names, we actually can have all of the needed metadata in the endpoint stack's target, which should greatly simplify the change.
|
Yakshave:
|
|
|
let's split out that change, then. there's likely a simplification for that change as well, but it will be easier to address in a follow-up change. |
|
Closed in favor of #300 |
Supersedes #296 The authority label reflects the concrete dst of the request in the endpoint stack. In the case of a traffic split, this may be different from the actual authority of the request (i.e. the logical dst). We add the logical dst as a field on the endpoint target and use this value for the authority label instead of the concrete dst. The net result is that the authority field reflects the apex service of a traffic split and the `dst_service` label reflects the leaf service. Signed-off-by: Alex Leong <alex@buoyant.io>
The
request_totalandresponse_totalmetrics are measured in the endpoint stack and do not include a label which indicates the dst of the request. These metrics do have anauthoritylabel, but this label's value is taken from the target of the endpoint stack. In the case of a traffic-split, theauthoritylabel corresponds to the overridden (concrete) dst, and not to the authority of the request itself. In order to know the logical dst of the request, a new label is needed.We add a
dstlabel by recording the target of the dst stack into the request extensions. We then extract this value in the endpoint stack and use it as a label value. Similarly, we also record the route (the target of the route stack) into the request extensions and then use this as a label in the endpoint stat metrics as well. The route label is not related to traffic-splits, but seems like useful information anyway.Labels
One side-effect of this change is that the default route now has a route label with an empty value. This means that proxy metrics for requests without a route will now include the
rt_route=""label. Prometheus seems to filter out these empty value labels and these are not present in Prometheus.