Conversation
| grpc.client.call.retry_delay | Histogram | s | grpc.method (required), grpc.target (required) | Total time of delay while there is no active attempt during the client call. | ||
| Metric Name | Type | Unit | Labels | Description | ||
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. |
There was a problem hiding this comment.
We might need more than 5 as the upper bound here. When we implement estubs retries, there are modes where there is no limit on the number of retry attempts.
There was a problem hiding this comment.
I think if we end up changing the limit on retries, we can update the advice here.
Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.
| Metric Name | Type | Unit | Labels | Description | ||
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. | ||
| grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10]. |
There was a problem hiding this comment.
In principle, there is no upper bound on the number of transparent retry attempts if each attempt never actually goes out on the wire.
There was a problem hiding this comment.
I added a 10 taking that in consideration. I think this should be enough. I'm not sure it makes sense to distinguish between cases where the number if larger than 10. (I'm willing to change the boundaries here if there are strong opinions. These are just recommendations anyway.)
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. | ||
| grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10]. | ||
| grpc.client.call.hedges | Histogram | {hedge} | grpc.method (required), grpc.target (required) | Number of hedges during the client call. If there were no hedges, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. |
There was a problem hiding this comment.
In estubs, the upper bound for the number of hedged requests can be quite large, so I think we'll want to support larger numbers here too.
There was a problem hiding this comment.
I think if we end up changing the limit on retries, we can update the advice here.
Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.
There was a problem hiding this comment.
I think we've talked about this before, but I don't understand how expontential histograms can possibly work for aggregation. And in any case, we don't support them in the non-per-call metrics APIs, so I don't see how we can recommend that here.
Exponential Histograms are just a type of histograms. Our APIs do not need anything more to support exponential histograms. The OpenTelemetry SDK user can simply provide the buckets to use when creating the histogram view.
expontential histograms can possibly work for aggregation
Aggregation for histograms, which in most cases is calculating percentiles (50% / 95% / 99 %) is always approximate since we no longer have the exact values.
| grpc.client.call.retry_delay | Histogram | s | grpc.method (required), grpc.target (required) | Total time of delay while there is no active attempt during the client call. | ||
| Metric Name | Type | Unit | Labels | Description | ||
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. |
There was a problem hiding this comment.
I think if we end up changing the limit on retries, we can update the advice here.
Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.
| Metric Name | Type | Unit | Labels | Description | ||
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. | ||
| grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10]. |
There was a problem hiding this comment.
I added a 10 taking that in consideration. I think this should be enough. I'm not sure it makes sense to distinguish between cases where the number if larger than 10. (I'm willing to change the boundaries here if there are strong opinions. These are just recommendations anyway.)
| ------------------------------------ | --------- | ------------------- | ---------------------------------------------- | ----------- | ||
| grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. | ||
| grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10]. | ||
| grpc.client.call.hedges | Histogram | {hedge} | grpc.method (required), grpc.target (required) | Number of hedges during the client call. If there were no hedges, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5]. |
There was a problem hiding this comment.
I think if we end up changing the limit on retries, we can update the advice here.
Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.
Implements OTel retry metrics as per grpc/proposal#488 Closes #39195 COPYBARA_INTEGRATE_REVIEW=#39195 from yashykt:AddOTelRetryMetrics b7e8b77 PiperOrigin-RevId: 787268504
Implements OTel retry metrics as per grpc/proposal#488 Closes grpc#39195 COPYBARA_INTEGRATE_REVIEW=grpc#39195 from yashykt:AddOTelRetryMetrics b7e8b77 PiperOrigin-RevId: 787268504
Implements OTel retry metrics as per grpc/proposal#488 Closes grpc#39195 COPYBARA_INTEGRATE_REVIEW=grpc#39195 from yashykt:AddOTelRetryMetrics b7e8b77 PiperOrigin-RevId: 787268504
Implements OTel retry metrics as per grpc/proposal#488 Closes grpc#39195 COPYBARA_INTEGRATE_REVIEW=grpc#39195 from yashykt:AddOTelRetryMetrics b7e8b77 PiperOrigin-RevId: 787268504
No description provided.