Skip to content

feat(stats): support grpc status codes in metrics#2624

Merged
istio-testing merged 14 commits intoistio:masterfrom
douglas-reid:grpc-status-codes
Feb 6, 2020
Merged

feat(stats): support grpc status codes in metrics#2624
istio-testing merged 14 commits intoistio:masterfrom
douglas-reid:grpc-status-codes

Conversation

@douglas-reid
Copy link
Copy Markdown
Contributor

@douglas-reid douglas-reid commented Jan 14, 2020

Companion PR to istio/istio#20036.

This PR implements the changes approved in https://tinyurl.com/yy2cwmv5 to report gRPC status codes in metrics.

TODO:

@douglas-reid douglas-reid added the do-not-merge Block automatic merging of a PR. label Jan 14, 2020
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jan 14, 2020
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Jan 14, 2020
@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 14, 2020
@PiotrSikora
Copy link
Copy Markdown
Contributor

@douglas-reid do you want to redo it now on top of #2631 in order to get it into 1.5?

@douglas-reid
Copy link
Copy Markdown
Contributor Author

yeah, i'll need to rebase, etc.

@PiotrSikora
Copy link
Copy Markdown
Contributor

@douglas-reid it's merged now.

@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 30, 2020
@douglas-reid
Copy link
Copy Markdown
Contributor Author

@kyessenov @mandarjog please take a look at this PR now.

There are a few important parts, but mostly, a decision is needed regarding the default behavior of grpc_response_status label for non-gRPC requests.

The original proposal design doc was under-specified in this regard.

In this PR, if the protocol is not gRPC, then the fallback will be the UNKNOWN grpc status code (2). This matches the behavior for translating HTTP OK -> gRPC codes (doc).

From http-grpc-status-mapping:

200 is UNKNOWN because there should be a grpc-status in case of truly OK response.

The spec for UNKNOWN, includes:

Unknown error. For example, this error may be returned when a Status value received from another address space belongs to an error space that is not known in this address space. Also errors raised by APIs that do not return enough error information may be converted to this error.

as well as

Error parsing returned status | UNKNOWN | Client

We had previously discussed the following alternatives:

  1. Returning 1 (gRPC OK)
  2. Returning an empty label.

I rejected returning gRPC OK, as that conflicts with the mapping.

Returning an empty label is slightly clunky, as the code gets much more complicated, as do the regexes for generating prometheus labels and the weirdness of having non-populated labels for non-gRPC metrics.

Thoughts?

Note: this PR is blocked on a cherry-pick of @kyessenov's hardening PR of upstream Envoy.

@douglas-reid douglas-reid force-pushed the grpc-status-codes branch 2 times, most recently from d9fcd65 to 89c156e Compare January 30, 2020 18:58
@douglas-reid
Copy link
Copy Markdown
Contributor Author

Update: After thinking about this again, I switched to grpc_response_status: "" (empty string) for non-gRPC protocol cases. Please let me know what you think.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 4, 2020
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 4, 2020
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
@douglas-reid
Copy link
Copy Markdown
Contributor Author

/test test-asan_proxy

@douglas-reid
Copy link
Copy Markdown
Contributor Author

/test test_proxy

@douglas-reid
Copy link
Copy Markdown
Contributor Author

/test test-tsan_proxy

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 5, 2020
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
@douglas-reid douglas-reid marked this pull request as ready for review February 5, 2020 18:42
@douglas-reid douglas-reid requested a review from a team February 5, 2020 18:42
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Feb 5, 2020
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
FilterConfig_AlpnOverride;
using testing::NiceMock;
using testing::Return;
using testing::ReturnPointee;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not used

@kyessenov kyessenov self-requested a review February 5, 2020 20:21
Copy link
Copy Markdown
Contributor

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG. I think we need to de-duplicate all the templates at some point. It's PITA to update them once the deprecated fields stop working.

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
@douglas-reid
Copy link
Copy Markdown
Contributor Author

@kyessenov agreed on the templates. deprecation will hit us hard.

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
@douglas-reid douglas-reid removed the do-not-merge Block automatic merging of a PR. label Feb 6, 2020
@istio-testing istio-testing merged commit cdf3692 into istio:master Feb 6, 2020
douglas-reid added a commit to douglas-reid/proxy that referenced this pull request Feb 7, 2020
istio-testing pushed a commit that referenced this pull request Feb 11, 2020
* feat(stats): support grpc status codes in metrics (#2624)

* fix(stats): remove policy-related dimensions (#2647)

* feat(stats): add support for canonical service labels (#2658)

* update istio/envoy SHA

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>

* add missing deps for cel

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>

* buildifier lint fix WORKSPACE respositories.bzl

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>

* restore file test path

* commenting out RBE stuff

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>

* skip http mxc test

Signed-off-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants