gRPC retries, assuming gRPC status code is in the returned headers#1067
gRPC retries, assuming gRPC status code is in the returned headers#1067htuch merged 29 commits intoenvoyproxy:masterfrom
Conversation
rshriram
left a comment
There was a problem hiding this comment.
This is cool! Could you add the retry policies to docs as well?
|
Oops. Good call! router_filter.rst looks like the obvious place to add
to, but if there's another one I'm missing please let me know
…On Thu, Jun 8, 2017 at 4:04 PM, Shriram Rajagopalan < ***@***.***> wrote:
***@***.**** commented on this pull request.
This is cool! Could you add the retry policies to docs as well?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1067 (review)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ARYFvQ0gd4pTUFeWQNaN1ZRxXAWIH4dBks5sCFPCgaJpZM4N0i_9>
.
|
source/common/grpc/common.h
Outdated
|
|
||
| class Common { | ||
| public: | ||
| enum GrpcStatus { |
There was a problem hiding this comment.
It would be helpful if this could be moved to include/envoy/grpc/status.h, since I would like to use this in interface files for the bidi client I'm working on.
There was a problem hiding this comment.
gRPC status code are identical with protobuf status code, so just re-use it might be enough.
There was a problem hiding this comment.
+1 to what @htuch said. I would like to re-use those for fault injection as well. Do we also have HTTP2 specific error codes in a similar place (in addition to HTTP1)? All I see are HTTP1 codes in include/envoy/http/codes.h
There was a problem hiding this comment.
I'm generally a fan of decoupling protobuf from gRPC. As we know from the other stuff going on with grpc-web, there's not necessarily a direct correspondence between gRPC and protobuf.
include/envoy/router/router.h
Outdated
| static const uint32_t RETRY_ON_CONNECT_FAILURE = 0x2; | ||
| static const uint32_t RETRY_ON_RETRIABLE_4XX = 0x4; | ||
| static const uint32_t RETRY_ON_REFUSED_STREAM = 0x8; | ||
| static const uint32_t RETRY_ON_GRPC_CANCELLED = 0x16; |
There was a problem hiding this comment.
facepalm Done :-P
There was a problem hiding this comment.
Don't see this in the updated diff yet.
source/common/grpc/common.h
Outdated
|
|
||
| class Common { | ||
| public: | ||
| enum GrpcStatus { |
There was a problem hiding this comment.
gRPC status code are identical with protobuf status code, so just re-use it might be enough.
| only supported for gRPC status codes in response headers. gRPC status codes in trailers will not trigger | ||
| retry logic.One or more policies can be specified using a ',' delimited list. The supported policies are: | ||
|
|
||
| cancelled |
There was a problem hiding this comment.
Cancelled are returned typically when the call are cancelled by the caller, not sure a proxy should retry.
http://www.grpc.io/docs/guides/concepts.html#cancelling-rpcs
|
|
||
| // We short circuit here and do not both with an allocation if there is no chance we will retry. | ||
| if (request_headers.EnvoyRetryOn() || route_policy.retryOn()) { | ||
| if (request_headers.EnvoyRetryOn() || request_headers.EnvoyRetryGrpcOn() || route_policy.retryOn()) { |
There was a problem hiding this comment.
quick comment (not full review), can we also add inline route support as is done in route_policy.retryOn() ?
There was a problem hiding this comment.
Rather than extending this if block with more conditions (gRPC, HTTP2 error codes, etc.), would it make sense to abstract them under EnvoyRetryOn() ?
There was a problem hiding this comment.
I think request_headers.EnvoyRetryOn() returning true if there were a literal header x-envoy-grpc-retry-on would be confusing as all generally header functions map cleanly to the header specified. Most other code points could be covered under the general "retry" header if folks think that's cleaner.
Working on route policy now. I'm inclined to add the gRPC strings to the existing retry_on field rather than have retry_on and grpc_retry_on (and num_retries and num_grpc_retries) in router config. If anyone has concerns about HTTP vs gRPC collisions and prefers splitting them out, let me know!
| Setting this header on egress requests will cause Envoy to attempt to retry failed requests (number of | ||
| retries defaults to 1, and is configured as x-envoy-grpc-retry is, above). gRPC retries are currently | ||
| only supported for gRPC status codes in response headers. gRPC status codes in trailers will not trigger | ||
| retry logic.One or more policies can be specified using a ',' delimited list. The supported policies are: |
include/envoy/router/router.h
Outdated
| static const uint32_t RETRY_ON_CONNECT_FAILURE = 0x2; | ||
| static const uint32_t RETRY_ON_RETRIABLE_4XX = 0x4; | ||
| static const uint32_t RETRY_ON_REFUSED_STREAM = 0x8; | ||
| static const uint32_t RETRY_ON_GRPC_CANCELLED = 0x16; |
There was a problem hiding this comment.
Don't see this in the updated diff yet.
test/common/router/router_test.cc
Outdated
| HttpTestUtility::addDefaultHeaders(headers); | ||
| router_.decodeHeaders(headers, true); | ||
|
|
||
| // 5xx response. |
test/integration/integration.cc
Outdated
| Http::StreamEncoder* request_encoder; | ||
| request_encoder = | ||
| &codec_client->startRequest(Http::TestHeaderMapImpl{{":method", "POST"}, | ||
| {":path", "/test/long/url"}, |
There was a problem hiding this comment.
Nit: gRPC looks more like /service/method
test/integration/integration.cc
Outdated
| {"x-envoy-retry-grpc-on", "cancelled"}}, | ||
| *response); | ||
| codec_client->sendData(*request_encoder, 1024, false); | ||
| codec_client->sendTrailers(*request_encoder, request_trailers); |
There was a problem hiding this comment.
Nit: gRPC requests don't have trailers.
| request->encodeHeaders(Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "1"}}, false); | ||
| }, | ||
| [&]() -> void { | ||
| if (fake_upstreams_[0]->httpType() == FakeHttpConnection::Type::HTTP1) { |
There was a problem hiding this comment.
Should we just exclude this from HTTP1 upstream tests? In general you can't proxy gRPC to arbitrary H1 backends I think (certainly not bidi streaming, also you need to get grpc-status into trailers, etc.)
There was a problem hiding this comment.
I thought it was worth testing because it is possible to translate gRPC to HTTP/1.1, and the retry logic doesn't check the connection type before checking for gRPC status codes. Basically if the code might be exercised for HTTP responses which include gRPC status codes for some bizarre reason, might as well test it.
|
@alyssawilk before I forget please sign CLA: https://oss.lyft.com/cla/#/signature (CLA bot has been resourced!) |
source/common/grpc/common.cc
Outdated
|
|
||
| const std::string Common::GRPC_CONTENT_TYPE{"application/grpc"}; | ||
|
|
||
| Status::GrpcStatus Common::getGrpcStatus(const Http::HeaderMap& trailers) { |
There was a problem hiding this comment.
Could have Common::validateTrailers() make use of this to do its gRPC status parsing.
There was a problem hiding this comment.
Sure. It's a minor behavior change for out-of-bounds error codes. If we care we could split returns from getGrpcStatus into "missing" status and "out of bounds" status but either way we'll throw an exception so I assume it's moot.
include/envoy/grpc/status.h
Outdated
|
|
||
| // This is a non-GRPC error code, indicating the status code in gRPC headers | ||
| // was either invalid or missing. | ||
| InvalidCode = 16, |
There was a problem hiding this comment.
If you decided copy this but not reusing protobuf (which is fine), let's not use 16 here, maybe -1?
https://github.com/grpc/grpc/blob/master/include/grpc%2B%2B/impl/codegen/status_code_enum.h
source/common/grpc/common.cc
Outdated
| return Status::GrpcStatus::InvalidCode; | ||
| } | ||
| if (grpc_status_code > Status::GrpcStatus::InvalidCode) { | ||
| if (grpc_status_code > Status::GrpcStatus::DataLoss) { |
There was a problem hiding this comment.
Unauthenticated = 16 is the largest one
htuch
left a comment
There was a problem hiding this comment.
LGTM, some minor nits and ready to ship.
include/envoy/grpc/BUILD
Outdated
| ) | ||
|
|
||
| envoy_cc_library( | ||
| name = "status_interface", |
There was a problem hiding this comment.
Nit: Could just call this "status", since it's not as much an interface as an independent interface-level library (we do something similar for optional).
include/envoy/grpc/status.h
Outdated
| // Permission was denied for the RPC. | ||
| PermissionDenied = 7, | ||
| // The RPC does not have required credentials for the RPC to succeed. | ||
| Unauthenticated = 16, |
There was a problem hiding this comment.
Can you reorder this so that this is after DataLoss?
source/common/grpc/common.cc
Outdated
|
|
||
| uint64_t grpc_status_code; | ||
| if (!grpc_status_header || grpc_status_header->value().empty()) { | ||
| return Status::GrpcStatus::MissingStatus; |
There was a problem hiding this comment.
I'm inclined to make this an optional return type, since having a GrpcStatus that indicates status missing is somewhat self contradictory.
source/common/grpc/common.cc
Outdated
| return Optional<Status::GrpcStatus>(); | ||
| } | ||
| if (!StringUtil::atoul(grpc_status_header->value().c_str(), grpc_status_code) || | ||
| grpc_status_code > Status::GrpcStatus::DataLoss) { |
mattklein123
left a comment
There was a problem hiding this comment.
looks great. few small question/comments.
| x-envoy-grpc-retry-on | ||
| ^^^^^^^^^^^^^^^^^^^^^ | ||
| Setting this header on egress requests will cause Envoy to attempt to retry failed requests (number of | ||
| retries defaults to 1, and is configured as x-envoy-grpc-retry is, above). gRPC retries are currently |
There was a problem hiding this comment.
"x-envoy-grpc-retry" I think is a typo? (Not sure what you meant here)
| resource-exhausted | ||
| Envoy will attempt a retry if the gRPC status code in the response headers is "resource-exhausted" (8) | ||
|
|
||
| As with x-envoy-grpc-retry, the number of retries can be controlled via the |
| retry_on | ||
| *(required, string)* specifies the conditions under which retry takes place. These are the same | ||
| conditions documented for :ref:`config_http_filters_router_x-envoy-retry-on`. | ||
| conditions documented for :ref:`config_http_filters_router_x-envoy-retry-on` and |
There was a problem hiding this comment.
It seems a bit odd to me that in the route, we stick HTTP/gRPC retry policies into a single field, but we have 2 different headers. I don't really feel strongly about this, but thought I would mention it in case anyone else has any strong opinions.
|
Hopefully we can get a merge on this today if there isn't anything major left, my PR (#1057) needs this before I can move it from non-WIP status for final review. |
|
LGTM, I still think we might want to think about how things are configured between headers and routes but we can change this in a follow up before 1.4.0 is released if we want. |
…#1067) Description: scope destruction should happen ahead of destroying main_common_. Additionally while cross thread destruction is acceptable, this PR also moves destruction to happen in the context of the main thread. Risk Level: med - resolve crash Testing: repro in alpha builds of the lyft app. Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
…#1067) Description: scope destruction should happen ahead of destroying main_common_. Additionally while cross thread destruction is acceptable, this PR also moves destruction to happen in the context of the main thread. Risk Level: med - resolve crash Testing: repro in alpha builds of the lyft app. Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
**Description** Garbage was checked in accidentally via a doc fix PR #1030, so this commit removes it Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
for issue #721