From the documentation on retries, the x-envoy-retry-on header can be configured to handle HTTP status codes such as 5XX and 4XX. This works great for HTTP services. However, with gRPC, every response returns HTTP 200 OK as long as the server is running properly; the actual error code is found within the gRPC response.
Would it be feasible to add a retry policy, driven by an x-envoy-retry-grpc-on header, that respects a list of gRPC error codes? A few codes could be deemed retriable (CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED), but I am hesitant to group them together, since that would make assumptions about implementation details within a service. That is why I think an explicit list may work best. I'm open to ideas here.
Sample Header
x-envoy-retry-grpc-on: CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED
The existing x-envoy-retry-on header would then act as a fallback for cases where the service is unreachable and the HTTP status code carries more meaning.
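To make the proposal a bit more concrete, here is a rough sketch in Python of the matching logic I have in mind. This is not Envoy code: the function names are hypothetical, the x-envoy-retry-grpc-on header is only the proposal above, and I am assuming the gRPC error code arrives as the numeric grpc-status value in the response. The numeric values are the standard gRPC status codes.

```python
from typing import Optional, Set

# Standard gRPC status code numbers for the names used in the sample header.
# Only a subset is listed here; a real implementation would cover all codes.
GRPC_STATUS_CODES = {
    "CANCELLED": 1,
    "DEADLINE_EXCEEDED": 4,
    "RESOURCE_EXHAUSTED": 8,
    "UNAVAILABLE": 14,
}


def parse_retry_grpc_on(header_value: str) -> Set[int]:
    """Parse the proposed header value into a set of numeric gRPC codes.

    Hypothetical helper: unknown names are ignored rather than rejected.
    """
    codes = set()
    for token in header_value.split(","):
        name = token.strip().upper()
        if name in GRPC_STATUS_CODES:
            codes.add(GRPC_STATUS_CODES[name])
    return codes


def should_retry(retry_grpc_on: str, grpc_status: Optional[str]) -> bool:
    """Decide whether a response is retriable under the proposed policy.

    grpc_status is the raw grpc-status value, or None when the response
    carried no gRPC status (that case is left to x-envoy-retry-on).
    """
    if grpc_status is None:
        return False
    try:
        status = int(grpc_status)
    except ValueError:
        return False  # malformed status: do not retry on it
    return status in parse_retry_grpc_on(retry_grpc_on)


if __name__ == "__main__":
    header = "CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED"
    print(should_retry(header, "4"))   # DEADLINE_EXCEEDED is in the list
    print(should_retry(header, "13"))  # INTERNAL is not in the list
```

Keeping the list explicit means each caller opts in per code, so nothing is assumed about which failures a given service considers safe to retry.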