-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Title: Envoy intermittently responds with 503 UC (upstream_reset_before_response_started{connection_termination})
Description:
What issue is being seen? Describe what should be happening instead of
the bug, for example: Envoy should not crash, the expected value isn't
returned, etc.
We have a group of Envoy servers running as an Edge Proxy for a number of backend services. We configure these Envoy servers using in-house built xDS GRPC server. See example configuration dump attached to this issue.
We started noticing intermittent responses with error code 503 and response flag "UC" + upstream_reset_before_response_started{connection_termination} appearing in response code details. It seems that these responses are produced by Envoy and not returned by the backend services. We observed this issue while running Envoy versions v1.16 and v1.17.
We looked at some previously reported similar GitHub issues, such as:
- Bug: 503 UC sometimes, upstream_reset_before_response_started #8639
- Envoy issuing 503's intermittently istio/istio#15285
We tried to change some configuration settings, namely:
- enable TCP Keep-alives
- adjust Idle Timeouts (for ex. use default values, disable idle timeout)
None of the actions mentioned above had any influence on the behavior of our Envoys' fleet.
Repro steps:
We have no repeatable method for reproducing these errors. They are very sporadic, ie. 1-2 per 100000 requests. They are spread across various clusters. Some of these clusters receive large and constant amount of requests so the possibility of the upstream connections to become idle is very unlikely.
Config:
See included in tarball attached to this issue
Logs:
This issue occurs in our Production environment where we do not have debug level logging enabled.
Can you please advise:
- why does the Envoy terminates connections?
- have this problem been reported by other Envoy's users?
- is there a possibility that requests are not distributed evenly across all backend servers which could lead to Envoy resetting upstream connections?
- what can we do on our side to eliminate this problem?