-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Description
Title: Classifying connect timeouts as local reset
Description:
For a few months we've been seeing occasional 503 errors in our client side sidecars with the LR response flag, and the latency on these calls was always equal to the connect timeout we set. It looks like the router filter classifies connect timeouts this way: https://github.com/envoyproxy/envoy/blob/master/source/common/router/upstream_request.cc#L340
I believe this was changed in April in 6aee603#diff-b61a05a18225decb27f5620aefda1646c259e314547a7d985c369f62a684c12bR305, before that PR I believe that would have been considered a connection failure.
I can understand the rationale for considering it a local reset, but one side effect of that change is that a retry policy that specifies to retry connect-timeout and not reset will now fail to retry that case. Furthermore, if a request is not idempotent it's not an option to retry on reset because that can include cases where the server has already processed the request. I've also seen some confusion on the Envoy slack about this with other folks seeing LR start to show up after an Envoy upgrade. I'm not sure anything needs to change here but just wanted to raise that issue.