Skip to content

Classifying connection timeouts as local reset #14394

@mpuncel

Description

@mpuncel

Title: Classifying connect timeouts as local reset

Description:
For a few months we've been seeing occasional 503 errors in our client side sidecars with the LR response flag, and the latency on these calls was always equal to the connect timeout we set. It looks like the router filter classifies connect timeouts this way: https://github.com/envoyproxy/envoy/blob/master/source/common/router/upstream_request.cc#L340

I believe this was changed in April in 6aee603#diff-b61a05a18225decb27f5620aefda1646c259e314547a7d985c369f62a684c12bR305, before that PR I believe that would have been considered a connection failure.

I can understand the rationale for considering it a local reset, but one side effect of that change is that a retry policy that specifies to retry connect-timeout and not reset will now fail to retry that case. Furthermore, if a request is not idempotent it's not an option to retry on reset because that can include cases where the server has already processed the request. I've also seen some confusion on the Envoy slack about this with other folks seeing LR start to show up after an Envoy upgrade. I'm not sure anything needs to change here but just wanted to raise that issue.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions