
Retry on mid-stream resets result in 503 but no retry #5876

@liviu-cristea



Description:

I have a fleet running Envoy in Docker, calling into an upstream Google Cloud hosted service over h2.
Periodically (it is very hard to reproduce), Envoy returns a 503 and the upstream_cx_destroy_remote_with_active_rq counter is incremented.
I tried the retriable-status-codes retry policy (https://www.envoyproxy.io/docs/envoy/v1.9.0/configuration/http_filters/router_filter#config-http-filters-router-x-envoy-retry-on) and made 503 a retriable status code; however, the retry does not take place.
The retry takes place only if I explicitly make the upstream send the 503 (in which case upstream_cx_destroy_remote_with_active_rq is not incremented).

Is this expected? How can I make Envoy retry in this case?

This happens only once every 24 to 36 hours at a load of 200 QPS, but when it does, hundreds of requests fail at once, causing significant damage. Turning on debug logs might slow everything down. Should I not be concerned about the I/O and CPU increase, and about the effect of that extra Envoy overhead on proxying those requests?
Also, 5xx as a retry policy does not work either. I specifically want 503 to be retried, so in 1.9 I am using the configuration below, but it has no effect on the Envoy-generated 503s that come with upstream_cx_destroy_remote_with_active_rq. How should I proceed to give you meaningful information?

    - match:
        prefix: "/"
      route:
        timeout: 120s
        cluster: test_cluster
        host_rewrite: "x.y.com"
        retry_policy:
          retry_on: connect-failure,refused-stream,retriable-status-codes
          num_retries: 2
          retriable_status_codes: [503]
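To make the intended behaviour of this retry_policy concrete, here is a simplified Python model of how retry_on, num_retries and retriable_status_codes combine when the upstream actually sends a response. It is a hypothetical sketch, not Envoy's implementation: the RetryPolicy class and would_retry_response function are invented for illustration. The 5xx and gateway-error branches follow the documented meaning of those policies (any 5xx, and 502/503/504 respectively).

```python
from dataclasses import dataclass, field


@dataclass
class RetryPolicy:
    """Hypothetical model of the route's retry_policy block (not Envoy code)."""
    retry_on: set
    num_retries: int
    retriable_status_codes: set = field(default_factory=set)

    @classmethod
    def from_config(cls, retry_on, num_retries, retriable_status_codes=()):
        # retry_on is a comma-separated string, as in the YAML above.
        policies = {p.strip() for p in retry_on.split(",") if p.strip()}
        return cls(policies, num_retries, set(retriable_status_codes))


def would_retry_response(policy, status, retries_done):
    """Decide whether an upstream *response* with this status is retried."""
    if retries_done >= policy.num_retries:
        return False  # retry budget exhausted
    if "5xx" in policy.retry_on and 500 <= status <= 599:
        return True
    if "gateway-error" in policy.retry_on and status in (502, 503, 504):
        return True
    if ("retriable-status-codes" in policy.retry_on
            and status in policy.retriable_status_codes):
        return True
    return False


policy = RetryPolicy.from_config(
    "connect-failure,refused-stream,retriable-status-codes", 2, [503])
print(would_retry_response(policy, 503, 0))  # True
print(would_retry_response(policy, 503, 2))  # False: both retries already used
print(would_retry_response(policy, 500, 0))  # False: 500 is not listed
```

In this model the status-code check only ever sees responses that the upstream actually sent; a 503 that the proxy synthesizes locally after a reset never passes through it.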

From Harvey Tuch:

Looking at RetryStateImpl::wouldRetryFromReset() and RetryStateImpl::wouldRetry(), I'm not sure they handle the case of a mid-stream reset. Please file a GH issue and we can discuss it further there.
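The comment above points at two separate decision paths: one for upstream responses and one for resets. The following is a hypothetical, heavily simplified Python model of the reset path. The function names only echo RetryStateImpl::wouldRetryFromReset(); the ResetReason values are loosely named stand-ins, and two rules are assumptions rather than quotes of Envoy's source: that 5xx/gateway-error treat any non-overflow reset as retriable, and that no retry is possible once response headers have already been forwarded downstream.

```python
from enum import Enum, auto


class ResetReason(Enum):
    # Hypothetical subset of reset causes, loosely named after Envoy's.
    CONNECTION_FAILURE = auto()      # could not connect to the upstream
    REFUSED_STREAM = auto()          # upstream refused the h2 stream
    CONNECTION_TERMINATION = auto()  # upstream closed the connection mid-request
    OVERFLOW = auto()                # local resource overflow, never retried


def would_retry_from_reset(retry_on, reason):
    """Simplified reset path; retriable_status_codes is never consulted here."""
    if reason is ResetReason.OVERFLOW:
        return False
    if retry_on & {"5xx", "gateway-error"}:
        return True  # assumption: any other reset is treated like a 5xx
    if "refused-stream" in retry_on and reason is ResetReason.REFUSED_STREAM:
        return True
    if "connect-failure" in retry_on and reason is ResetReason.CONNECTION_FAILURE:
        return True
    return False


def should_retry_reset(retry_on, reason, downstream_response_started):
    # Assumption: once response headers have been sent to the client,
    # the request cannot be replayed and the client sees the failure.
    if downstream_response_started:
        return False
    return would_retry_from_reset(retry_on, reason)


user_policy = {"connect-failure", "refused-stream", "retriable-status-codes"}
print(should_retry_reset(user_policy, ResetReason.CONNECTION_TERMINATION, False))  # False
print(should_retry_reset({"5xx"}, ResetReason.CONNECTION_TERMINATION, False))      # True
print(should_retry_reset({"5xx"}, ResetReason.CONNECTION_TERMINATION, True))       # False
```

Under these assumptions, the policy from the issue never retries a connection termination, because retriable-status-codes only applies to upstream responses. A 5xx policy would cover it unless the response had already started downstream, which is one plausible reading of "mid-stream" here.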

Relevant links:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/envoy-users/fV87_K4cMks/LvGkWbSZGgAJ
Is this the same as #5023?

Metadata

Labels: enhancement (Feature requests. Not bugs or questions.)
