
Retry not firing as expected, even with retriable_status_codes #6726

@jaygorrell

Issue Template

Title: Some 503 conditions not being retried, even with 503 as a retry code

Description:
In an Istio environment, every service now has a default retry policy. It looks like this:

          "route": {
           "cluster": "outbound|80||my-service.default.svc.cluster.local",
           "timeout": "0s",
           "retry_policy": {
            "retry_on": "connect-failure,refused-stream,unavailable,cancelled,resource-exhausted,retriable-status-codes",
            "num_retries": 2,
            "retry_host_predicate": [
             {
              "name": "envoy.retry_host_predicates.previous_hosts"
             }
            ],
            "host_selection_retry_max_attempts": "3",
            "retriable_status_codes": [
             503
            ]
           },
           "max_grpc_timeout": "0s"
          }

During scale-down events, I get logs like the following on the client side, and the request does not appear to be retried:

[2019-04-25 00:56:30.950][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:502] [C11540] remote close
[2019-04-25 00:56:30.950][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C11540] closing socket: 0
[2019-04-25 00:56:30.950][31][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C11540] disconnect. resetting 1 pending requests
[2019-04-25 00:56:30.950][31][debug][client] [external/envoy/source/common/http/codec_client.cc:105] [C11540] request reset
[2019-04-25 00:56:30.950][31][debug][router] [external/envoy/source/common/router/router.cc:644] [C11555][S6290924653342831959] upstream reset: reset reason connection termination
[2019-04-25 00:56:30.951][31][debug][filter] [src/envoy/http/mixer/filter.cc:133] Called Mixer::Filter : encodeHeaders 2
[2019-04-25 00:56:30.951][31][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1243] [C11555][S6290924653342831959] closing connection due to connection close header
[2019-04-25 00:56:30.951][31][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1305] [C11555][S6290924653342831959] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '95'
'content-type', 'text/plain'
'date', 'Thu, 25 Apr 2019 00:56:30 GMT'
'server', 'envoy'
'connection', 'close'

[2019-04-25 00:56:30.951][31][debug][filter] [src/envoy/http/mixer/filter.cc:205] Called Mixer::Filter : onDestroy state: 2
[2019-04-25 00:56:30.951][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:101] [C11555] closing data_to_write=248 type=2
[2019-04-25 00:56:30.951][31][debug][connection] [external/envoy/source/common/network/connection_impl.cc:153] [C11555] setting delayed close timer with timeout 1000 ms
[2019-04-25 00:56:30.951][31][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:129] [C11540] client disconnected, failure reason: 
[2019-04-25 00:56:30.951][31][debug][filter] [src/envoy/http/mixer/filter.cc:219] Called Mixer::Filter : log
[2019-04-25 00:56:30.951][31][debug][filter] [./src/envoy/http/mixer/report_data.h:132] No dynamic_metadata found for filter envoy.filters.http.rbac

If I override the default retry policy with my own that uses retryOn: gateway-error, the issue is completely resolved. I would have expected connect-failure to work on its own, but if not that, then certainly retriable-status-codes with 503.
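
For illustration, an override of this kind could be expressed as an Istio VirtualService along the following lines. This is only a sketch and not necessarily the exact manifest used here: the resource name, the host (taken from the cluster name in the route above), and the attempts value are assumptions. Only retryOn: gateway-error comes from the description.

```yaml
# Hypothetical VirtualService that replaces the default retry policy.
# Name, host, and attempts are illustrative assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
  namespace: default
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - route:
    - destination:
        host: my-service.default.svc.cluster.local
    retries:
      attempts: 2              # assumed; mirrors num_retries in the default policy
      retryOn: gateway-error   # the setting reported to address the issue
```

With a policy like this applied, the retry_on value in the generated Envoy route would be expected to read gateway-error instead of the default list shown earlier.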

I'm new to debugging these systems, but I'm happy to check or provide anything else with a little guidance.

[optional Relevant Links:]
Istio issue that led to opening one here as well: istio/istio#13616
This also seems similar to #5876, but I'm not sure if it's really the same issue or not.

Labels: enhancement, help wanted
