Description
When I try to use the previous_priorities retry priority plugin in a route's retry policy, Envoy 1.17.0 rejects the configuration with an "Invalid type URL" error, even though I'm using the same syntax as the example in the Envoy documentation. For example, consider the following retry policy in a route:
```yaml
retry_policy:
  num_retries: 1
  retry_on: 5xx
  retry_priority:
    name: envoy.retry_priorities.previous_priorities
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig
      update_frequency: 1
```
This YAML uses the same previous-priorities configuration shown in the Envoy 1.17.0 documentation (https://www.envoyproxy.io/docs/envoy/v1.17.0/intro/arch_overview/http/http_connection_management#arch-overview-http-retry-plugins), but it fails when Envoy starts up:
```
[2021-01-17 13:01:31.235][91][warning][config] [source/common/config/filesystem_subscription_impl.cc:78] Filesystem config update failure: Unable to parse JSON as proto (INVALID_ARGUMENT:(virtual_hosts[0].routes[2].route.retry_policy.retry_priority.typed_config): invalid value Invalid type URL, unknown type: envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig for type Any): {"resources":{"name":"local-routes","@type":"type.googleapis.com/envoy.config.route.v3.RouteConfiguration","virtual_hosts":[{"name":"http","domains":["*"],"routes":[{"match":{"prefix":"/x-tree/F5Monitor.html"},"direct_response":{"body":{"inline_string":"THE SERVER IS UP"},"status":200}},{"route":{"retry_policy":{"retry_priority":{"typed_config":{"update_frequency":1,"@type":"type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig"},"name":"envoy.retry_priorities.previous_priorities"},"retry_on":"5xx","num_retries":1},"cluster":"healthtest","timeout":"60s"},"match":{"prefix":"/","headers":[{"safe_regex_match":{"google_re2":{},"regex":"POST|PUT|DELETE"},"invert_match":true,"name":":method"}]}},{"route":{"timeout":"60s","retry_policy":{"retry_on":"connect-failure","retry_priority":{"typed_config":{"update_frequency":1,"@type":"type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig"},"name":"envoy.retry_priorities.previous_priorities"},"num_retries":1},"cluster":"healthtest"},"match":{"prefix":"/","headers":[{"name":":method","safe_regex_match":{"regex":"POST|PUT|DELETE","google_re2":{}},"invert_match":false}]}}]}]}}
```
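For readability, here is the route configuration embedded in the error above, hand-converted from the logged JSON to YAML. It covers only the RouteConfiguration itself (not the file-subscription wrapper), so treat it as a readability aid rather than a byte-for-byte copy of the file Envoy loaded:

```yaml
# Hand-converted from the JSON in the log above.
name: local-routes
virtual_hosts:
- name: http
  domains: ["*"]
  routes:
  # Static health-check response.
  - match:
      prefix: "/x-tree/F5Monitor.html"
    direct_response:
      status: 200
      body:
        inline_string: "THE SERVER IS UP"
  # Methods other than POST/PUT/DELETE: retry on 5xx.
  - match:
      prefix: "/"
      headers:
      - name: ":method"
        safe_regex_match:
          google_re2: {}
          regex: "POST|PUT|DELETE"
        invert_match: true
    route:
      cluster: healthtest
      timeout: 60s
      retry_policy:
        retry_on: 5xx
        num_retries: 1
        retry_priority:
          name: envoy.retry_priorities.previous_priorities
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig
            update_frequency: 1
  # POST/PUT/DELETE: retry only on connect failures.
  - match:
      prefix: "/"
      headers:
      - name: ":method"
        safe_regex_match:
          google_re2: {}
          regex: "POST|PUT|DELETE"
        invert_match: false
    route:
      cluster: healthtest
      timeout: 60s
      retry_policy:
        retry_on: connect-failure
        num_retries: 1
        retry_priority:
          name: envoy.retry_priorities.previous_priorities
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.retry.priority.previous_priorities.v3.PreviousPrioritiesConfig
            update_frequency: 1
```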
This is strange, as the type URL appears to match the corresponding proto file: https://github.com/envoyproxy/envoy/blob/main/api/envoy/extensions/retry/priority/previous_priorities/v3/previous_priorities_config.proto. It's also not possible to simply omit the value: update_frequency is effectively a required field, because it has a greater-than-zero validation rule (https://github.com/envoyproxy/envoy/blob/main/api/envoy/extensions/retry/priority/previous_priorities/v3/previous_priorities_config.proto#L56) but, as an int32, defaults to zero when not set. Indeed, using previous_priorities without any typed_config leads to a full Envoy crash when constraint validation fails (I'm not sure whether a crash is expected in this case):
```
[2021-01-17 12:53:27.149][81][debug][router] [source/common/router/router.cc:425] [C2][S11559433518512939447] cluster 'healthtest' match for URL '/test'
[2021-01-17 12:53:27.149][81][debug][misc] [source/common/protobuf/utility.cc:266] Proto validation error; throwing Proto constraint validation failed (PreviousPrioritiesConfigValidationError.UpdateFrequency: ["value must be greater than " '\x00']):
[2021-01-17 12:53:27.150][81][critical][main] [source/exe/terminate_handler.cc:13] std::terminate called! (possible uncaught exception, see trace)
[2021-01-17 12:53:27.150][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2021-01-17 12:53:27.150][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:92] Envoy version: 5c801b25cae04f06bf48248c90e87d623d7a6283/1.17.0/Clean/RELEASE/BoringSSL
[2021-01-17 12:53:27.160][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #0: Envoy::TerminateHandler::logOnTerminate()::$_0::operator()() [0x55781a636a7b]
[2021-01-17 12:53:27.170][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:98] #1: [0x55781a6368e9]
[2021-01-17 12:53:27.180][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #2: std::__terminate() [0x55781acc05c3]
[2021-01-17 12:53:27.191][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #3: Envoy::MessageUtil::validate<>() [0x55781946c85f]
[2021-01-17 12:53:27.201][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #4: Envoy::Extensions::Retry::Priority::PreviousPrioritiesRetryPriorityFactory::createRetryPriority() [0x55781946c4c5]
[2021-01-17 12:53:27.211][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #5: Envoy::Router::RetryPolicyImpl::retryPriority() [0x55781a3d933d]
[2021-01-17 12:53:27.221][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #6: Envoy::Router::RetryStateImpl::RetryStateImpl() [0x55781a4053fc]
[2021-01-17 12:53:27.230][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #7: Envoy::Router::RetryStateImpl::create() [0x55781a405261]
[2021-01-17 12:53:27.240][81][critical][backtrace] [bazel-out/k8-opt/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:96] #8: Envoy::Router::ProdFilter::createRetryState() [0x55781a3cd36f]
```
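For reference, the variant that produced this crash is the same retry policy with typed_config dropped entirely. Judging from the backtrace (RetryStateImpl::create called from ProdFilter::createRetryState), the validation appears to run when a request's retry state is created rather than when the route is loaded, which would explain why the config is accepted but the first matching request takes Envoy down:

```yaml
# Variant described above: previous_priorities with no typed_config.
# update_frequency is left at the int32 default of 0, which fails the
# greater-than-zero validation rule when the retry priority is created.
retry_policy:
  num_retries: 1
  retry_on: 5xx
  retry_priority:
    name: envoy.retry_priorities.previous_priorities
```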
I can see the previous_priorities extension is in place in this Envoy binary as expected:
```
[2021-01-17 12:53:21.350][70][info][main] [source/server/server.cc:325] statically linked extensions:
...
[2021-01-17 12:53:21.350][70][info][main] [source/server/server.cc:327] envoy.retry_priorities: envoy.retry_priorities.previous_priorities
```
I'm using the Envoy 1.17.0 binary from the envoyproxy/envoy:v1.17.0 image (and from the envoyproxy/envoy-debug:v1.17.0 image to collect the stack trace above), so nothing remarkable on that front:
```
envoy@acfa1297d53b:/$ envoy --version
envoy version: 5c801b25cae04f06bf48248c90e87d623d7a6283/1.17.0/Clean/RELEASE/BoringSSL
```
Is the example above (and in the retry plugin documentation itself) the correct syntax for the previous_priorities plugin? If so, does this point to a deeper problem in Envoy, where the type behind this URL is unknown even though it should be available?
For background: we're trying to use Envoy as a far more capable reverse proxy than NGINX, but are running into a challenge with retry handling. The simplest version of our goal involves only two upstreams: one at priority 0 (the "primary upstream") that should handle all traffic by default, and one at priority 1 (the "backup upstream") that should be used when the primary upstream is unavailable.
We can achieve this with active health checks, but would rather not rely on them alone, especially because there is a delay before the primary upstream's ill health is detected. Ideally, we would retry a wide variety of failed idempotent requests against the backup upstream, and also retry a much narrower set of failed non-idempotent requests (essentially just outright connect failures). In NGINX we can use the proxy_next_upstream directive (http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream) to achieve this, since NGINX simply moves on from the failing upstream member to the next available member, even if it's a backup member.
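A minimal sketch of the two-upstream cluster layout described above, assuming a single host at each priority. The cluster name healthtest comes from the logs; the host names, ports, and timeout are hypothetical placeholders, not our real setup:

```yaml
clusters:
- name: healthtest
  connect_timeout: 1s          # placeholder value
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: healthtest
    endpoints:
    # Priority 0: the primary upstream, which receives traffic by default.
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: primary.example.internal   # hypothetical host
              port_value: 8080
    # Priority 1: the backup upstream, for when priority 0 is unavailable.
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: backup.example.internal    # hypothetical host
              port_value: 8080
```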
In Envoy, the first approach I tried was the envoy.retry_host_predicates.previous_hosts host predicate in our route's retry policy, to stop Envoy from simply rerunning the same host selection and picking the primary upstream again. However, this fails: the UF and URX response flags confirm that host selection kept picking the sole priority 0 upstream and the host predicate kept rejecting it.
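Roughly, that first attempt looks like the following. This is only a sketch: the exact snippet isn't included in this report, so the name-only predicate form and the host_selection_retry_max_attempts value below are assumptions:

```yaml
retry_policy:
  retry_on: 5xx
  num_retries: 1
  retry_host_predicate:
  # Rejects hosts already attempted for this request. With only one
  # priority 0 host and no retry priority plugin, reselection keeps
  # landing on that same host, which the predicate rejects again.
  - name: envoy.retry_host_predicates.previous_hosts
  # Assumed value: how many times host selection is reattempted when
  # the predicate rejects the chosen host.
  host_selection_retry_max_attempts: 3
```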
Given that result, I next tried the envoy.retry_priorities.previous_priorities retry priority plugin, planning to set update_frequency down to 1, but ran into the issue described above. Is there a more elegant way to achieve this outcome?
I can provide more exhaustive YAML config files if it helps, although the behavior should be reasonably clear from the examples above.