Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
4ab0641
router filter: implement hedge_on_per_try_timeout.
mpuncel Feb 27, 2019
ac2e338
remove empty if block
mpuncel Mar 9, 2019
b51e902
null out conn pool stream handle after pool failure
mpuncel Mar 11, 2019
6d8497c
Merge branch 'master' into mpuncel/hedging-impl
mpuncel Mar 11, 2019
19d7651
fix format
mpuncel Mar 11, 2019
6447c06
add bug fix and test for double retries
mpuncel Mar 11, 2019
6763000
Fix "retry only once per upstream request" behavior.
mpuncel Mar 12, 2019
4777143
fix format
mpuncel Mar 12, 2019
140d4ad
fix spelling in a comment
mpuncel Mar 12, 2019
655d126
more build fixes
mpuncel Mar 12, 2019
86d61e5
fix format
mpuncel Mar 12, 2019
a12c667
run resetStream always on upstream requests to ensure callbacks are r…
mpuncel Mar 12, 2019
1617ffc
proxy all upstream metadata downstream
mpuncel Mar 12, 2019
89d0fb4
use for each syntax to loop over upstream request list
mpuncel Mar 13, 2019
02bc109
PR feedback
mpuncel Mar 14, 2019
7b50e77
PR feedback
mpuncel Mar 17, 2019
4b68740
fix upstream_request reference after destruction
mpuncel Mar 18, 2019
71ecc25
remove unnecessary get() for clang-tidy
mpuncel Mar 18, 2019
fce805d
docs fixes
mpuncel Mar 18, 2019
eae3f0a
make downstream watermark callbacks a list in conn manager
mpuncel Mar 19, 2019
c8cd2ce
fix format
mpuncel Mar 19, 2019
7ea1a8a
Merge branch 'master' into mpuncel/hedging-impl
mpuncel Apr 8, 2019
e075531
clarify some docs and comments
mpuncel Apr 8, 2019
df47fff
add HTTP header for enabling hedging on per try timeout
mpuncel Apr 10, 2019
bb6a289
add integration tests for router timeouts including the hedging case
mpuncel Apr 10, 2019
08c093c
fix spelling in comment
mpuncel Apr 10, 2019
12f24b5
clarify documentation
mpuncel Apr 10, 2019
e4df7be
add integration tests using low buffer sizes
mpuncel Apr 10, 2019
56be605
remove unnecessary buffering case
mpuncel Apr 10, 2019
bb83438
merge master
mpuncel Apr 11, 2019
f68fa0c
merge master
mpuncel Apr 16, 2019
01aa62a
Add asserts in upstream watermark callbacks
mpuncel Apr 19, 2019
8fd4297
remove outdated asserts around 1 upstream request at a time
mpuncel Apr 19, 2019
b86bd47
increase waitForReset timeouts in integration tests
mpuncel Apr 19, 2019
88b2c8b
Merge branch 'master' into mpuncel/hedging-impl
mpuncel Apr 19, 2019
59fe7b8
fix assert in upstream watermark callbacks and unit test that doesn't…
mpuncel Apr 19, 2019
899dacc
fix format
mpuncel Apr 19, 2019
035ebbe
Merge branch 'master' into mpuncel/hedging-impl
mpuncel Apr 22, 2019
dc9fff4
fix format
mpuncel Apr 22, 2019
e3b4396
fix typos
mpuncel Apr 22, 2019
3ba2e57
switch to getStringView() for header comparisons
mpuncel Apr 22, 2019
7e41cd1
fix typo
mpuncel Apr 22, 2019
57593fb
fix faulty assumption in assert. it's possible to get a downstream wa…
mpuncel Apr 22, 2019
f84a666
fix format
mpuncel Apr 22, 2019
6eafa00
add test case where downstream watermark callbacks are deregistered f…
mpuncel Apr 22, 2019
1344432
deregister -> unregister
mpuncel Apr 23, 2019
0d7a7fc
initialize final_upstream_request_ to nullptr
mpuncel Apr 23, 2019
acb40e0
Merge branch 'master' into mpuncel/hedging-impl
mpuncel Apr 24, 2019
00f473f
PR feedback
mpuncel Apr 24, 2019
9b3398b
fix format
mpuncel Apr 25, 2019
8ecf854
add stat for count of hedged requests
mpuncel Apr 25, 2019
f4e3b9f
do removal of upstream request from list inside maybeRetryReset
mpuncel Apr 25, 2019
647e18f
for hedged requests, set x-envoy-expected-timeout and grpc-timeout to be
mpuncel Apr 29, 2019
73b8f53
remove hedging stats (for now)
mpuncel Apr 30, 2019
f7aa990
fix format
mpuncel Apr 30, 2019
701ecce
make hedging require a cluster flag
mpuncel Apr 30, 2019
8312bb7
add more test permutations to hedging integration tests.
mpuncel Apr 30, 2019
eea37b3
fix format
mpuncel Apr 30, 2019
0685088
initialize hedge_on_per_try_timeout_ properly and fix router tests
mpuncel Apr 30, 2019
b29d6ca
update docs with allow_request_hedging cluster option
mpuncel Apr 30, 2019
ee1a395
Merge branch 'master' into mpuncel/hedging-impl
mpuncel May 1, 2019
73652ee
fix format
mpuncel May 1, 2019
ca87a4a
revert cluster config option to allow request hedging
mpuncel May 10, 2019
5977860
fix format
mpuncel May 10, 2019
7b721be
Merge branch 'master' into mpuncel/hedging-impl
mpuncel May 10, 2019
a9db675
fix logical merge where upstream_headers_ isn't always set when heade…
mpuncel May 11, 2019
6d8e7be
fix case where rq_error_ would not be incremented if other requests a…
mpuncel May 11, 2019
35ce081
fix format
mpuncel May 11, 2019
ec80324
fix handling of resets
mpuncel May 13, 2019
f2c645c
fix pending_retries counting
mpuncel May 13, 2019
67b7fa9
fix format
mpuncel May 13, 2019
fcd8ba7
partial PR feedback
mpuncel May 14, 2019
3dafb66
fix bug in doRetry() checking for an immediate reset that could resul…
mpuncel May 14, 2019
3d0f2a9
fix spelling
mpuncel May 14, 2019
488da68
Merge branch 'master' into mpuncel/hedging-impl
mpuncel May 22, 2019
5b92d57
address PR feedback
mpuncel May 22, 2019
80cf6a1
Clean up unnecessary final_upstream_request_ checks.
mpuncel May 23, 2019
e1b4e9d
Merge branch 'master' into mpuncel/hedging-impl
mpuncel May 23, 2019
d364c47
fix format and missing null guard
mpuncel May 23, 2019
1847f33
fix redirect
mpuncel May 24, 2019
5977197
fix spelling/clang tidy errors
mpuncel May 24, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 9 additions & 5 deletions api/envoy/api/v2/route/route.proto
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,6 @@ message VirtualHost {
// Indicates the hedge policy for all routes in this virtual host. Note that setting a
// route level entry will take precedence over this config and it'll be treated
// independently (e.g.: values are not inherited).
// [#not-implemented-hide:]
HedgePolicy hedge_policy = 17;
}

Expand Down Expand Up @@ -803,7 +802,6 @@ message RouteAction {
// Indicates that the route has a hedge policy. Note that if this is set,
// it'll take precedence over the virtual host level hedge policy entirely
// (e.g.: policies are not merged, most internal one becomes the enforced policy).
// [#not-implemented-hide:]
HedgePolicy hedge_policy = 27;
}

Expand Down Expand Up @@ -899,22 +897,28 @@ message RetryPolicy {
RetryBackOff retry_back_off = 8;
}

// HTTP request hedging TODO(mpuncel) docs
// [#not-implemented-hide:]
// HTTP request hedging :ref:`architecture overview <arch_overview_http_routing_hedging>`.
message HedgePolicy {
// Specifies the number of initial requests that should be sent upstream.
// Must be at least 1.
// Defaults to 1.
// [#not-implemented-hide:]
google.protobuf.UInt32Value initial_requests = 1 [(validate.rules).uint32.gte = 1];

// Specifies a probability that an additional upstream request should be sent
// on top of what is specified by initial_requests.
// Defaults to 0.
// [#not-implemented-hide:]
envoy.type.FractionalPercent additional_request_chance = 2;

// Indicates that a hedged request should be sent when the per-try timeout
// is hit. This will only occur if the retry policy also indicates that a
// timed out request should be retried. Defaults to false.
// timed out request should be retried.
// Once a timed out request is retried due to per try timeout, the router
// filter will ensure that it is not retried again even if the returned
// response headers would otherwise be retried according the specified
// :ref:`RetryPolicy <envoy_api_msg_route.RetryPolicy>`.
// Defaults to false.
bool hedge_on_per_try_timeout = 3;
}

Expand Down
12 changes: 12 additions & 0 deletions docs/root/configuration/http_filters/router_filter.rst
Original file line number Diff line number Diff line change
Expand Up @@ -217,6 +217,18 @@ requests. This timeout must be <= the global route timeout (see
caller to set a tight per try timeout to allow for retries while maintaining a reasonable overall
timeout.

x-envoy-hedge-on-per-try-timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Setting this header on egress requests will cause Envoy to use a request
hedging strategy in the case of a per try timeout. This overrides the value set
in the :ref:`route configuration
<envoy_api_field_route.HedgePolicy.hedge_on_per_try_timeout>`. This means that a retry
will be issued without resetting the original request, leaving multiple upstream requests
in flight.

The value of the header should be "true" or "false", and is ignored if invalid.

.. _config_http_filters_router_x-envoy-immediate-health-check-fail:

x-envoy-immediate-health-check-fail
Expand Down
22 changes: 22 additions & 0 deletions docs/root/intro/arch_overview/http_routing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ request. The router filter supports the following features:
* Request timeout specified either via :ref:`HTTP
header <config_http_filters_router_headers_consumed>` or via :ref:`route configuration
<envoy_api_field_route.RouteAction.timeout>`.
* :ref:`Request hedging <arch_overview_http_routing_hedging>` for retries in response to a request (per try) timeout.
* Traffic shifting from one upstream cluster to another via :ref:`runtime values
<envoy_api_field_route.RouteMatch.runtime_fraction>` (see :ref:`traffic shifting/splitting
<config_http_conn_man_route_table_traffic_splitting>`).
Expand Down Expand Up @@ -87,6 +88,27 @@ headers <config_http_filters_router_headers_consumed>`. The following configurat
Note that retries may be disabled depending on the contents of the :ref:`x-envoy-overloaded
<config_http_filters_router_x-envoy-overloaded_consumed>`.

.. _arch_overview_http_routing_hedging:

Request Hedging
---------------

Envoy supports request hedging which can be enabled by specifying a :ref:`hedge
policy <envoy_api_msg_route.HedgePolicy>`. This means that Envoy will race
multiple simultaneous upstream requests and return the response associated with
the first acceptable response headers to the downstream. The retry policy is
used to determine whether a response should be returned or whether more
responses should be awaited.

Currently hedging can only be performed in response to a request timeout. This
means that a retry request will be issued without canceling the initial
timed-out request and a late response will be awaited. The first "good"
response according to retry policy will be returned downstream.

The implementation ensures that the same upstream request is not retried twice.
This might otherwise occur if a request times out and then results in a 5xx
response, creating two retriable events.

.. _arch_overview_http_routing_priority:

Priority routing
Expand Down
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ Version history
:ref:`buffer_flush_timeout <envoy_api_field_config.filter.network.redis_proxy.v2.RedisProxy.ConnPoolSettings.buffer_flush_timeout>` to control how quickly the buffer is flushed if it is not full.
* router: add support for configuring a :ref:`grpc timeout offset <envoy_api_field_route.RouteAction.grpc_timeout_offset>` on incoming requests.
* router: added ability to control retry back-off intervals via :ref:`retry policy <envoy_api_msg_route.RetryPolicy.RetryBackOff>`.
* router: added ability to issue a hedged retry in response to a per try timeout via a :ref:`hedge policy <envoy_api_msg_route.HedgePolicy>`.
* router: added a route name field to each http route in route.Route list
* router: per try timeouts will no longer start before the downstream request has been received
in full by the router. This ensures that the per try timeout does not account for slow
Expand Down
1 change: 1 addition & 0 deletions include/envoy/http/header_map.h
Original file line number Diff line number Diff line change
Expand Up @@ -289,6 +289,7 @@ class HeaderEntry {
HEADER_FUNC(EnvoyExpectedRequestTimeoutMs) \
HEADER_FUNC(EnvoyExternalAddress) \
HEADER_FUNC(EnvoyForceTrace) \
HEADER_FUNC(EnvoyHedgeOnPerTryTimeout) \
HEADER_FUNC(EnvoyImmediateHealthCheckFail) \
HEADER_FUNC(EnvoyInternalRequest) \
HEADER_FUNC(EnvoyIpTags) \
Expand Down
23 changes: 23 additions & 0 deletions include/envoy/router/router.h
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,16 @@ class RetryState {
virtual RetryStatus shouldRetryHeaders(const Http::HeaderMap& response_headers,
DoRetryCallback callback) PURE;

/**
* Determines whether given response headers would be retried by the retry policy, assuming
* sufficient retry budget and circuit breaker headroom. This is useful in cases where
* the information about whether a response is "good" or not is useful, but a retry should
* not be attempted for other reasons.
* @param response_headers supplies the response headers.
* @return bool true if a retry would be warranted based on the retry policy.
*/
virtual bool wouldRetryFromHeaders(const Http::HeaderMap& response_headers) PURE;

/**
* Determine whether a request should be retried after a reset based on the reason for the reset.
* @param reset_reason supplies the reset reason.
Expand All @@ -268,6 +278,19 @@ class RetryState {
virtual RetryStatus shouldRetryReset(const Http::StreamResetReason reset_reason,
DoRetryCallback callback) PURE;

/**
* Determine whether a "hedged" retry should be sent after the per try
* timeout expires. This means the original request is not canceled, but a
* new one is sent to hedge against the original request taking even longer.
* @param callback supplies the callback that will be invoked when the retry should take place.
* This is used to add timed backoff, etc. The callback will never be called
* inline.
* @return RetryStatus if a retry should take place. @param callback will be called at some point
* in the future. Otherwise a retry should not take place and the callback will never be
* called. Calling code should proceed with error handling.
*/
virtual RetryStatus shouldHedgeRetryPerTryTimeout(DoRetryCallback callback) PURE;

/**
* Called when a host was attempted but the request failed and is eligible for another retry.
* Should be used to update whatever internal state depends on previously attempted hosts.
Expand Down
1 change: 1 addition & 0 deletions source/common/http/conn_manager_utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@ Network::Address::InstanceConstSharedPtr ConnectionManagerUtility::mutateRequest
request_headers.removeEnvoyForceTrace();
request_headers.removeEnvoyIpTags();
request_headers.removeEnvoyOriginalUrl();
request_headers.removeEnvoyHedgeOnPerTryTimeout();

for (const LowerCaseString& header : route_config.internalOnlyHeaders()) {
request_headers.remove(header);
Expand Down
1 change: 1 addition & 0 deletions source/common/http/headers.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ class HeaderValues {
const LowerCaseString EnvoyDownstreamServiceNode{"x-envoy-downstream-service-node"};
const LowerCaseString EnvoyExternalAddress{"x-envoy-external-address"};
const LowerCaseString EnvoyForceTrace{"x-envoy-force-trace"};
const LowerCaseString EnvoyHedgeOnPerTryTimeout{"x-envoy-hedge-on-per-try-timeout"};
const LowerCaseString EnvoyImmediateHealthCheckFail{"x-envoy-immediate-health-check-fail"};
const LowerCaseString EnvoyOriginalUrl{"x-envoy-original-url"};
const LowerCaseString EnvoyInternalRequest{"x-envoy-internal"};
Expand Down
11 changes: 11 additions & 0 deletions source/common/router/retry_state_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,17 @@ RetryStatus RetryStateImpl::shouldRetryReset(Http::StreamResetReason reset_reaso
return shouldRetry(wouldRetryFromReset(reset_reason), callback);
}

RetryStatus RetryStateImpl::shouldHedgeRetryPerTryTimeout(DoRetryCallback callback) {
// A hedged retry on per try timeout is always retried if there are retries
// left. NOTE: this is a bit different than non-hedged per try timeouts which
// are only retried if the applicable retry policy specifies either
// RETRY_ON_5XX or RETRY_ON_GATEWAY_ERROR. This is because these types of
// retries are associated with a stream reset which is analogous to a gateway
// error. When hedging on per try timeout is enabled, however, there is no
// stream reset.
return shouldRetry([]() -> bool { return true; }, callback);
}

bool RetryStateImpl::wouldRetryFromHeaders(const Http::HeaderMap& response_headers) {
if (response_headers.EnvoyOverloaded() != nullptr) {
return false;
Expand Down
5 changes: 4 additions & 1 deletion source/common/router/retry_state_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,12 @@ class RetryStateImpl : public RetryState {
bool enabled() override { return retry_on_ != 0; }
RetryStatus shouldRetryHeaders(const Http::HeaderMap& response_headers,
DoRetryCallback callback) override;
// Returns true if the retry policy would retry the passed headers. Does not
// take into account circuit breaking or remaining tries.
bool wouldRetryFromHeaders(const Http::HeaderMap& response_headers) override;
RetryStatus shouldRetryReset(const Http::StreamResetReason reset_reason,
DoRetryCallback callback) override;
RetryStatus shouldHedgeRetryPerTryTimeout(DoRetryCallback callback) override;

void onHostAttempted(Upstream::HostDescriptionConstSharedPtr host) override {
std::for_each(retry_host_predicates_.begin(), retry_host_predicates_.end(),
Expand Down Expand Up @@ -75,7 +79,6 @@ class RetryStateImpl : public RetryState {
void enableBackoffTimer();
void resetRetry();
bool wouldRetryFromReset(const Http::StreamResetReason reset_reason);
bool wouldRetryFromHeaders(const Http::HeaderMap& response_headers);
RetryStatus shouldRetry(bool would_retry, DoRetryCallback callback);

const Upstream::ClusterInfo& cluster_;
Expand Down
Loading