
router: add x-envoy-overloaded header#1793

Merged
mattklein123 merged 2 commits into master from overloaded_header on Oct 3, 2017

Conversation

@mattklein123
Member

This change reduces retry explosion:

  1. A retry is never performed if local circuit breaking occurred.
  2. If traffic was dropped (circuit breaking or maintenance mode),
     Envoy will set an x-envoy-overloaded header. (Right now it is
     set to "true", which has no real meaning. We might define more
     values in the future.)
  3. Envoy will not retry if it sees the x-envoy-overloaded header
     set by the upstream. Of course, this header can be propagated
     downstream by the caller if desired, and applications other than
     Envoy can set this header. However, even in the Envoy <-> Envoy
     single-hop case this can reduce retry explosion when under
     duress.

One downside of this change is that a single bad upstream might
create a situation in which retries that might otherwise succeed do
not happen. In general, it seems better to err on the side of fewer
retries; single bad hosts should be ejected via proper outlier
detection policies.

Signed-off-by: Matt Klein <mklein@lyft.com>

return RetryStatus::NoOverflow;
}

if (!runtime_.snapshot().featureEnabled("upstream.use_retry", 100)) {
Member Author


This was moved for perf reasons only. No point in doing lookup unless we are actually going to retry.

danielhochman previously approved these changes Oct 2, 2017
htuch previously approved these changes Oct 3, 2017
Member

@htuch left a comment


LGTM, some nits, feel free to ship whenevs.

bool RetryStateImpl::wouldRetry(const Http::HeaderMap* response_headers,
const Optional<Http::StreamResetReason>& reset_reason) {
// We never retry if the overloaded header is set.
if (response_headers && response_headers->EnvoyOverloaded() != nullptr) {
Member


Tiny nit: the first clause is a pointer and relies on implicit null checking, the second is also presumably a pointer and does an explicit nullptr comparison. So, might be nice to be consistent there.

// This is a customized version of send local reply that allows us to set the overloaded
// header.
Http::Utility::sendLocalReply(
[&](Http::HeaderMapPtr&& headers, bool end_stream) -> void {
Member


Nit: prefer explicit capture.

upstream_host->stats().rq_error_.inc();
}
Http::Utility::sendLocalReply(*callbacks_, stream_destroyed_, code, body);
sendLocalReply(code, body, dropped);
Member


I think there are some other sites for sendLocalReply in this file, but maybe not all apply.

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 mattklein123 dismissed stale reviews from htuch and danielhochman via 2cfc90e October 3, 2017 15:23
@mattklein123
Member Author

@htuch updated

@mattklein123 mattklein123 merged commit 659ce34 into master Oct 3, 2017
@mattklein123 mattklein123 deleted the overloaded_header branch October 3, 2017 16:22
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
mathetake pushed a commit that referenced this pull request Mar 3, 2026
**Description**

Extend the time-to-first-token histogram bucket boundaries from a 10s
max to a 60s max to capture slow responses from upstream providers.
Adds buckets: 15.0, 20.0, 30.0, 45.0, 60.0.

Signed-off-by: Vein Kong <vk@modular.com>