Skip to content

router: set correct timeout for egress->ingress envoys#8051

Merged
mattklein123 merged 14 commits intoenvoyproxy:masterfrom
nezdolik:fix-timeout
Oct 8, 2019
Merged

router: set correct timeout for egress->ingress envoys#8051
mattklein123 merged 14 commits intoenvoyproxy:masterfrom
nezdolik:fix-timeout

Conversation

@nezdolik
Copy link
Copy Markdown
Member

@nezdolik nezdolik commented Aug 27, 2019

Signed-off-by: Kateryna Nezdolii nezdolik@spotify.com

Description: In case of egress->ingress envoy setup, ingress envoy currently does not respect x-envoy-expected-rq-timeout-ms header, set by egress envoy and overrides the header with it's own timeout value. This change makes ingress envoy to respect x-envoy-expected-rq-timeout-ms header value, if it's present in request.
Risk Level: Low
Testing: Added unit and integration test to make sure header is not sanitised and not ignored.
Docs Changes: Updated API v2 docs
Release Notes: Updated version history
Fixes #7358

@snowp snowp self-assigned this Aug 27, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a good start

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need this to be guarded by a config flag as this could potentially break existing deployments

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp, you mean by runtime guarding or by simple bool config property?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I think we've been doing xds config for these kind of changes, but I guess this case could be considered a bug fix. @mattklein123 do you have any thoughts around what kind of feature guarding we'll need for this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My quick thought is this should probably be a full on config option in the router?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the route timeout and the expected-rq-timeout-ms is the same in this test case, this line doesn't really verify that we're honoring the expected-rq-timeout-ms. You probably want to make these values different so that you can verify that we're capping the global timeout by the incoming deadline

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worthwhile to add an integration test to verify that we actually have this header at this point due to all the possible header sanitization that happens at various points during request processing

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

@nezdolik
Copy link
Copy Markdown
Member Author

nezdolik commented Sep 2, 2019

added config guard, may need to fix tests later.

@stale
Copy link
Copy Markdown

stale bot commented Sep 11, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 11, 2019
@nezdolik
Copy link
Copy Markdown
Member Author

still working on this (was on vacation).

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Sep 17, 2019
@snowp snowp added no stalebot Disables stalebot from closing an issue waiting labels Sep 20, 2019
Kateryna Nezdolii added 4 commits September 23, 2019 19:17
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8051 was synchronize by nezdolik.

see: more, trace.

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Kateryna Nezdolii added 2 commits September 23, 2019 21:56
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik
Copy link
Copy Markdown
Member Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #8051 (comment) was created by @nezdolik.

see: more, trace.

@nezdolik nezdolik changed the title WIP router: set correct timeout for egress->ingress envoys router: set correct timeout for egress->ingress envoys Sep 24, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good overall, just a few comments

]
}];

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "will first check if the `x-envoy"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use *x-envoy-expected-timeout-ms* formatting for headers.

Http::HeaderEntry* header_expected_timeout_entry =
request_headers.EnvoyExpectedRequestTimeoutMs();
if (header_expected_timeout_entry) {
// This will prevent from overriding `x-envoy-expected-rq-timeout-ms` header.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? won't this be set based on the global_timeout_ later on which should match the value we're extracting from the expected-rq-timeout-ms header?

Copy link
Copy Markdown
Member Author

@nezdolik nezdolik Sep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp this was protection against this code path:

  if (insert_envoy_expected_request_timeout_ms && expected_timeout > 0) {
    request_headers.insertEnvoyExpectedRequestTimeoutMs().value(expected_timeout);
  }

We do indeed derive timeout and put it into a separate data structure (with global timeout), so there is not a big gain from setting insert_envoy_expected_request_timeout_ms to false.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed my mind, there is a gain, so that we use same value in timeout.global_timeout (derived from x-envoy-expected-timeout-ms) and observe same value in header x-envoy-expected-timeout-ms by not overriding it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we want the expected header to reflect the timeout used by the router, which is affected by more than just those two headers. If you look further down in that function you'll see that we'll use the per try timeout instead of the global timeout if it's set. It seems like we want the expected timeout header to always reflect the timeout enforced by the router for the outgoing request, and just use the incoming expected timeout header to infer the global timeout. Does that make sense?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp it does, thanks for clarification

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will adjust the test as well

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
// `x-envoy-upstream-rq-timeout-ms` header.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think this is completely correct, as there might not be a rq-timeout header in which case the route specified timeout is used. Might be better to avoid listing what the behavior is without this flag to help ensure that this comment doesn't have to be updated whenever the timeout decision logic changes

ASSERT_TRUE(upstream_request_->waitForEndStream(*dispatcher_));

// Trigger global timeout, populated from `x-envoy-expected-rq-timeout-ms` header.
timeSystem().sleep(std::chrono::milliseconds(501));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the upstream-rq-timeout header is greater than the expected-rq-timeout, you can't tell based on this test that the timeout is due to upstream-rq-timeout. I suggest making upstream-rq-timeout 300ms instead so that we know we're timing out before the expected-rq-timeout timer would hit

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp made x-envoy-upstream-rq-timeout-ms smaller than x-envoy-expected-rq-timeout-ms

]
}];

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use *x-envoy-expected-timeout-ms* formatting for headers.


// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you :ref: internal link to the relevant field in the API docs?


// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/envoy/Envoy/g

Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@nezdolik nezdolik requested review from htuch and snowp September 27, 2019 11:19
Kateryna Nezdolii added 2 commits September 27, 2019 14:05
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@mattklein123 mattklein123 removed the no stalebot Disables stalebot from closing an issue label Sep 27, 2019
snowp
snowp previously approved these changes Sep 30, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, this LGTM

@snowp snowp added the api-review-required API review required by @envoyproxy/api-shepherds label Sep 30, 2019
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great work. Just a few small comments. Great feature!

/wait

Comment on lines +164 to +167
if (header_expected_timeout_entry) {
if (absl::SimpleAtoi(header_expected_timeout_entry->value().getStringView(),
&header_timeout)) {
timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you lift this part out into a helper which takes the header, etc.? It's repeated 3 times in this function and it would help readability.

Kateryna Nezdolii added 2 commits October 7, 2019 11:26
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@mattklein123 mattklein123 merged commit 3f7b132 into envoyproxy:master Oct 8, 2019
nandu-vinodan pushed a commit to nandu-vinodan/envoy that referenced this pull request Oct 17, 2019
Signed-off-by: Kateryna Nezdolii <nezdolik@spotify.com>
@Sooryaa-A
Copy link
Copy Markdown

hi All,
Im trying to send x-envoy-expected-rq-timeout-ms from my application to egress envoy and expecting envoy to use the timeout set in this header to be used for upstream connection request timeout.But the upstream connection request is timing out after the route timeout -30s value only.

Attaching my config below.

static_resources:
listeners:
- address:
socket_address:
protocol: TCP
address: 10.10.180.90
port_value: 9001
filter_chains:
- filters:
name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: upstream_listener
route_config:
name: upstream_listener
virtual_hosts:
- name: upstream_listener
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: server
timeout: 30s
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
respect_expected_rq_timeout: true
suppress_envoy_headers:true
use_remote_address: true

@nezdolik
Copy link
Copy Markdown
Member Author

@Sooryaa-A please create a dedicated issue in envoy repo

@nezdolik nezdolik deleted the fix-timeout branch January 12, 2022 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review-required API review required by @envoyproxy/api-shepherds

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Propagate downstream timeout to upstream through multiple Envoys

5 participants