Skip to content

Global timeouts on route not working when max_stream_duration is set  #19073

@agrawroh

Description

@agrawroh

Description

I don't know if this is intended or not but it seems that the global timeouts set on the route via timeout [Link] doesn't work when max_stream_duration [Link] is also set.

Documentation on either of these configs doesn't mention that they can't be used together but, looking at the code it seems to be the case.

https://github.com/envoyproxy/envoy/blob/main/source/common/router/router.cc#L137

if (!route.usingNewTimeouts()) {
  ...
}

What's a little confusing/surprising to us is the fact that setting the global timeouts using the header x-envoy-upstream-rq-timeout-ms still works even when max_stream_duration is supplied.

https://github.com/envoyproxy/envoy/blob/main/source/common/router/router.cc#L167

if (respect_expected_rq_timeout) {
  ...
}

Is this behavior intended? If this is indeed the intention then we should update the docs and clarify the behavior of x-envoy-upstream-rq-timeout-ms.

Use Case

We have a mix of streaming and non-streaming routes where we are setting both the timeout as well as max_stream_duration Current Configuration: (max_stream_duration = timeout + 240s).

We were expecting that the non-streaming routes won't have any impact from adding max_stream_duration and these would still continue to return 504 status codes once the global timeout get exhausted. Streaming route on the other hand would benefit from setting max_stream_duration as global timeout isn't very useful on these routes.

But after the change of adding a higher max_stream_duration on all the routes, the non-streaming routes started returning a 408 status code, not respecting the global timeouts being set via timeout config. Since it changed the final status code being returned from 504 --> 408, it started breaking some of our existing clients who have a specific post-processing logic for handling/retrying on the 504 status codes.

For now we have mitigated the issue by setting the x-envoy-upstream-rq-timeout-ms header on these routes as this header is still respected irrespective of whether or not the max_stream_duration is set on the routes. We would like to understand the correct/long term fix for this.

Repro Steps

Try to set both the timeout as well as max_stream_duration on any of your route definitions. Even you you set a lower global timeout (say 60s) and a higher stream timeout (say 300s), the global timeout timer will never fire and you'll end up getting a 408 status code after 300s when the stream timeout timer fires.

{
  "cluster": "authN",
  "timeout": "60s",
  "idle_timeout": "3600s",
  "max_stream_duration": "{\"max_stream_duration\": \"300s\"}",
  ...
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions