honour routes timeout if max stream duration is set with out stream duration#15585
honour routes timeout if max stream duration is set with out stream duration#15585alyssawilk merged 15 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
@mattklein123 @markdroth PTAL |
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
@adamsjob FYI |
|
Assigning over to @alyssawilk who I think reviewed the other timeout PRs. |
alyssawilk
left a comment
There was a problem hiding this comment.
Thanks for catching and fixing this!
source/common/router/router.cc
Outdated
| } else { | ||
| timeout.global_timeout_ = route.timeout(); | ||
| } | ||
| } else { |
There was a problem hiding this comment.
When we switched timeouts, we moved the instantiation of the timeout to
ConnectionManagerImpl::ActiveStream::refreshDurationTimeout()
Mind moving this logic there?
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
| // Test timeout when route timeout needs to be used if max_stream_duration is not specified. | ||
| // Regression test for https://github.com/envoyproxy/envoy/issues/15530. | ||
| // TODO(ramaraochavali): move this test to DurationTimeout test. | ||
| TEST_F(HttpConnectionManagerImplTest, DurationTimeoutFromRouteTimeout) { |
There was a problem hiding this comment.
I had to create a separate test for this because clusterInfo is repeatedly called and it keeps triggering the clearRouteCache so the timer gets fired multiple times. The same problem exists for https://github.com/envoyproxy/envoy/blob/main/test/common/http/conn_manager_impl_test.cc#L2173 where timer is repeatedly enabled. Having two such tests in the same test is not working. So I made it as separate test and TODO to look at the issue later.
|
@alyssawilk moved to |
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
Alyssa knows this code a lot better than I do, so I'll let her review. |
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
@alyssawilk can you PTAL when you get chance? Also the failure seems not related to this change |
| // If we are using new timeouts and MaxStreamDuration is not set, use the route's timeout | ||
| // value. This is the case when route's MaxStreamDuration is specified without inner | ||
| // MaxStreamDuration i.e. only grpc_timeout_header_max is set. | ||
| if (!route->maxStreamDuration()) { |
There was a problem hiding this comment.
so if grpc_timeout_header_max is set when do we use it? usingNewTimeouts would be true and we'd ignore all the code below, no?
Can you please fix and regression test?
There was a problem hiding this comment.
Oh, Yeah.. Missed that. Will add a test
There was a problem hiding this comment.
@alyssawilk Can you PTAL at the fix to see if this is correct? One thing I found was this needs changes to most of the test cases because now we will be initiating request timer most of the times. So I just want to ensure this is the right direction before I make changes the tests(some of which I have already done - please take a look at the changes)
| auto grpc_timeout = Grpc::Common::getGrpcTimeout(*request_headers_); | ||
| std::chrono::milliseconds timeout; | ||
| bool disable_timer = false; | ||
| auto grpc_timeout = Grpc::Common::getGrpcTimeout(*request_headers_); |
There was a problem hiding this comment.
Offhand I recall debate around the original PR about how the new fields should be applied and it was the consensus from the gRPC folks that the difference wasn't "is this a gRPC request or not" but "is the gRPC timeout header present or not" so I think switching to this is against the intent of the folks who added this to the API.
There was a problem hiding this comment.
Got it. This is good to know. I will change it accordingly
There was a problem hiding this comment.
Awesome. It's super complicated and confusing, so thanks for continuing to plug away at it!
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
|
@alyssawilk Sorry for the delay. Fixed. PTAL. I have added a regression test for this and left a TODO to merge to DurationTimeout test later as I am having some issues with mock timer (even for existing tests also this problem exists). |
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
antoniovicente
left a comment
There was a problem hiding this comment.
Thanks for the fix to timeout config handling.
|
@alyssawilk Can you PTAL when you get chance? |
alyssawilk
left a comment
There was a problem hiding this comment.
Ah sorry, I thought you meant to address the test TODO in this PR. I'm fine with it being a follow-up!
|
@alyssawilk @antoniovicente one thing I realized is when timeout happens with new timeouts we call https://github.com/envoyproxy/envoy/blob/main/source/common/http/conn_manager_impl.cc#L762 and that sets http status to 408 (RequestTimeout) instead of gateway timeout (504) that router sets. Is this expected ? or do we need to change it to return 504 for timeouts? if it is expected - should we document it some where? |
Adds a runtime flag for disabling the behavior introduced in envoyproxy#15585 Signed-off-by: Snow Pettersen <snowp@lyft.com>
…stream duration (envoyproxy#15585)" This reverts commit ffa1680. Signed-off-by: Snow Pettersen <snowp@lyft.com>
|
FYI this broke web sockets for us (by forcing them to terminate earlier), wondering if we can contribute more integration tests here to catch these sort of regressions. |
…stream duration (envoyproxy#15585)" (envoyproxy#16133) This reverts commit ffa1680. Signed-off-by: Snow Pettersen <snowp@lyft.com> Signed-off-by: Gokul Nair <gnair@twitter.com>
Commit Message: honour routes timeout if maxstream duration is set with out specifying max stream duration inside it.
Additional Description:
Risk Level: Low
Testing: Added
Docs Changes: N/A
Release Notes: N/A (There is already a bug fix mentioned for MaxStreamDuration in current.rst)
Platform Specific Features: N/A
Fixes #15530