fqdn/dnsproxy/proxy_test: increase timeout for DNS TCP client exchanges #12305
Merged
Under heavy load, the round-trip time (RTT) for DNS requests between a TCP client and a DNS proxy may exceed the 100ms timeout specified when creating the client in the dnsproxy tests. This was observed on the test PR #12298, with an RTT of up to 296ms (under exceptional memory strain). This might be the cause of the rare flakes reported in #12042.
Let's increase this timeout. The timeout is only used a couple of times in the tests, so increasing it by a few hundred milliseconds has no visible impact. And because we expect all requests from the TCP client to succeed on L4 anyway (i.e. the client should never time out in our tests), this should not prolong test execution at all in the normal case.
Let's also retrieve and print the RTT value for that request in case of error, to get more information if this change is not enough to fix the flake.
Hopefully fixes: #12042
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
test-focus RuntimePrivilegedUnitTests
aanm approved these changes Jun 29, 2020
qmonnet added a commit that referenced this pull request Jul 21, 2020
Follow-up to #12305, where we raised the timeout from 100ms to 500ms. This seemed to suppress most of the flakes reported in #12042, but we saw one again recently:
Try restoring the timeout to its original value of 1 second. Most of the time the RTT for an exchange is well below 100ms anyway, so test duration will not change. In the worst and very unlikely case where all DNS TCP exchanges are very slow, the tests only perform 5 exchanges, so they cannot spend more than 5 seconds in total on them (otherwise one would time out and the test would fail).
Fixes: #12042
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
rolinh pushed a commit that referenced this pull request Jul 23, 2020
pchaigno pushed a commit that referenced this pull request Jul 23, 2020
[ upstream commit fff58ef ]
Follow-up to #12305, where we raised the timeout from 100ms to 500ms. This seemed to suppress most of the flakes reported in #12042, but we saw one again recently:
Try restoring the timeout to its original value of 1 second. Most of the time the RTT for an exchange is well below 100ms anyway, so test duration will not change. In the worst and very unlikely case where all DNS TCP exchanges are very slow, the tests only perform 5 exchanges, so they cannot spend more than 5 seconds in total on them (otherwise one would time out and the test would fail).
Fixes: #12042
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
christarazi pushed a commit that referenced this pull request Jul 23, 2020
qmonnet added a commit that referenced this pull request Jul 30, 2020
joestringer pushed a commit that referenced this pull request Jul 30, 2020
Under heavy load, the round-trip time (RTT) for DNS requests between a TCP client and a DNS proxy may exceed the 100ms timeout specified when creating the client in the dnsproxy tests.
This was observed on the test PR #12298, with an RTT of up to 296ms (under exceptional memory strain).
This might be the cause of the rare flakes reported in #12042. Let's increase this timeout. The timeout is only used a couple of times in the tests, so increasing it by a few hundred milliseconds has no visible impact. And because we expect all requests from the TCP client to succeed on L4 anyway (i.e. the client should never time out in our tests), this should not prolong test execution at all in the normal case.
Note that this timeout was originally 1 second, and was reduced in commit 2b4badb ("dnsproxy: Correct tests to account for matchPattern generation. Reduce timeout time").
Let's also retrieve and print the RTT value for that request in case of error, to get more information if this change is not enough to fix the flake.
Hopefully fixes: #12042