Skip to content

test: Increase timeout to mitigate non-deterministic test failure#3580

Merged
melekes merged 2 commits intodevelopfrom
ismail/issue3576-non-deterministic-test
Apr 22, 2019
Merged

test: Increase timeout to mitigate non-deterministic test failure#3580
melekes merged 2 commits intodevelopfrom
ismail/issue3576-non-deterministic-test

Conversation

@liamsi
Copy link
Contributor

@liamsi liamsi commented Apr 18, 2019

This should fix #3576 (ran it many times locally but only time will tell). The test actually only checked for the opcode of the error. From the name of the test we actually want to test if we see a timeout after a pre-defined time.

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

listener:

 - before this caused the readWriteTimeout to kick in (rarely) while Accept
 - as a side-effect: remove obsolete time.Sleep: in both listener cases
 the Accept will only return after successfully accepting and the timeout
 that is supposed to be tested here will be triggered because there is a
 read without a write
 - see if we actually run into a timeout error (the whole purpose of
 this test AFAIU)

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
@liamsi liamsi requested review from ebuchman, melekes and xla as code owners April 18, 2019 11:02
@codecov-io
Copy link

codecov-io commented Apr 18, 2019

Codecov Report

Merging #3580 into develop will decrease coverage by 0.17%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #3580      +/-   ##
===========================================
- Coverage    64.33%   64.15%   -0.18%     
===========================================
  Files          213      213              
  Lines        17410    17345      -65     
===========================================
- Hits         11201    11128      -73     
+ Misses        5292     5291       -1     
- Partials       917      926       +9
Impacted Files Coverage Δ
p2p/pex/errors.go 11.11% <0%> (-11.12%) ⬇️
p2p/pex/pex_reactor.go 79.44% <0%> (-3.55%) ⬇️
privval/signer_remote.go 80% <0%> (-2%) ⬇️
blockchain/reactor.go 70.56% <0%> (-1.87%) ⬇️
consensus/reactor.go 71.54% <0%> (-1.66%) ⬇️
rpc/client/httpclient.go 66.51% <0%> (-1.13%) ⬇️
p2p/pex/addrbook.go 67% <0%> (-1%) ⬇️
consensus/state.go 78.82% <0%> (-0.59%) ⬇️
proxy/multi_app_conn.go 0% <0%> (ø) ⬆️
blockchain/pool.go 80.92% <0%> (+0.65%) ⬆️
... and 4 more


func TestListenerTimeoutReadWrite(t *testing.T) {
for _, tc := range listenerTestCases(t, time.Second, time.Millisecond) {
var (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var (
const (

}

if have, want := opErr.Timeout(), true; have != want {
t.Errorf("for %s listener, got unexpected error: have %v, want %v", tc.description, have, want)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.Errorf("for %s listener, got unexpected error: have %v, want %v", tc.description, have, want)
t.Errorf("for %s listener, got unexpected error: have %v, want timeout error", tc.description, opErr)

t.Errorf("for %s listener, have %v, want %v", tc.description, have, want)
}

if have, want := opErr.Timeout(), true; have != want {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if have, want := opErr.Timeout(), true; have != want {
if !opErr.Timeout() {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have this consistent with the cases above (would you also change to if opErr.Op != "read")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this consistent form is that when you reformulate the expectations, you can't forget to change the error reprted. Less error prone and less editing.

Copy link
Contributor

@melekes melekes Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for %s listener, got unexpected error: have false want true
in it's current form does not make sense for person running tests

Copy link
Contributor

@melekes melekes Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xla agree if you're talking about above case where we compare Op == read, but not here

Copy link
Contributor

@xla xla Apr 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it doesn't always produce the most readable output. A balance must be struck.

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>
@xla xla changed the title Increase timeout to mitigate non-deterministic test failure test: Increase timeout to mitigate non-deterministic test failure Apr 18, 2019
Copy link
Contributor

@xla xla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing how an accept could take longer than that, but assuming a noisy environment full of little docker whales will be slower than what 50 years of silicon are capable of. The only thing I'd be vary of is that we mask structural issues with the code by just bumping the timeout, if we are sensitive towards that it warrants invesigation, but again this might only be true in the environment our CI runs in.

👍

@melekes melekes merged commit 8db7e74 into develop Apr 22, 2019
@melekes melekes deleted the ismail/issue3576-non-deterministic-test branch April 22, 2019 08:04
@melekes melekes mentioned this pull request May 7, 2019
36 tasks
@melekes melekes mentioned this pull request May 30, 2019
44 tasks
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
…endermint#3580)

This should fix tendermint#3576 (ran it many times locally but only time will tell). The test actually only checked for the opcode of the error. From the name of the test we actually want to test if we see a timeout after a pre-defined time.

## Commits:

* increase readWrite timeout as it is also used in the `Accept` of the tcp
listener:

 - before this caused the readWriteTimeout to kick in (rarely) while Accept
 - as a side-effect: remove obsolete time.Sleep: in both listener cases
 the Accept will only return after successfully accepting and the timeout
 that is supposed to be tested here will be triggered because there is a
 read without a write
 - see if we actually run into a timeout error (the whole purpose of
 this test AFAIU)

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* makee local test-vars `const`

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

## Additional comments:

@xla: Confusing how an accept could take longer than that, but assuming a noisy environment full of little docker whales will be slower than what 50 years of silicon are capable of. The only thing I'd be vary of is that we mask structural issues with the code by just bumping the timeout, if we are sensitive towards that it warrants invesigation, but again this might only be true in the environment our CI runs in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants