Algod Importer: Longer Timeout#133
Conversation
Codecov Report
@@ Coverage Diff @@
## master #133 +/- ##
==========================================
+ Coverage 67.66% 70.53% +2.86%
==========================================
Files 32 36 +4
Lines 1976 2545 +569
==========================================
+ Hits 1337 1795 +458
- Misses 570 653 +83
- Partials 69 97 +28
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
|
|
||
| func (e *SyncError) Error() string { | ||
| return fmt.Sprintf("wrong round returned from status for round: %d != %d", e.rnd, e.expected) | ||
| return fmt.Sprintf("wrong round returned from status for round: retrieved(%d) != expected(%d): %v", e.retrievedRound, e.expectedRound, e.err) |
There was a problem hiding this comment.
@shiqizng - as you pointed out yesterday, at face value it's hard to tell which round was which so I've added "retrieved" for the value gotten back from the endpoint vs. "expected" which is the importer's expectation.
winder
left a comment
There was a problem hiding this comment.
Looks good, just had a couple of nits.
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
| } | ||
| noError = noError && assert.True(t, found, "(%s) Expected log was not found: '%s'", tc.name, log) | ||
| if !noError { | ||
| fmt.Printf(">>>>>WE HAVE A PROBLEM<<<<<<\n") |
There was a problem hiding this comment.
Make it a little easier to find the failing needle in the printouts haystack since we aren't using require assertions.
There was a problem hiding this comment.
I think you should use !found as the condition, in tests that provide multiple log entries would print WE HAVE A PROBLEM for all checks after the first one.
winder
left a comment
There was a problem hiding this comment.
LGTM, just one suggestion for the WE HAVE A PROBLEM condition.
Change
waitForRoundTimeoutto 30 from 5SyncErrortypeExplanation for the timeout change
During a recent EC2 import test, I noticed that after catching up, around 1% of imports are reporting an error/warning:
{ "__type":"importer", "_name":"algod", "level":"warning", "msg":"Sync error detected, attempting to set the sync round to recover the node: wrong round returned from status for round: 0 != 31117945", "time":"2023-08-07T15:31:16.166545875Z" } { "__type":"Conduit", "_name":"main", "level":"error", "msg":"wrong round returned from status for round: 0 != 31117945", "time":"2023-08-07T15:31:16.166783576Z" }The
pipeliningbranch, which added some more information to theSyncErrorerrored similarly with the warning msg:So we can see that we're getting a timeout error. This is to be expected on occasion as when calling algod's
StatusAfterBlock(), a context is provided with a 5 sec timeout.Having carried out some statistical analysis on the last 1 million rounds, I got the following results:
So if we re-set
waitForRoundTimeout = 30, we're likely to encounter 0 such timeouts, or at most a small number.On the other hand, algod's own timeout is 60 secs so we are steering quite far from the upper bound.