Skip to content

Algod Importer: Longer Timeout#133

Merged
tzaffi merged 16 commits intoalgorand:masterfrom
tzaffi:algod-importer-longer-timeout
Aug 11, 2023
Merged

Algod Importer: Longer Timeout#133
tzaffi merged 16 commits intoalgorand:masterfrom
tzaffi:algod-importer-longer-timeout

Conversation

@tzaffi
Copy link
Contributor

@tzaffi tzaffi commented Aug 9, 2023

Change waitForRoundTimeout to 30 from 5

  1. changing the amount of time to wait for the next round
  2. adding more context to the SyncError type

Explanation for the timeout change

During a recent EC2 import test, I noticed that after catching up, around 1% of imports are reporting an error/warning:

{
"__type":"importer",
"_name":"algod",
"level":"warning",
"msg":"Sync error detected, attempting to set the sync round to recover the node: wrong round returned from status for round: 0 != 31117945",
"time":"2023-08-07T15:31:16.166545875Z"
}
{
"__type":"Conduit",
"_name":"main",
"level":"error",
"msg":"wrong round returned from status for round: 0 != 31117945",
"time":"2023-08-07T15:31:16.166783576Z"
}

The pipelining branch, which added some more information to the SyncError errored similarly with the warning msg:

importer algod.GetBlock() sync error detected, attempting to set the sync round to recover the node: wrong round returned from status for round: 31112472 != 31112473: status2.LastRound mismatch: context deadline exceeded

So we can see that we're getting a timeout error. This is to be expected on occasion as when calling algod's StatusAfterBlock(), a context is provided with a 5 sec timeout.

Having carried out some statistical analysis on the last 1 million rounds, I got the following results:

mean_duration stdev_duration min_duration max_duration 0-4 5-9 10-14 15-19 20-24
3.400925599 0.507927656 0 22 999803 148 0 0 50

So if we re-set waitForRoundTimeout = 30, we're likely to encounter 0 such timeouts, or at most a small number.

On the other hand, algod's own timeout is 60 secs so we are steering quite far from the upper bound.

@codecov
Copy link

codecov bot commented Aug 9, 2023

Codecov Report

Merging #133 (b9fd399) into master (442791a) will increase coverage by 2.86%.
Report is 48 commits behind head on master.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage   67.66%   70.53%   +2.86%     
==========================================
  Files          32       36       +4     
  Lines        1976     2545     +569     
==========================================
+ Hits         1337     1795     +458     
- Misses        570      653      +83     
- Partials       69       97      +28     
Files Changed Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) ⬆️
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
conduit/plugins/config.go 100.00% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) ⬇️
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...gins/processors/filterprocessor/fields/searcher.go 77.50% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) ⬆️
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) ⬆️
... and 19 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Co-authored-by: Will Winder <wwinder.unh@gmail.com>
@tzaffi tzaffi marked this pull request as ready for review August 9, 2023 19:52

func (e *SyncError) Error() string {
return fmt.Sprintf("wrong round returned from status for round: %d != %d", e.rnd, e.expected)
return fmt.Sprintf("wrong round returned from status for round: retrieved(%d) != expected(%d): %v", e.retrievedRound, e.expectedRound, e.err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shiqizng - as you pointed out yesterday, at face value it's hard to tell which round was which so I've added "retrieved" for the value gotten back from the endpoint vs. "expected" which is the importer's expectation.

@tzaffi tzaffi requested a review from a team August 9, 2023 19:57
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just had a couple of nits.

}
noError = noError && assert.True(t, found, "(%s) Expected log was not found: '%s'", tc.name, log)
if !noError {
fmt.Printf(">>>>>WE HAVE A PROBLEM<<<<<<\n")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a little easier to find the failing needle in the printouts haystack since we aren't using require assertions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should use !found as the condition, in tests that provide multiple log entries would print WE HAVE A PROBLEM for all checks after the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzaffi tzaffi mentioned this pull request Aug 10, 2023
10 tasks
Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one suggestion for the WE HAVE A PROBLEM condition.

@tzaffi tzaffi merged commit 7ef6106 into algorand:master Aug 11, 2023
@tzaffi tzaffi deleted the algod-importer-longer-timeout branch August 11, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants