Skip to content

Pipeline the Pipeline#128

Merged
tzaffi merged 53 commits intoalgorand:masterfrom
tzaffi:pipelining
Aug 21, 2023
Merged

Pipeline the Pipeline#128
tzaffi merged 53 commits intoalgorand:masterfrom
tzaffi:pipelining

Conversation

@tzaffi
Copy link
Contributor

@tzaffi tzaffi commented Jul 24, 2023

Description

Allowing for moderate concurrency in the pipeline but without sacrificing its sequential integrity.

Summary of Changes

  • conduit/pipeline/common.go: introducing generic retrying of pipeline methods via Retries() and RetriesNoOutput()
  • conduit/pipeline/pipeline.go:
    • error/cancellation handling:
      • modify the pipeline's cancellation function to have a cause, and expose it via the WhyStopped() method
      • joinError() instead of setError() to the pipeline's error property
    • introducing goroutines for the importer, processors, exporters, and round launcher by refactoring Start() and introducing methods ImportHandler(), ProcessorHandler(), and ExporterHandler()
    • E2E end of test signal: modified the info-level log at the end of a round to look like FINISHED Pipeline round: 110. UPDATED Pipeline round: 111 and added WARNING commentary to be careful about changing this format as it would break the E2E test.
  • conduit/pipeline/pipeline_bench_test.go - benchmarker for the pipeline that includes sleeping plugins with an importer, 2 processors, and an exporter.
  • pkg/cli/cli.go: remove a line break from the final error printout

Issues

#118

TODO

  • Cleanup E2E code
  • Merge in more granular timestamps in logs #129
  • Don't embed PipelineData inside of BlockData
  • Remove OStart() and any resulting orphans in pipeline.go
  • Merge latest from master which includes 30 second timeout Algod Importer: Longer Timeout #133
  • Should we send a telemetry observation based on WhyStopped() ? NOT FOR NOW. Reconsider this for enhancement: graceful pipeline interrupt #97
  • What should be logged and how should internal-tools/.../logstats.go be modified? This is under the umbrella of the "Logging Plugin Performance" thread
  • Should we send metrics for the callbacks? Not in this PR
  • Test on EC2 with the final changes in place.
  • Update the results from that test in the spreadsheet table image below.

Testing

E2E

pipeline_bench_test.go

Running a new benchmark test twice on the original code and the new, we have the following results. Note the most pertinent results for the typical indexer DB population use case is exporter_10ms_while_others_1ms:

Benchmark Name Original rounds/sec Pipelining rounds/sec Pipelining v Original (%)
vanilla_2_procs_without_sleep-size-1-8 3077 3309.5 +7%
uniform_sleep_of_10ms-size-1-8 22.32 79.815 +250%
exporter_10ms_while_others_1ms-size-1-8 63.405 78.565 +24%
importer_10ms_while_others_1ms-size-1-8 65.535 91.255 +39%
first_processor_10ms_while_others_1ms-size-1-8 60.28 89.175 +48%

Block Generator Results

Running the block generator test using SCENARIO = scenarios/config.allmixed.small.yml for 30s, with the original code and the new, each time for 2 experiments we have:

Reset database? Original rounds/30 sec Pipelining rounds/30 sec Pipelining v Original (%)
Reset 301 400 +33%
No Reset 295 418 +41%

Local test network 5 minute sprint

I used the Justfile command

❯ just conduit-bootstrap-and-go 300

to bootstrap testnet and run a postgresql exporter against it for 300 seconds. I ran it a number of times against both the original pipeline and the new one. Here are the experimental results:

Log Level Reps Original rounds/300 sec (logs/round) Pipelining rounds/300 sec (logs/round) Pipelining v Original (%)
TRACE 3 3718 (7.0) 3509 (14.0) -5.6% 😢
INFO 2 4578.5 (3.0) 4423.5 (3.0) -3.4% 😢

On EC2 - CLASSIC vs. PIPELINING vs. 30 Second Timeout vs. FINAL

I ran catchup tests for 4 versions of conduit:

  • CLASSIC - what was on master at the time of the run
  • PIPELINING - the head of the pipelining branch at the same time
  • 30 Second Timeout - the pipelining branch after a 30 second timeout was introduced in the algod importer
  • FINAL - commit 867973f which is essentially the code meant for merging

There are much more detailed results in a google sheets document, but the summary is:

SUMMARY

image

@codecov
Copy link

codecov bot commented Jul 24, 2023

Codecov Report

Merging #128 (9918073) into master (442791a) will increase coverage by 4.32%.
Report is 52 commits behind head on master.
The diff coverage is 81.89%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   67.66%   71.98%   +4.32%     
==========================================
  Files          32       36       +4     
  Lines        1976     2695     +719     
==========================================
+ Hits         1337     1940     +603     
- Misses        570      657      +87     
- Partials       69       98      +29     
Files Changed Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) ⬆️
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
conduit/plugins/config.go 100.00% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) ⬇️
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...gins/processors/filterprocessor/fields/searcher.go 77.50% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) ⬆️
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) ⬆️
... and 20 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tzaffi tzaffi changed the title benchmark test Pipeline the Pipeline Jul 26, 2023
Zeph Grunschlag added 2 commits July 26, 2023 14:37
@tzaffi tzaffi mentioned this pull request Jul 27, 2023
err := runConduitCmdWithConfig(cfg)
if err != nil {
fmt.Fprintf(os.Stderr, "\nExiting with error:\n%s.\n", err)
fmt.Fprintf(os.Stderr, "\nExiting with error:\t%s.\n", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improves the clarity of the E2E tests in the case of an error.

@tzaffi
Copy link
Contributor Author

tzaffi commented Aug 10, 2023

If it's easy for you to run, it might be interesting to see Local test network 5 minute sprint using different log levels to show how logrus/logging in general is impacting processing times.

Can we add this as a task for #131 ?

Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending the validation test, this looks good.

@tzaffi tzaffi mentioned this pull request Aug 18, 2023
}

ctx := context.Background()
pipeline, err := pipeline.MakePipeline(ctx, pCfg, logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional changes here, but removing the shadowing of the package by the variable.

@tzaffi tzaffi merged commit 0095fc9 into algorand:master Aug 21, 2023
@tzaffi tzaffi deleted the pipelining branch August 21, 2023 18:39
This was referenced Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants