enhancement: graceful pipeline interrupt #97
Co-authored-by: Will Winder <wwinder.unh@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #97 +/- ##
==========================================
+ Coverage 67.66% 69.44% +1.78%
==========================================
Files 32 36 +4
Lines 1976 2435 +459
==========================================
+ Hits 1337 1691 +354
- Misses 570 653 +83
- Partials 69 91 +22
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, os.Interrupt, syscall.SIGTERM, syscall.SIGINT)
    go func() {
        sig := <-stop
        p.logger.Infof("Pipeline received stopping signal <%v>, stopping pipeline. p.pipelineMetadata.NextRound: %d", sig, p.pipelineMetadata.NextRound)
        p.Stop()
    }()
You get the duplicate calls to plugins being closed because Stop is called twice. The other spot is in runConduitCmdWithConfig.
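One way to make a second call harmless is to guard the shutdown path with `sync.Once`. This is only a minimal sketch: the `plugin` and `pipeline` types and the `closes` counter below are hypothetical stand-ins, not Conduit's actual types.

```go
package main

import "sync"

// plugin is a hypothetical stand-in for an importer/processor/exporter.
type plugin struct {
	name   string
	closes int // counts how many times Close has run
}

// Close records the call; a real plugin would release its resources here.
func (pl *plugin) Close() error {
	pl.closes++
	return nil
}

// pipeline is a hypothetical stand-in that models only the shutdown path.
type pipeline struct {
	stopOnce sync.Once
	plugins  []*plugin
}

// Stop closes every plugin exactly once, however many callers invoke it.
func (p *pipeline) Stop() {
	p.stopOnce.Do(func() {
		for _, pl := range p.plugins {
			_ = pl.Close()
		}
	})
}
```

With this guard, a call from the signal handler and a second call from the command runner would close each plugin only once. It hides the duplicate call rather than removing it, so it complements a single owner for shutdown rather than replacing it.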
Maybe the cli package is the right place to install a signal handler? We pass a context into the pipeline, when the context is cancelled maybe we should implicitly call stop (or maybe cancelling the context is Stop and we get rid of the public function)
I've converted this PR into a draft and encapsulated its goals in new issue #100
Zeph Grunschlag seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Summary
This continues an original draft PR, but this time against algorand/conduit/master.
This PR handles the interrupt signal gracefully. The non-graceful shutdown was discovered during block-generator testing. The following new log lines are now printed (without the new goroutine they are NOT printed):
{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline received stopping signal <interrupt>, stopping pipeline. p.pipelineMetadata.NextRound: 14","time":"2023-06-09T14:47:06-05:00"}
{"__type":"importer","_name":"algod","level":"trace","msg":"importer algod.GetBlock() called BlockRaw(14) err: context canceled","time":"2023-06-09T14:47:06-05:00"}
{"__type":"importer","_name":"algod","level":"error","msg":"error getting block for round 14 (attempt 0): context canceled","time":"2023-06-09T14:47:06-05:00"}
{"__type":"importer","_name":"algod","level":"trace","msg":"importer algod.GetBlock() called StatusAfterBlock(13) err: context canceled","time":"2023-06-09T14:47:06-05:00"}
{"__type":"Conduit","_name":"main","level":"error","msg":"GetBlock ctx error: context canceled","time":"2023-06-09T14:47:06-05:00"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Retry number 1 resuming after a 1s retry delay.","time":"2023-06-09T14:47:06-05:00"}
{"__type":"importer","_name":"algod","level":"info","msg":"importer algod.Close() at round 14","time":"2023-06-09T14:47:07-05:00"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline.Stop(): Importer (algod) closed without error","time":"2023-06-09T14:47:07-05:00"}
{"__type":"importer","_name":"algod","level":"info","msg":"importer algod.Close() at round 14","time":"2023-06-09T14:47:07-05:00"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline.Stop(): Importer (algod) closed without error","time":"2023-06-09T14:47:07-05:00"}
{"__type":"exporter","_name":"postgresql","level":"info","msg":"exporter postgresql.Close() at round 14","time":"2023-06-09T14:47:07-05:00"}
{"__type":"Conduit","_name":"main","level":"info","msg":"Pipeline.Stop(): Exporter (postgresql) closed without error","time":"2023-06-09T14:47:07-05:00"}
{"__type":"exporter","_name":"postgresql","level":"info","msg":"exporter postgresql.Close() at round 14","time":"2023-06-09T14:47:07-05:00"}
I'm not sure why the "closed without error" lines are getting repeated for each plugin.
Issues
#100
Test Plan
WOMM