
Graceful Pipeline Exit #100

@tzaffi

Description


Problem

Currently, when the main conduit process receives a shutdown signal, it abruptly tears down all of its goroutines, even though the pipeline shares a common context.Context object that should make it possible to exit gracefully and deterministically. In particular, it is currently not possible to support a new requirement such as having each plugin finish its current round before exiting.

Solution

This story posits the following goals (though after further discussion we may want to trim the goals or break them out into separate issues):

  1. Handle an interrupt signal in a way that doesn't cause duplicate calls to plugin Stop(). As a starting point, refer to enhancement: graceful pipeline interrupt #97, though where to place the signal handling needs more careful design. See also Rework pipeline process management #99.
  2. Consider adding a new conduit stop command that reads the PID file and sends an interrupt to that process.
  3. Consider letting each plugin run to the end of its current round. Better still, make this configurable via the type of interrupt signal received. For example:
    • conduit stop now would send an os.Interrupt (SIGINT) or a SIGTERM (kill -TERM $(PROCNUM)) and stop all plugins immediately
    • conduit stop end-of-round would send something like kill -USR1 $(PROCNUM) and stop each plugin at the end of its current round, in the order Importer, then Processors, then Exporter

Some questions

  1. Should we add WhyStopped() to the Pipeline interface, and send a telemetry observation with the context's cancellation cause?
  2. Do we still need Stop() in the Pipeline interface?
  3. Do we need sentinel errors to act as cancellation causes, so that we can distinguish between different legitimate reasons for stopping?

Some possible code in the client making use of the stop cause:

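defer func() {
    pline.Stop()
    // WhyStopped() and pipeline.BecauseStopMethod are proposed, not yet part of the Pipeline API.
    if cause := pline.WhyStopped(); !errors.Is(cause, pipeline.BecauseStopMethod) {
        // Errorf, not Error: the message contains a format verb.
        logger.Errorf("unexpected conduit pipeline exit: %v", cause)
    }
}()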

Dependencies

None

Urgency

Low - I haven't heard other developers in the community complain about this issue.

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
