
feat(package): Add support for gracefully shutting down the compression scheduler and compression workers. #1037

@gibber9809


Request

Currently, our package stop scripts stop every container running on a given machine in dependency order using docker stop. By default, docker stop sends SIGTERM to the main process inside each container and then, if that process is still running after a 10-second grace period, sends SIGKILL.

Unfortunately, the compression scheduler has no signal handler to initiate a graceful shutdown on SIGTERM, and even with one, a 10-second grace period is too short for running compression jobs to finish in our system. As a result, a shutdown that occurs while compression jobs are running can leave those jobs in an inconsistent state, or potentially duplicate or lose data.

Possible implementation

To ensure that running compression jobs complete before shutdown, the following two conditions should be sufficient, even in a distributed setup:

  1. Upon receiving SIGTERM, a compression worker finishes its currently running job and then shuts down without fetching a new job.
  2. Upon receiving SIGTERM, the compression scheduler stops dispatching new jobs and waits for currently running jobs to finish before terminating.

Since we cannot guarantee liveness (a running job may never finish), we must also support forceful termination after a grace period, implemented by sending SIGKILL or by some other means.
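The soft-then-hard escalation can be sketched for a local child process with Python's `subprocess` module. This assumes the component runs as a direct child process; for containers, docker stop already performs the same SIGTERM-then-SIGKILL sequence. `stop_gracefully` and `grace_period_s` are hypothetical names.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_period_s: float) -> str:
    """Send SIGTERM; if the process outlives the grace period, send SIGKILL."""
    proc.send_signal(signal.SIGTERM)  # "soft" stop: ask the process to shut down
    try:
        proc.wait(timeout=grace_period_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # "hard" stop: SIGKILL on POSIX
        proc.wait()  # reap the killed process
        return "forced"
```

The return value only reports which path was taken; a real stop script would likely log it so forced terminations are visible.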

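Celery workers already register signal handlers that, by default, do most of what we want, although they appear to split this behavior across SIGTERM, SIGINT, and SIGQUIT (e.g., SIGTERM triggers a warm shutdown that waits for running tasks to complete, while SIGQUIT triggers a cold shutdown).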

We would also need to write a signal handler for the compression scheduler that performs the graceful shutdown described in condition 2.
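A minimal sketch of the scheduler side (condition 2), assuming jobs are dispatched to a `concurrent.futures.ThreadPoolExecutor` as a stand-in for the real compression workers. `CompressionSchedulerSketch` and its methods are hypothetical; the actual scheduler dispatches work to the Celery-based compression workers instead. The handler would be installed with `signal.signal(signal.SIGTERM, scheduler.handle_sigterm)`, as `install()` does.

```python
import signal
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List


class CompressionSchedulerSketch:
    """Hypothetical scheduler: on SIGTERM, stop dispatching and drain running jobs."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self._executor = executor
        self._stop_dispatching = threading.Event()
        self._running: List[Future] = []

    def handle_sigterm(self, signum, frame) -> None:
        # Keep the handler minimal: just flip the flag the dispatch loop checks.
        self._stop_dispatching.set()

    def install(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def run(self, pending_jobs: Iterable[Callable[[], None]]) -> int:
        """Dispatch jobs until SIGTERM, then wait for in-flight jobs; return the count dispatched."""
        dispatched = 0
        for job in pending_jobs:
            if self._stop_dispatching.is_set():
                break  # condition 2: no new jobs after SIGTERM
            self._running.append(self._executor.submit(job))
            dispatched += 1
        # Block until every dispatched job has finished before terminating.
        wait(self._running)
        return dispatched
```

`wait()` is called without a timeout on purpose: the forced SIGKILL after the grace period is what bounds the total shutdown time, matching the liveness caveat above.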

After that, it would just be a matter of modifying our stop script to send the compression workers and scheduler a "soft" kill signal, followed by a "hard" kill signal after a reasonable timeout (e.g., 5 minutes). The hard timeout could be made configurable via a command-line argument to the package stop script.
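One way to wire the configurable hard timeout into a Python stop script, sketched under the assumption that the compression components run in Docker containers: docker stop's `-t` option sets how many seconds to wait after the stop signal before sending SIGKILL, so the soft/hard escalation can be delegated to Docker. The container names and the `--graceful-timeout` flag are hypothetical.

```python
import argparse
from typing import List, Optional

# Hypothetical container names; the real stop script knows the actual names
# and stops containers in dependency order.
COMPRESSION_CONTAINERS = ["compression-scheduler", "compression-worker"]


def build_stop_command(container: str, hard_timeout_s: int) -> List[str]:
    # `docker stop -t N` sends the stop signal (SIGTERM by default) and then
    # SIGKILL if the container is still running after N seconds.
    return ["docker", "stop", "-t", str(hard_timeout_s), container]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Stop package components.")
    parser.add_argument(
        "--graceful-timeout",
        type=int,
        default=300,  # 5 minutes, the example timeout suggested above
        help="Seconds to wait for a graceful shutdown before sending SIGKILL.",
    )
    return parser.parse_args(argv)


def compression_stop_commands(argv: Optional[List[str]] = None) -> List[List[str]]:
    args = parse_args(argv)
    return [build_stop_command(c, args.graceful_timeout) for c in COMPRESSION_CONTAINERS]
```

Each returned command could then be executed with `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the timeout handling easy to test without Docker.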

Metadata

Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
