feat(package): Add support for gracefully shutting down the compression scheduler and compression workers.

### Request

Currently, our package stop scripts stop every container running on a given machine in dependency order using `docker stop`, which [by default](https://docs.docker.com/reference/cli/docker/container/stop/) sends `SIGTERM` to the main process inside the container, followed by `SIGKILL` after a 10-second grace period.

Unfortunately, the compression scheduler currently has no signal handler to initiate a graceful shutdown on `SIGTERM`, and a 10-second grace period is too short for a graceful shutdown in our system. As a result, a shutdown that occurs while compression jobs are running can leave jobs in a bad state, or potentially duplicate or lose data.

### Possible implementation

To ensure that running compression jobs complete before shutdown, the following two conditions should be sufficient, even in a distributed setup:
1. Upon receiving `SIGTERM`, a compression worker finishes any currently running job and shuts down without fetching a new job.
2. Upon receiving `SIGTERM`, the compression scheduler stops dispatching new jobs and waits for currently running jobs to finish before terminating.

Since we cannot guarantee liveness (a running job may never finish), we must also support forceful termination after some grace period (implemented by sending `SIGKILL` or otherwise).

Celery workers already [register signal handlers that do most of what we want](https://docs.celeryq.dev/en/stable/userguide/workers.html#warm-shutdown) by default (except they seem to use `SIGTERM`/`SIGINT`/`SIGQUIT`).

We would also need to write a signal handler for the compression scheduler that handles graceful shutdown as described.
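As a rough illustration of what such a handler could look like, here is a minimal Python sketch. All names (`shutdown_requested`, `run_scheduler`, `dispatch_pending_jobs`, `count_running_jobs`) are hypothetical and do not refer to the existing scheduler code; the sketch assumes the scheduler runs a polling loop and can ask how many jobs are still running.

```python
import signal
import threading
import time

# Hypothetical flag; the real scheduler may track shutdown state differently.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only record the request here; the main loop does the actual winding down.
    shutdown_requested.set()


def install_signal_handlers():
    # Must be called from the main thread, as required by signal.signal().
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_scheduler(dispatch_pending_jobs, count_running_jobs, poll_interval=0.5):
    """Dispatch jobs until shutdown is requested, then wait for running jobs.

    Both callbacks are assumptions standing in for the scheduler's own logic.
    """
    # Phase 1: normal operation, until SIGTERM has been received.
    while not shutdown_requested.is_set():
        dispatch_pending_jobs()
        # wait() returns early if the flag is set during the pause.
        shutdown_requested.wait(poll_interval)

    # Phase 2: no new jobs are dispatched; wait for running jobs to finish.
    while count_running_jobs() > 0:
        time.sleep(poll_interval)
```

The handler itself only sets a flag, which keeps it safe to run at any point in the loop. The drain phase has no timeout of its own here, on the assumption that the stop script enforces the hard limit.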

After that, it would just be a matter of modifying our stop script to send the compression worker and scheduler a "soft" kill signal followed by a "hard" kill signal after a reasonable timeout (e.g. 5 minutes). The "hard" timeout could be made configurable via a command-line argument on the package stop script.
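One possible shape for that change is sketched below in Python. It relies on the `-t` option of `docker stop`, which sets how many seconds Docker waits after the stop signal before sending `SIGKILL`. The `--grace-period` flag, the function names, and the default value are assumptions for illustration, not the current interface of the stop script.

```python
import argparse
import subprocess

# Assumed default, taken from the "e.g. 5 minutes" suggestion above.
DEFAULT_GRACE_PERIOD_SECONDS = 300


def build_stop_command(container_name, grace_period_seconds):
    # `docker stop -t N` sends the stop signal, then SIGKILL after N seconds.
    return ["docker", "stop", "-t", str(grace_period_seconds), container_name]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Stop package containers.")
    # Hypothetical flag name for the configurable "hard" timeout.
    parser.add_argument(
        "--grace-period",
        type=int,
        default=DEFAULT_GRACE_PERIOD_SECONDS,
        help="Seconds to wait for a graceful shutdown before SIGKILL.",
    )
    parser.add_argument("containers", nargs="+", help="Containers, in stop order.")
    return parser.parse_args(argv)


def stop_containers(containers, grace_period_seconds, run=subprocess.run):
    # Containers are stopped one at a time, in the order given by the caller,
    # so the existing dependency ordering is preserved.
    for name in containers:
        run(build_stop_command(name, grace_period_seconds), check=True)
```

A caller would pass the result of `parse_args()` to `stop_containers(args.containers, args.grace_period)`. The `run` parameter exists only so the command execution can be swapped out in tests.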

feat(package): Add support for gracefully shutting down the compression scheduler and compression workers. #1037
