Worker fails to recover from closed database connection #18546

@camhorn

Description

Describe the bug

When the database connection is lost, the worker fails to reconnect. The consumer keeps restarting and the worker stays non-functional until the service is restarted. No reconnection appears to be attempted.

In addition, the health check does not report the worker as unhealthy, although the django_db_errors_total metric does increase.

How to reproduce

In my case, I'm running a replicated HAProxy in front of PostgreSQL. Cycling the proxy interrupts the connection, which causes the errors.

Expected behavior

The worker should reconnect to the database after losing connectivity. Alternatively, if the worker cannot connect to the database, the health check should fail.
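
To make the expected behavior concrete, here is a minimal sketch of a retry loop that discards a dead connection before trying again. It is not the worker's actual code: `run_with_reconnect`, `ConnectionClosedError`, and the `reset_connection` callback are hypothetical names used only for illustration. In a Django process the reset step might be something like `django.db.close_old_connections()`, but that is an assumption about where the fix would go.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionClosedError(Exception):
    """Hypothetical stand-in for a driver error such as OperationalError."""


def run_with_reconnect(
    operation: Callable[[], T],
    reset_connection: Callable[[], None],
    max_attempts: int = 5,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, dropping the stale connection between failed attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionClosedError:
            if attempt == max_attempts:
                # Give up so that a supervisor or health check can react.
                raise
            # Discard the dead connection so the next attempt opens a new one.
            reset_connection()
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

The important part is the `reset_connection()` call: retrying on the same closed connection, as the log output below suggests is happening, can never succeed. Re-raising after the last attempt lets the process exit or the health check fail instead of looping forever.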

Screenshots

No response

Additional context

As a workaround, I have amended the worker healthcheck so that it fails once django_db_errors_total appears in the metrics output:

    healthcheck:
      test: >
        ak healthcheck &&
        if curl -s localhost:9300/metrics | grep django_db_errors_total;
        then false;
        fi
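
The `grep` above matches any line that contains the metric name, which can include `# HELP` and `# TYPE` comment lines if the exporter emits them. A stricter variant would read the counter's value instead. The sketch below parses Prometheus text exposition output and sums the samples of one metric; `db_error_count` and `is_healthy` are hypothetical helpers, and the sample text, including its label names, is made up for illustration rather than copied from a real worker.

```python
def db_error_count(metrics_text: str, metric: str = "django_db_errors_total") -> float:
    """Sum all samples of `metric` in Prometheus text exposition output."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments, which also
        # contain the metric name and would fool a plain grep.
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition("{")
        if rest:
            # Labelled sample: name{labels} value [timestamp]
            value_part = rest.rpartition("}")[2]
        else:
            # Unlabelled sample: name value [timestamp]
            name, _, value_part = line.partition(" ")
        if name.strip() != metric:
            continue
        total += float(value_part.split()[0])
    return total


def is_healthy(metrics_text: str, baseline: float = 0.0) -> bool:
    """Healthy as long as the error counter has not risen above `baseline`."""
    return db_error_count(metrics_text) <= baseline


# Made-up sample output, for illustration only.
SAMPLE = """\
# HELP django_db_errors_total Illustrative help text.
# TYPE django_db_errors_total counter
django_db_errors_total{alias="default",type="OperationalError"} 4.0
django_db_execute_total{alias="default"} 120.0
"""

print(db_error_count(SAMPLE), is_healthy(SAMPLE))
```

Like the shell workaround, this marks the worker unhealthy permanently once an error has been counted, since the counter resets only when the process restarts. Passing the last known good value as `baseline` would make it react only to new errors.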

Deployment Method

Docker

Version

2025.10.2

Relevant log output

{"event": "Consumer encountered a connection error: the connection is closed", "level": "critical", "logger": "dramatiq.worker.ConsumerThread(default)", "timestamp": "2025-12-02T23:37:04.211192"}
{"domain_url": null, "event": "Database error encountered", "exc": "OperationalError('the connection is closed')", "level": "warning", "logger": "django_dramatiq_postgres.broker", "pid": 57, "schema_name": "public", "timestamp": "2025-12-02T23:37:04.210856"}
{"event": "Restarting consumer in 3.00 seconds.", "level": "info", "logger": "dramatiq.worker.ConsumerThread(default)", "timestamp": "2025-12-02T23:37:01.208377"}
{"event": "Consumer encountered a connection error: the connection is closed", "level": "critical", "logger": "dramatiq.worker.ConsumerThread(default)", "timestamp": "2025-12-02T23:37:01.208174"}
{"domain_url": null, "event": "Database error encountered", "exc": "OperationalError('the connection is closed')", "level": "warning", "logger": "django_dramatiq_postgres.broker", "pid": 57, "schema_name": "public", "timestamp": "2025-12-02T23:37:01.207834"}
{"event": "Restarting consumer in 3.00 seconds.", "level": "info", "logger": "dramatiq.worker.ConsumerThread(default)", "timestamp": "2025-12-02T23:36:58.205207"}
{"event": "Consumer encountered a connection error: the connection is closed", "level": "critical", "logger": "dramatiq.worker.ConsumerThread(default)", "timestamp": "2025-12-02T23:36:58.205045"}

Metadata

Labels: bug (Something isn't working), bug/confirmed (Confirmed bugs)

Status: Done

Relationships: None yet

Development: No branches or pull requests