Summary
The shipped worker healthcheck command vibetuner worker-health cannot reliably complete within a typical Docker healthcheck timeout, because it pays the full CLI bootstrap cost (~5s) before it ever reads the streaq health key. With the scaffolded compose.prod.yml worker healthcheck (timeout: 5s), every probe is killed at the timeout boundary (ExitCode -1) before the check logic runs, so a perfectly healthy worker is marked unhealthy and stays that way.
Steps to reproduce
- Deploy the scaffolded worker with its default healthcheck:
    healthcheck:
      test: ["CMD", "vibetuner", "worker-health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 40s
- Worker is genuinely healthy: asyncio loop alive, per-minute cron firing, streaq consumer group pending=0, lag=0, worker_watchdog_timeout=60s active, restart count 0.
- docker inspect shows the container unhealthy with a long failing streak; healthcheck log entries are ExitCode -1 (timed out), not the command's own output.
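To confirm that the probes are timing out rather than failing on their own logic, the container's health state can be summarized from `docker inspect --format '{{json .State.Health}}' <container>`. The following is a minimal sketch, not part of vibetuner: `summarize_health` is a hypothetical helper, and the sample JSON is illustrative data shaped like Docker's health object, not output captured from the affected deploy.

```python
import json


def summarize_health(state_health: dict) -> dict:
    """Summarize Docker's State.Health object for one container."""
    log = state_health.get("Log") or []
    exit_codes = [entry.get("ExitCode") for entry in log]
    return {
        "status": state_health.get("Status"),
        "failing_streak": state_health.get("FailingStreak", 0),
        # Per this report, ExitCode -1 marks a probe killed at the timeout.
        "timed_out_probes": sum(1 for code in exit_codes if code == -1),
        "total_probes": len(exit_codes),
    }


# Illustrative sample only; values are made up for the example.
sample = json.loads(
    """
    {
      "Status": "unhealthy",
      "FailingStreak": 2,
      "Log": [
        {"ExitCode": -1, "Output": ""},
        {"ExitCode": -1, "Output": ""}
      ]
    }
    """
)

print(summarize_health(sample))
```

If every logged probe counts as timed out, the healthcheck command never reached its own exit path, which matches the behaviour described in this report.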
Expected vs actual
- Expected: worker-health only needs to read streaq's streaq:{queue}:health:* key, so it should return in well under a second, comfortably inside a small timeout.
- Actual: the command first runs the standard CLI/app bootstrap (config load, BlobService init, rate limiter, app-config load, logging setup), which on a real deploy takes ~5s, so it is killed before reaching the health logic.
Suggested fixes (either or both)
- Make worker-health a lightweight fast-path that skips the heavy app-config / BlobService / rate-limiter initialization and only constructs the minimal Redis client needed to read the streaq health key.
- Ship the scaffolded worker healthcheck with a timeout that accounts for the current bootstrap cost (e.g. 20s) so it isn't unusable at its own default.
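The fast-path idea could look roughly like the sketch below. It is not vibetuner's or streaq's actual implementation: `worker_is_healthy`, `main`, and the `REDIS_URL` and `WORKER_QUEUE` environment variables are hypothetical names, and it assumes that the presence of any key matching streaq:{queue}:health:* means a worker is alive (for example because the key expires when the worker stops). That assumption should be checked against streaq before relying on it.

```python
import os


def health_key_pattern(queue: str) -> str:
    # Key layout taken from this issue: streaq:{queue}:health:*
    return f"streaq:{queue}:health:*"


def worker_is_healthy(client, queue: str) -> bool:
    """Return True if at least one health key exists for the queue.

    Assumption: a live worker keeps a key under this pattern and the key
    disappears when the worker dies.
    """
    for _key in client.scan_iter(match=health_key_pattern(queue), count=100):
        return True
    return False


def main() -> int:
    # Imported here so that nothing heavier than the Redis client is loaded;
    # no app config, BlobService, or rate limiter is touched.
    import redis

    url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")  # hypothetical
    queue = os.environ.get("WORKER_QUEUE", "default")  # hypothetical
    # Short socket timeouts keep the probe well inside a small healthcheck timeout.
    client = redis.Redis.from_url(url, socket_connect_timeout=2, socket_timeout=2)
    try:
        return 0 if worker_is_healthy(client, queue) else 1
    except redis.RedisError:
        return 1
```

An entry point would call `sys.exit(main())`, so Docker sees exit code 0 for healthy and 1 for unhealthy. Passing the client into `worker_is_healthy` keeps the check itself free of any bootstrap and easy to test with a stub.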
Environment
- vibetuner 10.22.3, observed on a production Docker Compose deploy with restart: unless-stopped and no autoheal sidecar. The false unhealthy is therefore cosmetic today, but it masks any genuine future unhealthy signal.
Filed by Claude Code.