vibetuner worker-health too slow to finish within a sane healthcheck timeout (CLI bootstrap ~5s) #1974

@davidpoblador

Description

Summary

The shipped worker healthcheck command, vibetuner worker-health, cannot reliably complete within a typical Docker healthcheck timeout because it pays the full CLI bootstrap cost (~5s) before it ever reads the streaq health key. With the scaffolded compose.prod.yml worker healthcheck (timeout: 5s), every probe is killed at the timeout boundary (ExitCode -1) before the check logic runs, so a perfectly healthy worker is marked unhealthy and stays that way.

Steps to reproduce

  1. Deploy the scaffolded worker with its default healthcheck:
    healthcheck:
      test: ["CMD", "vibetuner", "worker-health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 40s
  2. Worker is genuinely healthy: asyncio loop alive, per-minute cron firing, streaq consumer group pending=0, lag=0, worker_watchdog_timeout=60s active, restart count 0.
  3. docker inspect shows the container unhealthy with a long failing streak; healthcheck log entries are ExitCode -1 (timed out), not the command's own output.
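To tell timed-out probes apart from probes where the command itself reported failure, the State.Health section of the docker inspect output can be summarized. This is a minimal sketch: summarize_health is a hypothetical helper name, and it treats ExitCode -1 as a timeout, as observed in this issue.

```python
import json


def summarize_health(inspect_json: str) -> dict:
    """Summarize State.Health from the JSON printed by `docker inspect <container>`."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    health = containers[0]["State"]["Health"]
    log = health.get("Log") or []
    # Assumption taken from this issue: ExitCode -1 marks a probe killed at the timeout.
    timed_out = [entry for entry in log if entry["ExitCode"] == -1]
    return {
        "status": health["Status"],
        "failing_streak": health["FailingStreak"],
        "probes": len(log),
        "timed_out": len(timed_out),
    }
```

If timed_out equals probes, the check logic never ran, which matches the behavior described above.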

Expected vs actual

  • Expected: worker-health only needs to read streaq's streaq:{queue}:health:* key — it should return in well under a second, comfortably inside a small timeout.
  • Actual: the command first runs the standard CLI/app bootstrap (config load, BlobService init, rate limiter, app-config load, logging setup), which on a real deploy takes ~5s, so it is killed before reaching the health logic.
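One way to confirm the bootstrap cost is to time the command inside the worker container. The sketch below is illustrative only: time_command is a hypothetical helper, and the ~5s figure comes from this report, not from the code.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=None):
    """Run argv and return (elapsed_seconds, exit_code); exit_code is None on timeout."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
        code = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, similar to a killed probe.
        code = None
    return time.perf_counter() - start, code
```

For example, time_command(["vibetuner", "worker-health"], timeout=30) run inside the container would show whether the command finishes at all and how far past the 5s healthcheck timeout it lands.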

Suggested fixes (either or both)

  1. Make worker-health a lightweight fast-path that skips the heavy app-config / BlobService / rate-limiter initialization and only constructs the minimal Redis client needed to read the streaq health key.
  2. Ship the scaffolded worker healthcheck with a timeout that accounts for the current bootstrap cost (e.g. 20s) so it isn't unusable at its own default.
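Suggested fix 1 could look roughly like the sketch below. It is not vibetuner's or streaq's actual code: the function names are hypothetical, {queue} in the key pattern is assumed to be a plain placeholder for the queue name, and the mere presence of a matching key is assumed to mean the worker is alive (streaq's real key contents and expiry behavior are not checked here).

```python
def health_key_pattern(queue: str) -> str:
    # Key pattern as quoted in this issue; {queue} is assumed to be a placeholder.
    return f"streaq:{queue}:health:*"


def worker_is_healthy(client, queue: str) -> bool:
    """Return True if at least one health key exists for the queue.

    Assumption: key presence alone signals liveness; the value is not inspected.
    """
    for _ in client.scan_iter(match=health_key_pattern(queue)):
        return True
    return False


def main(redis_url: str, queue: str) -> int:
    """Exit-code style entry point: 0 healthy, 1 unhealthy or Redis unreachable."""
    import redis  # third-party redis-py, imported lazily to keep startup minimal

    # Short socket timeouts so the probe fails fast instead of hitting the Docker timeout.
    client = redis.Redis.from_url(
        redis_url, socket_connect_timeout=1, socket_timeout=1
    )
    try:
        return 0 if worker_is_healthy(client, queue) else 1
    except redis.RedisError:
        return 1
```

The point of the design is that nothing beyond a Redis client is constructed, so config loading, BlobService, and the rate limiter stay out of the probe's critical path.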

Environment

  • vibetuner 10.22.3, observed on a production Docker Compose deploy (restart: unless-stopped, no autoheal sidecar — so the false unhealthy is cosmetic today, but it masks any genuine future unhealthy signal).

Filed by Claude Code.
