feat(server): add /api/health endpoints with DB liveness check #3029

Merged
enoch85 merged 4 commits into Maintainerr:development from Arvuno:feat/health-endpoint
Jun 5, 2026

@Arvuno commented Jun 3, 2026

Contributor

Summary

  • Add a HealthController under apps/server/src/app/health.controller.ts exposing three endpoints:
    • GET /api/health — combined health check, 200 with database: "ok", 503 with database: "unreachable".
    • GET /api/health/live — liveness, 200 as long as the process is up, no DB call. Use for Kubernetes livenessProbe.
    • GET /api/health/ready — readiness, runs SELECT 1 against the configured TypeORM datasource, 503 on failure. Use for Kubernetes readinessProbe and Docker HEALTHCHECK.
  • Register the controller in AppModule.
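
The split between the three routes can be illustrated with a framework-free sketch. This is not the controller from the PR: `DbPing`, `HealthResult`, `live` and `ready` are hypothetical names, and the only response field shown is the `database` value quoted above. It assumes the combined route and the readiness route react to a database failure in the same way.

```typescript
// Hypothetical, framework-free sketch; the real controller is a NestJS class.
type DbPing = () => Promise<void>;

interface HealthResult {
  httpStatus: 200 | 503;
  body: { database?: "ok" | "unreachable" };
}

// GET /api/health/live: answers 200 without touching the database,
// so a transient DB outage never triggers a container restart.
function live(): HealthResult {
  return { httpStatus: 200, body: {} };
}

// GET /api/health and GET /api/health/ready: run the ping and map
// success or failure to 200 or 503.
async function ready(ping: DbPing): Promise<HealthResult> {
  try {
    await ping(); // e.g. () => dataSource.query("SELECT 1")
    return { httpStatus: 200, body: { database: "ok" } };
  } catch {
    return { httpStatus: 503, body: { database: "unreachable" } };
  }
}
```

Passing the ping in as a function keeps the sketch independent of TypeORM; in the application the ping would be bound to the shared datasource.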

Why

The application has no /health endpoint today, which makes it awkward to wire up Kubernetes liveness/readiness probes or Docker HEALTHCHECK. /api/app/status exists but it also exercises the GitHub releases API, so a slow GitHub response would make the app look unhealthy to a load balancer.

How tested

  • yarn install — OK.
  • yarn check-types (Turbo) — OK across all four packages.
  • yarn workspace @maintainerr/server lint — OK, no warnings.
  • yarn workspace @maintainerr/server tsc --noEmit — OK.
  • No automated test added in this PR; the controller is small enough to be verifiable by hitting the endpoints once the app is up. A follow-up unit test for the unreachable branch (mock the datasource to throw) is straightforward and can be added if reviewers want it.
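
A follow-up test for the unreachable branch could look roughly like the sketch below. It is an assumption-laden illustration, not the spec that was later added: `Queryable`, `isDatabaseReachable`, `healthyDs`, `brokenDs` and `runChecks` are hypothetical names, and plain `node:assert` is used instead of the project's test runner. The only TypeORM detail relied on is that a datasource exposes a `query(sql)` method.

```typescript
import { strictEqual } from "node:assert";

// Minimal slice of a datasource: only `query` is needed for the check.
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Hypothetical stand-in for the readiness check under test.
async function isDatabaseReachable(ds: Queryable): Promise<boolean> {
  try {
    await ds.query("SELECT 1");
    return true;
  } catch {
    return false;
  }
}

// Fakes: one datasource that answers, one that always throws.
const healthyDs: Queryable = { query: async () => [] };
const brokenDs: Queryable = {
  query: async () => {
    throw new Error("simulated database failure");
  },
};

// Exercises both branches; rejects if either expectation fails.
async function runChecks(): Promise<void> {
  strictEqual(await isDatabaseReachable(healthyDs), true);
  strictEqual(await isDatabaseReachable(brokenDs), false);
}
```

Calling `runChecks()` resolves when both the reachable and the unreachable path behave as expected.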

Notes

  • The DB check uses the same TypeORM datasource the rest of the app uses, so a misconfigured connection string or a full disk surfaces as a clear 503 rather than a silent 200.
  • No new dependencies. The controller relies on the existing @nestjs/typeorm and typeorm packages.
  • The endpoints are mounted under /api/health to follow the existing /api/* prefix convention used elsewhere in the app.

The application has no /health endpoint today, which makes it
awkward to wire up Kubernetes livenessProbe / readinessProbe or
Docker HEALTHCHECK directives. /api/app/status is not a substitute
because it also exercises the GitHub releases API.

Add three endpoints under /api/health:

  - GET /api/health       — combined: 200 with database=ok, or 503
                            with database=unreachable.
  - GET /api/health/live  — liveness: 200 as long as the process is
                            up, no DB call. Use for Kubernetes
                            livenessProbe (no restarts on transient
                            DB blips).
  - GET /api/health/ready — readiness: SELECT 1 against the
                            configured TypeORM datasource; returns
                            503 if the query fails. Use for
                            Kubernetes readinessProbe (stop
                            sending traffic until the DB is back).

The DB check uses the same TypeORM datasource the rest of the app
uses, so a misconfigured connection string or a full disk surfaces
as a clear 503 rather than a silent 200.

@Arvuno requested a review from enoch85 as a code owner on June 3, 2026, 19:53

This comment was marked as resolved.

enoch85 added 2 commits June 5, 2026 10:12
…cker HEALTHCHECK

- Extract the SELECT 1 ping into HealthService so the controller only shapes the
  response, matching the controllers-delegate-to-services pattern.
- Log the caught DB error via MaintainerrLogger (warn + debug) instead of
  swallowing it.
- Use process.uptime(); drop the instantiation-time uptime helper.
- Move the response shapes to @maintainerr/contracts (LivenessResponse,
  HealthResponse); live() returns the shared envelope; root route uses @Get().
- Add HealthService and HealthController specs.
- Wire the Dockerfile HEALTHCHECK to /api/health/ready via a BASE_PATH-aware
  probe script.
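
The service/controller split described in this commit message might be sketched as follows. Everything here is illustrative: `LoggerLike` is a guess at the minimal logger surface (the API of MaintainerrLogger is not shown in this PR), `HealthServiceSketch` and `shapeHealth` are hypothetical names, and `uptimeSeconds` is an assumed field name, since the actual `LivenessResponse` and `HealthResponse` shapes in `@maintainerr/contracts` are not reproduced here.

```typescript
// Hypothetical logger surface with the two levels named in the commit message.
interface LoggerLike {
  warn(message: string): void;
  debug(message: string): void;
}

interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Service: owns the DB ping and logs failures instead of swallowing them.
class HealthServiceSketch {
  private readonly ds: Queryable;
  private readonly logger: LoggerLike;

  constructor(ds: Queryable, logger: LoggerLike) {
    this.ds = ds;
    this.logger = logger;
  }

  async isDatabaseReachable(): Promise<boolean> {
    try {
      await this.ds.query("SELECT 1");
      return true;
    } catch (error) {
      this.logger.warn("Health check: database is unreachable");
      this.logger.debug(error instanceof Error ? error.stack ?? error.message : String(error));
      return false;
    }
  }
}

// Controller side: only shapes the response from the service's answer.
function shapeHealth(databaseOk: boolean) {
  return {
    httpStatus: databaseOk ? 200 : 503,
    body: {
      database: databaseOk ? "ok" : "unreachable",
      uptimeSeconds: Math.floor(process.uptime()), // read at request time
    },
  };
}
```

Reading `process.uptime()` per request avoids keeping a start timestamp captured when the controller was instantiated.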

@enoch85 merged commit ddbc151 into Maintainerr:development on Jun 5, 2026
14 checks passed

enoch85 commented Jun 5, 2026

Collaborator

Thanks for this. Before merging I pushed a follow-up to address the review points:

  • The database check now lives in a dedicated HealthService; the controller only shapes the response, in keeping with the rest of the codebase where controllers delegate to services.
  • The caught database error is logged via MaintainerrLogger instead of being swallowed, so an unreachable result leaves a trace in the logs.
  • Uptime is reported from process.uptime(), and the response shapes (LivenessResponse, HealthResponse) were moved into @maintainerr/contracts. /live now returns the same envelope as the other endpoints, and the root route uses @Get().
  • Added unit tests for both the service and the controller, covering the reachable and unreachable paths.
  • Wired up the Docker HEALTHCHECK to /api/health/ready (BASE_PATH-aware), which was the original motivation for the endpoint.

Readiness intentionally checks only the database. Connected services (media servers, *arr, Seerr, Tautulli) are deliberately left out so that a flaky integration cannot take the whole instance out of rotation.
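
A BASE_PATH-aware probe of the kind mentioned above could be sketched like this. It is not the script shipped in the Dockerfile: `buildReadyUrl` and `probe` are hypothetical names, the loopback host and the 5-second timeout are assumptions, and the port is left to the caller. It relies on the global `fetch` and `AbortSignal.timeout` available in recent Node.js releases.

```typescript
// Build the readiness URL, honouring an optional base path such as "/maintainerr".
function buildReadyUrl(basePath: string | undefined, port: number): string {
  let prefix = (basePath ?? "").trim();
  while (prefix.endsWith("/")) prefix = prefix.slice(0, -1); // drop trailing slashes
  if (prefix !== "" && !prefix.startsWith("/")) prefix = "/" + prefix;
  return `http://127.0.0.1:${port}${prefix}/api/health/ready`;
}

// Probe once and map the outcome to an exit code (0 = healthy, 1 = unhealthy).
async function probe(url: string): Promise<number> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(5000) });
    return res.ok ? 0 : 1; // a 503 from the readiness route counts as unhealthy
  } catch {
    return 1; // connection refused, timeout, and similar failures
  }
}
```

A HEALTHCHECK command would then run something like `probe(buildReadyUrl(process.env.BASE_PATH, port)).then((code) => process.exit(code))`, with `port` set to whatever port the container listens on.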

Merged - thanks for the contribution.

maintainerr-automation Bot added a commit that referenced this pull request Jun 5, 2026
* fix(rules): save rule groups from incomplete payloads and clarify YAML import errors (#3045)

- Default keepLogsForMonths to 6 when omitted instead of binding NaN (create and update)
- Return a structured {code:0} error instead of a silent empty 201 when a save fails
- Guard updateRules against a missing collection block (no throw, no spurious media wipe)
- Clearer message for invalid YAML; log real server-side faults instead of mislabeling them as syntax errors

Refs #3044

* feat(server): add /api/health endpoints with DB liveness check (#3029)

* feat(server): add /api/health endpoints with DB liveness check

* refactor(server): move health DB check to a service; add tests and Docker HEALTHCHECK

---------

Co-authored-by: enoch85 <mailto@danielhansson.nu>

* docs: clarify yarn command-not-found means a stale node_modules

* fix(rules): preserve collection link and visibility on partial updates (#3046)

When the collection block is omitted from PUT /api/rules, fall back to the
saved values for manualCollection, manualCollectionName, visibleOnHome and
visibleOnRecommended instead of forwarding undefined. This stops
updateCollection from silently unlinking a manual collection (mediaServerId
cleared) or switching off Plex Home/Recommended visibility.

Follow-up to #3045.

* feat(logging): honour LOG_LEVEL env var on startup (#3030)

Operators can override the persisted log level for a single container by
setting LOG_LEVEL=debug|verbose|info|warn|error|fatal in the environment.
The persisted setting (the value the UI shows) is left untouched, so the
override is never baked into the database. An unrecognised value logs a
single warning at startup and falls back to the persisted level; with the
env var unset, behaviour is unchanged.

The winston factory now also tolerates a missing log_settings row during
first boot, falling back to the shared DEFAULT_LOG_* constants for level,
max size, and max files instead of dereferencing a null row.

Co-authored-by: enoch85 <mailto@danielhansson.nu>

* fix(notifications): validate webhook URL scheme before posting (#3031)

User-configured notification URLs were passed straight to axios.post, so a
misconfigured value pointing at file://, gopher://, or any other non-http(s)
scheme would be turned into an outbound request. These URLs have no validation
at the settings layer, so this is the only guard.

Add a shared validateWebhookUrl helper and apply it in every agent that posts
to an operator-supplied URL: the webhookUrl agents (webhook, slack, lunasea,
discord) and ntfy's server url. Unparseable URLs and non-http(s) schemes are
rejected before sending and the normalised URL is posted; the agent returns a
clear Failure result.

Also replace ntfy's regex slash-trimming with the codebase's char-based idiom,
per the no-regex convention.

Co-authored-by: enoch85 <mailto@danielhansson.nu>

* docs: refresh README - features, health endpoints, deploy examples, credits (#3048)

---------

Co-authored-by: enoch85 <mailto@danielhansson.nu>
Co-authored-by: Arvuno <hi@arvuno.xyz>
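
The webhook URL check described in the commit message above (#3031) can be illustrated with a short sketch. It is not the shared helper from that change: `validateWebhookUrlSketch` and `WebhookUrlCheck` are hypothetical names, and the result shape is an assumption. It uses only the standard WHATWG `URL` class.

```typescript
// Hypothetical result type: either the normalised URL or a rejection reason.
type WebhookUrlCheck =
  | { ok: true; url: string }
  | { ok: false; reason: string };

function validateWebhookUrlSketch(raw: string): WebhookUrlCheck {
  let parsed: URL;
  try {
    parsed = new URL(raw); // throws on unparseable input
  } catch {
    return { ok: false, reason: "URL could not be parsed" };
  }
  // Only http and https may be turned into an outbound request.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: `unsupported scheme "${parsed.protocol}"` };
  }
  return { ok: true, url: parsed.toString() }; // normalised form
}
```

An agent would post to the returned `url` only when `ok` is true and report a failure otherwise.
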
@maintainerr-automation

Contributor

🎉 This PR is included in version 3.14.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

doonga pushed a commit to greyrock-labs/home-ops that referenced this pull request Jun 6, 2026
… ➔ 3.14.0) (#207)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/maintainerr/maintainerr](https://github.com/Maintainerr/Maintainerr) | minor | `3.13.0` → `3.14.0` |

---

### Release Notes

<details>
<summary>Maintainerr/Maintainerr (ghcr.io/maintainerr/maintainerr)</summary>

### [`v3.14.0`](https://github.com/Maintainerr/Maintainerr/blob/HEAD/CHANGELOG.md#3140-2026-06-05)

[Compare Source](Maintainerr/Maintainerr@v3.13.0...v3.14.0)

#### Highlights

- Added `/api/health` endpoints with liveness and readiness checks for monitoring and integration with tools like Kubernetes and Docker Compose ([#3029](Maintainerr/Maintainerr#3029)).
- Collection handler now skips media currently being streamed to avoid disrupting active viewers ([#3027](Maintainerr/Maintainerr#3027)).
- Fixed issue where saving log settings would overwrite an active `LOG_LEVEL` environment variable override ([#3053](Maintainerr/Maintainerr#3053)).

#### Features

- Added `/api/health` endpoints with liveness and readiness checks ([#3029](Maintainerr/Maintainerr#3029)).
- Collection handler now skips media currently being streamed ([#3027](Maintainerr/Maintainerr#3027)).
- Logging system now honors the `LOG_LEVEL` environment variable on startup ([#3030](Maintainerr/Maintainerr#3030)).

#### Fixes

- Fixed issue where saving log settings would overwrite an active `LOG_LEVEL` environment variable override ([#3053](Maintainerr/Maintainerr#3053)).
- Validated webhook URL schemes to prevent invalid or potentially harmful requests ([#3031](Maintainerr/Maintainerr#3031)).
- Fixed issue where rule groups lost collection links and visibility on partial updates ([#3045](Maintainerr/Maintainerr#3045), [#3046](Maintainerr/Maintainerr#3046)).
- Fixed issue with manual collections not being found across libraries on Jellyfin/Emby ([#3026](Maintainerr/Maintainerr#3026), [#3042](Maintainerr/Maintainerr#3042)).
- Resolved issue where deleted media remained stuck in Jellyfin/Emby collections and caused repeated processing errors ([#3023](Maintainerr/Maintainerr#3023), [#3024](Maintainerr/Maintainerr#3024), [#3040](Maintainerr/Maintainerr#3040)).
- Fixed issue where Seerr requests for episode rules incorrectly deleted entire season requests ([#3015](Maintainerr/Maintainerr#3015)).
- Improved error notifications for collection handling failures to include the name of the failing collection ([#3013](Maintainerr/Maintainerr#3013)).
- Used Radarr bulk exclusions endpoint to avoid duplicate 400 errors when adding exclusions ([#3012](Maintainerr/Maintainerr#3012)).

#### Performance

- Pruned media that no longer exists on the media server to improve collection handling efficiency ([#3023](Maintainerr/Maintainerr#3023), [#3040](Maintainerr/Maintainerr#3040)).

#### Internal

- Refreshed README with updated features, deployment examples, and credits ([#3048](Maintainerr/Maintainerr#3048)).
- Clarified that a missing `yarn` command indicates a stale `node_modules` directory.

#### Dependencies

- Updated 20 dependencies, including `@typescript-eslint/parser`, `react-router-dom`, `axios`, and `vite`.

#### New Contributors

- [@Arvuno](https://github.com/Arvuno) made their first contribution in [#3029](Maintainerr/Maintainerr#3029)

</details>

---

### Configuration

📅 **Schedule**: (in timezone America/New_York)

- Branch creation
  - At any time (no schedule defined)
- Automerge
  - At any time (no schedule defined)

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.greyrock.io/greyrock-labs/home-ops/pulls/207
mrdynamo added a commit to mrdynamo/home-ops that referenced this pull request Jun 8, 2026