Skip to content

openviking: implement health_check() with /health probe (#42 step 2e)#49

Merged
PowerCreek merged 1 commit into
mainfrom
openviking-health-check-override
May 23, 2026
Merged

openviking: implement health_check() with /health probe (#42 step 2e)#49
PowerCreek merged 1 commit into
mainfrom
openviking-health-check-override

Conversation

@PowerCreek

Copy link
Copy Markdown

Summary

Step 2e of the RFC #42 migration plan. OpenViking's is_available() only checked whether OPENVIKING_ENDPOINT was set — a typo'd URL or down server got a green check.

health_check() now hits <endpoint>/health via httpx (3s timeout) and classifies status codes:

Status / Outcome Tuple
200 (True, "")
401 / 403 (False, "auth: HTTP <code>")
404 (False, "not_found: /health route missing")
Other non-2xx (False, "http: status <code>")
Exception (False, "unreachable: <exc>")
OPENVIKING_ENDPOINT unset (False, "no_endpoint")
httpx absent (False, "sdk_missing")

The probe builds minimal tenant headers (X-OpenViking-Account / X-OpenViking-User / X-OpenViking-Agent) so ROOT-keyed requests validate correctly — OpenViking 0.3.x rejects ROOT-keyed tenant-scoped calls that omit those headers (per _OpenVikingClient._headers).

Test plan

  • 12 new tests in tests/plugins/memory/test_openviking_health_check.py:
    • 200 success
    • no_endpoint / sdk_missing
    • 401 + 403 → auth:
    • 404 → not_found:
    • 500/502/503/504 → http:
    • ConnectionError → unreachable:
    • Endpoint trailing-slash normalized (no //health)
    • API key forwarded as both Bearer and X-API-Key
    • Default tenant headers (Account / User / Agent) included
  • pytest across the full health_check suite → 123 passed (109 prior + 14 new across openviking).

Five providers migrated (mem0/honcho/byterover/supermemory/openviking); three remain (holographic/hindsight/retaindb). Continues #42 step 2.

Filed by hermes-maintainer (PowerCreek).

Step 2e of the RFC #42 migration plan. is_available() only
checked whether OPENVIKING_ENDPOINT was set — an operator with a
typo'd URL or unreachable server got a green check.

health_check() now hits `<endpoint>/health` via httpx with a
3-second timeout and classifies the status code:

  200          → (True, "")
  401 / 403    → (False, "auth: HTTP <code>")
  404          → (False, "not_found: /health route missing")
  other        → (False, "http: status <code>")
  exception    → (False, "unreachable: <exc>")

Plus the standard prelude:
  no env       → (False, "no_endpoint")
  httpx absent → (False, "sdk_missing")

The probe builds minimal tenant headers (X-OpenViking-Account /
X-OpenViking-User / X-OpenViking-Agent) so ROOT API keys validate
correctly — OpenViking 0.3.x rejects ROOT-keyed requests to
tenant-scoped APIs that omit those headers (see _OpenVikingClient._headers).

Endpoint normalization: trailing-slash variants both produce a
clean <endpoint>/health URL.

Tests:
  - 200 returns (True, "")
  - no_endpoint when OPENVIKING_ENDPOINT unset
  - sdk_missing when httpx absent
  - 401 + 403 classified as auth:
  - 404 classified as not_found:
  - 500/502/503/504 classified as http:
  - ConnectionError classified as unreachable:
  - trailing-slash URL normalized
  - API key forwarded as both Bearer and X-API-Key
  - Default tenant headers included

Five providers migrated now (mem0/honcho/byterover/supermemory/
openviking); three remain (holographic/hindsight/retaindb).
Continues #42 step 2.
@PowerCreek PowerCreek merged commit 871e33f into main May 23, 2026
@PowerCreek PowerCreek deleted the openviking-health-check-override branch May 23, 2026 02:46
PowerCreek added a commit that referenced this pull request May 23, 2026
…s 2h + 3) (#52)

Closes #42. Final two steps of the RFC migration.

Step 2h — holographic health_check:
  Holographic is local-only (SQLite); no remote service. But a
  read-only HERMES_HOME (RO mount, wrong perms) would still
  break it at runtime. Override verifies db_path's parent
  directory is writable.

    (True, "")                     — parent mkdir + W_OK pass.
    (False, "unreachable: <msg>")  — mkdir failed or W_OK denied.
    (False, "config_error: <msg>") — get_hermes_home raised.

Step 3 — doctor branch collapse:
  Every shipped provider now implements health_check() with the
  RFC #42 reason-prefix taxonomy. The provider-specific Honcho +
  Mem0 elif blocks (~120 lines combined) are gone. The unified
  dispatch is one ~80-line block in doctor.py keyed on the prefix:

    healthy → check_ok("<name> reachable")
    no_api_key / no_credentials → _fail_and_issue (setup hint)
    no_url / no_endpoint        → _fail_and_issue (URL hint)
    no_config                   → check_warn (run setup)
    disabled                    → check_info
    sdk_missing                 → _fail_and_issue (plugin docs)
    auth:                       → _fail_and_issue (rotate key)
    not_found:                  → _fail_and_issue (verify base URL)
    http:                       → check_warn (unexpected status)
    unreachable:                → check_warn (transient hint)
    config_error:               → check_warn (config raised)
    health_check_raised:        → check_warn (provider bug; RFC
                                   #42 says health_check MUST NOT
                                   raise, so this is a contract
                                   violation worth flagging)
    other                       → check_warn (unknown verbatim)

Doctor's Memory Provider section is now ~80 lines instead of
~200, and adding the 9th provider requires zero doctor changes.

Migration table (8 providers total):
  mem0       — PR #44  GET /v1/memory profile
  honcho     — PR #45  get_honcho_client handshake
  byterover  — PR #46  brv status (CLI + login)
  supermemory — PR #48  client.profile probe
  openviking — PR #49  /health endpoint probe
  retaindb   — PR #50  /v1/memory/profile GET
  hindsight  — PR #51  mode-dependent (local import / cloud /version)
  holographic — this PR  db_path parent writability

Tests:
  - 6 new tests for holographic health_check
  - Updated 3 doctor tests to assert the new unified dispatch
    (auth → _fail_and_issue; unreachable → check_warn; healthy
    → "<name> reachable") instead of the now-removed elif-block
    output strings.
  - 228 health_check + doctor tests pass total.

The #42 RFC is now fully implemented across all shipped memory
providers. Closing the issue with this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant