retaindb: implement health_check() with profile probe (#42 step 2f)#50

Merged: PowerCreek merged 1 commit into main from retaindb-health-check-override on May 23, 2026

Conversation

@PowerCreek


Summary

Step 2f of the RFC #42 migration plan. RetainDB's is_available() only checked that RETAINDB_API_KEY was present, so a bad key, expired token, or wrong base_url still got a green check.

health_check() now GETs /v1/memory/profile/hermes-doctor-probe (side-effect-free; just reads any existing memories for the probe user_id) and classifies the outcome:

| Tuple | When |
| --- | --- |
| (True, "") | profile GET succeeds |
| (False, "no_api_key") | RETAINDB_API_KEY unset |
| (False, "sdk_missing") | requests not installed |
| (False, "auth: ...") | 401/403/forbidden/unauthorized/invalid-api-key/authentication |
| (False, "not_found: ...") | 404 (wrong base_url or routing missing) |
| (False, "unreachable: ...") | Anything else, including client init exceptions (defense in depth) |

Honors RETAINDB_BASE_URL for self-hosted deployments and applies the same trailing-slash normalization that the production _Client already does.
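The classification above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the default base URL, the `Authorization` header shape, the 5-second timeout, the injectable `fetch` parameter, and substring-based matching on the error text are all assumptions made for the example. Only the probe path, the environment variable names, and the reason prefixes come from the description.

```python
import os

# Hypothetical default; the real default base URL is not stated in this PR.
DEFAULT_BASE_URL = "https://retaindb.example"
PROBE_PATH = "/v1/memory/profile/hermes-doctor-probe"
MAX_REASON_LEN = 219  # keeps the reason string under 220 characters

AUTH_MARKERS = ("401", "403", "forbidden", "unauthorized",
                "invalid api key", "authentication")


def classify_failure(message):
    """Map an error message onto a reason-prefixed string (assumed matching rules)."""
    lowered = message.lower().replace("-", " ").replace("_", " ")
    if any(marker in lowered for marker in AUTH_MARKERS):
        prefix = "auth"
    elif "404" in lowered:
        prefix = "not_found"
    else:
        prefix = "unreachable"
    return f"{prefix}: {message}"[:MAX_REASON_LEN]


def health_check(env=None, fetch=None):
    """Return (ok, reason). `fetch(url, api_key)` is an injectable stand-in for the HTTP GET."""
    env = os.environ if env is None else env
    api_key = env.get("RETAINDB_API_KEY")
    if not api_key:
        return (False, "no_api_key")

    if fetch is None:
        try:
            import requests
        except ImportError:
            return (False, "sdk_missing")

        def fetch(url, key):
            # Header format is an assumption for this sketch.
            resp = requests.get(url, headers={"Authorization": f"Bearer {key}"}, timeout=5)
            resp.raise_for_status()

    # Same trailing-slash normalization the description mentions.
    base_url = env.get("RETAINDB_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    try:
        fetch(base_url + PROBE_PATH, api_key)
    except Exception as exc:  # health_check must not raise
        return (False, classify_failure(str(exc)))
    return (True, "")
```

Injecting `fetch` keeps the sketch testable without network access; the actual implementation may instead mock the client in its tests.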

Test plan

  • 10 new tests in tests/plugins/memory/test_retaindb_health_check.py:
    • success
    • no_api_key / sdk_missing
    • 4 auth-classification cases (401, 403, "invalid api key", "authentication failed")
    • 404 → not_found:
    • ConnectionError → unreachable:
    • Client init exception → unreachable: (defense in depth)
    • Long errors truncated under 220 chars
    • Custom RETAINDB_BASE_URL passed through
    • Trailing-slash base_url normalized
  • pytest across the full health_check suite → 136 passed (123 prior + 13 net new across retaindb after fixture overlap).
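As an illustration of how the no_api_key and sdk_missing cases could be exercised without touching the real environment, here is a small sketch using only the standard library. The `health_check` below is a stub written for this example (it skips the profile probe entirely); the actual tests in tests/plugins/memory/test_retaindb_health_check.py are not shown in this PR and may be structured differently.

```python
import os
import sys
from unittest import mock


def health_check():
    """Stub covering only the two pre-flight checks; the HTTP probe is omitted."""
    if not os.environ.get("RETAINDB_API_KEY"):
        return (False, "no_api_key")
    try:
        import requests  # noqa: F401
    except ImportError:
        return (False, "sdk_missing")
    return (True, "")


def test_no_api_key():
    # clear=True empties os.environ for the duration of the block.
    with mock.patch.dict(os.environ, {}, clear=True):
        assert health_check() == (False, "no_api_key")


def test_sdk_missing():
    # A None entry in sys.modules makes `import requests` raise ImportError,
    # whether or not the package is actually installed.
    with mock.patch.dict(os.environ, {"RETAINDB_API_KEY": "k"}), \
         mock.patch.dict(sys.modules, {"requests": None}):
        assert health_check() == (False, "sdk_missing")
```

Both functions follow pytest's `test_*` naming, so they would be collected by a plain `pytest` run.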

Six providers now migrated (mem0/honcho/byterover/supermemory/openviking/retaindb); two remain (holographic/hindsight). Continues #42 step 2.

Filed by hermes-maintainer (PowerCreek).

Step 2f of the RFC #42 migration plan. is_available() only
checked RETAINDB_API_KEY presence; bad key / wrong base_url got
a green check. health_check() now GETs
/v1/memory/profile/hermes-doctor-probe (side-effect-free) and
classifies the outcome:

  (True, "")                  — profile GET succeeds.
  (False, "no_api_key")       — RETAINDB_API_KEY unset.
  (False, "sdk_missing")      — requests not installed.
  (False, "auth: ...")        — 401/403/forbidden/unauthorized/
                                 invalid-api-key/authentication.
  (False, "not_found: ...")   — 404 (wrong base_url or routing
                                 missing).
  (False, "unreachable: ...") — anything else, including client
                                 init exceptions (defense in depth).

Honors RETAINDB_BASE_URL for self-hosted deployments and the
trailing-slash normalization that the production _Client already
applies in its __init__.

Tests:
  - success → (True, "")
  - no_api_key / sdk_missing
  - 4 auth-classification cases (401, 403, invalid-api-key, authentication-failed)
  - 404 → not_found:
  - ConnectionError → unreachable:
  - Client init exception → unreachable: (defense in depth)
  - Long errors truncated under 220 chars
  - Custom RETAINDB_BASE_URL passed through
  - Trailing-slash base_url normalized

Six providers now migrated (mem0/honcho/byterover/supermemory/
openviking/retaindb); two remain (holographic/hindsight).
136 health_check tests pass across all migrated providers + the
ABC default. Continues #42 step 2.
PowerCreek merged commit 58ae08b into main on May 23, 2026
PowerCreek deleted the retaindb-health-check-override branch on May 23, 2026 02:59
PowerCreek added a commit that referenced this pull request May 23, 2026
…s 2h + 3) (#52)

Closes #42. Final two steps of the RFC migration.

Step 2h — holographic health_check:
  Holographic is local-only (SQLite); no remote service. But a
  read-only HERMES_HOME (RO mount, wrong perms) would still
  break it at runtime. Override verifies db_path's parent
  directory is writable.

    (True, "")                     — parent mkdir + W_OK pass.
    (False, "unreachable: <msg>")  — mkdir failed or W_OK denied.
    (False, "config_error: <msg>") — get_hermes_home raised.
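A writability probe along the lines of Step 2h might look like the sketch below. It is an assumption-laden illustration: `resolve_db_path` is a hypothetical callable standing in for whatever combines get_hermes_home with the database filename, and the exact messages are invented for the example.

```python
import os
from pathlib import Path


def health_check(resolve_db_path):
    """Return (ok, reason) for a local SQLite-backed provider.

    `resolve_db_path` is a hypothetical callable returning the database path.
    """
    try:
        db_path = Path(resolve_db_path())
    except Exception as exc:
        # Corresponds to "get_hermes_home raised" in the description.
        return (False, f"config_error: {exc}")

    parent = db_path.parent
    try:
        parent.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return (False, f"unreachable: {exc}")

    if not os.access(parent, os.W_OK):
        return (False, f"unreachable: {parent} is not writable")
    return (True, "")
```

Note that `os.access` reports permissions for the real user ID, so a process running as root will typically see W_OK succeed even on restrictive modes; a read-only mount is still detected.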

Step 3 — doctor branch collapse:
  Every shipped provider now implements health_check() with the
  RFC #42 reason-prefix taxonomy. The provider-specific Honcho +
  Mem0 elif blocks (~120 lines combined) are gone. The unified
  dispatch is one ~80-line block in doctor.py keyed on the prefix:

    healthy → check_ok("<name> reachable")
    no_api_key / no_credentials → _fail_and_issue (setup hint)
    no_url / no_endpoint        → _fail_and_issue (URL hint)
    no_config                   → check_warn (run setup)
    disabled                    → check_info
    sdk_missing                 → _fail_and_issue (plugin docs)
    auth:                       → _fail_and_issue (rotate key)
    not_found:                  → _fail_and_issue (verify base URL)
    http:                       → check_warn (unexpected status)
    unreachable:                → check_warn (transient hint)
    config_error:               → check_warn (config raised)
    health_check_raised:        → check_warn (provider bug; RFC
                                   #42 says health_check MUST NOT
                                   raise, so this is a contract
                                   violation worth flagging)
    other                       → check_warn (unknown verbatim)

Doctor's Memory Provider section is now ~80 lines instead of
~200, and adding the 9th provider requires zero doctor changes.
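The prefix-keyed dispatch described above could be sketched as a pure lookup. The function names `dispatch` and `run_health_check` are hypothetical, and the sketch returns the name of the doctor helper plus a hint string rather than calling check_ok, check_warn, check_info, or _fail_and_issue, since their signatures are not shown here.

```python
# Reasons matched exactly (no trailing detail).
EXACT = {
    "no_api_key": ("_fail_and_issue", "setup hint"),
    "no_credentials": ("_fail_and_issue", "setup hint"),
    "no_url": ("_fail_and_issue", "URL hint"),
    "no_endpoint": ("_fail_and_issue", "URL hint"),
    "no_config": ("check_warn", "run setup"),
    "disabled": ("check_info", ""),
    "sdk_missing": ("_fail_and_issue", "plugin docs"),
}

# Reasons of the form "<prefix>: <detail>".
PREFIXED = {
    "auth": ("_fail_and_issue", "rotate key"),
    "not_found": ("_fail_and_issue", "verify base URL"),
    "http": ("check_warn", "unexpected status"),
    "unreachable": ("check_warn", "transient hint"),
    "config_error": ("check_warn", "config raised"),
    "health_check_raised": ("check_warn", "provider bug"),
}


def run_health_check(provider):
    """Call provider.health_check(), converting a raise into a reason string."""
    try:
        return provider.health_check()
    except Exception as exc:  # contract violation: health_check must not raise
        return (False, f"health_check_raised: {exc}")


def dispatch(name, ok, reason):
    """Map a health_check result to (doctor helper name, message)."""
    if ok:
        return ("check_ok", f"{name} reachable")
    if reason in EXACT:
        return EXACT[reason]
    prefix, sep, _detail = reason.partition(":")
    if sep and prefix in PREFIXED:
        return PREFIXED[prefix]
    return ("check_warn", reason)  # unknown reason, surfaced verbatim
```

Because the mapping depends only on the reason string, a new provider that follows the same taxonomy needs no provider-specific branch.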

Migration table (8 providers total):
  mem0       — PR #44  GET /v1/memory profile
  honcho     — PR #45  get_honcho_client handshake
  byterover  — PR #46  brv status (CLI + login)
  supermemory — PR #48  client.profile probe
  openviking — PR #49  /health endpoint probe
  retaindb   — PR #50  /v1/memory/profile GET
  hindsight  — PR #51  mode-dependent (local import / cloud /version)
  holographic — this PR  db_path parent writability

Tests:
  - 6 new tests for holographic health_check
  - Updated 3 doctor tests to assert the new unified dispatch
    (auth → _fail_and_issue; unreachable → check_warn; healthy
    → "<name> reachable") instead of the now-removed elif-block
    output strings.
  - 228 health_check + doctor tests pass total.

The #42 RFC is now fully implemented across all shipped memory
providers. Closing the issue with this PR.