Skip to content

hindsight: implement health_check() with mode-dependent dispatch (#42 step 2g)#51

Merged
PowerCreek merged 1 commit into
mainfrom
hindsight-health-check-override
May 23, 2026
Merged

hindsight: implement health_check() with mode-dependent dispatch (#42 step 2g)#51
PowerCreek merged 1 commit into
mainfrom
hindsight-health-check-override

Conversation

@PowerCreek

Copy link
Copy Markdown

Summary

Step 2g of the RFC #42 migration plan. Hindsight has four deployment modes (local / local_embedded / local_external / cloud) and each needs a different probe:

Mode Probe
local / local_embedded _check_local_runtime() (existing helper — tries to import hindsight + hindsight_embed; catches NumPy ABI mismatches the session-start loader already degrades around)
local_external / cloud GET <api_url>/version with the configured bearer; classify status code

Outcomes (RFC #42 reason prefixes)

Tuple When
(True, "") Local imports OK, OR /version 200
(False, "sdk_missing: ...") Local imports failed
(False, "no_credentials") Cloud with neither apiKey nor api_url
(False, "no_url") local_external without api_url, OR cloud with apiKey but no api_url
(False, "auth: ...") 401 / 403 from /version
(False, "not_found: ...") 404
(False, "http: ...") Other non-2xx
(False, "unreachable: ...") Connection / timeout / parse failure, including non-dict JSON response
(False, "config_error: ...") _load_config itself raised

Honors the same env-var resolution as is_available (HINDSIGHT_API_KEY, HINDSIGHT_API_URL).

Test plan

  • 17 new tests in tests/plugins/memory/test_hindsight_health_check.py:
    • Local + local_embedded: success + sdk_missing
    • local_external: no_url + success
    • Cloud: no_credentials, no_url, success, 401 → auth, 404 → not_found, 503 → http, URLError → unreachable, malformed JSON → unreachable, non-dict JSON → unreachable
    • Top-level: config-load raises → config_error
    • Env-var fallback when cfg dict empty
  • pytest across the full health_check suite → 152 passed.

Seven providers now migrated (mem0/honcho/byterover/supermemory/openviking/retaindb/hindsight); only holographic remains (trivial — SQLite always healthy).

Filed by hermes-maintainer (PowerCreek).

…step 2g)

Step 2g of the RFC #42 migration plan. Hindsight has three deployment
modes (local / local_embedded / local_external / cloud) and each
needs a different probe:

  local / local_embedded → _check_local_runtime() (existing helper:
                            tries to import hindsight + hindsight_embed
                            and catches NumPy ABI mismatches that the
                            session-start loader already gracefully
                            degrades around).
  local_external / cloud → GET <api_url>/version with the configured
                            bearer; classify status code.

Outcomes (RFC #42 reason prefixes):
  (True, "")                  — local imports OK, OR /version 200.
  (False, "sdk_missing: ...") — local imports failed.
  (False, "no_credentials")   — cloud with neither apiKey nor api_url.
  (False, "no_url")           — local_external missing api_url, OR
                                 cloud with apiKey but no api_url.
  (False, "auth: ...")        — 401 / 403 from /version.
  (False, "not_found: ...")   — 404.
  (False, "http: ...")        — other non-2xx.
  (False, "unreachable: ...") — connection/timeout/parse failure,
                                 including non-dict JSON response.
  (False, "config_error:...") — _load_config itself raised.

Honors the same env-var resolution as is_available (HINDSIGHT_API_KEY,
HINDSIGHT_API_URL).

Tests:
  - local + local_embedded: success + sdk_missing
  - local_external: no_url + success
  - cloud: no_credentials, no_url, success
  - cloud: 401 → auth, 404 → not_found, 503 → http,
           URLError → unreachable, malformed JSON → unreachable,
           non-dict JSON → unreachable
  - config-load raises → config_error
  - env-var fallback when cfg dict empty

Seven providers now migrated (mem0/honcho/byterover/supermemory/
openviking/retaindb/hindsight); one remains (holographic — trivial,
SQLite always healthy). 152 health_check tests pass.

Continues #42 step 2.
@PowerCreek PowerCreek merged commit e8b59a4 into main May 23, 2026
@PowerCreek PowerCreek deleted the hindsight-health-check-override branch May 23, 2026 03:05
PowerCreek added a commit that referenced this pull request May 23, 2026
…s 2h + 3) (#52)

Closes #42. Final two steps of the RFC migration.

Step 2h — holographic health_check:
  Holographic is local-only (SQLite); no remote service. But a
  read-only HERMES_HOME (RO mount, wrong perms) would still
  break it at runtime. Override verifies db_path's parent
  directory is writable.

    (True, "")                     — parent mkdir + W_OK pass.
    (False, "unreachable: <msg>")  — mkdir failed or W_OK denied.
    (False, "config_error: <msg>") — get_hermes_home raised.

Step 3 — doctor branch collapse:
  Every shipped provider now implements health_check() with the
  RFC #42 reason-prefix taxonomy. The provider-specific Honcho +
  Mem0 elif blocks (~120 lines combined) are gone. The unified
  dispatch is one ~80-line block in doctor.py keyed on the prefix:

    healthy → check_ok("<name> reachable")
    no_api_key / no_credentials → _fail_and_issue (setup hint)
    no_url / no_endpoint        → _fail_and_issue (URL hint)
    no_config                   → check_warn (run setup)
    disabled                    → check_info
    sdk_missing                 → _fail_and_issue (plugin docs)
    auth:                       → _fail_and_issue (rotate key)
    not_found:                  → _fail_and_issue (verify base URL)
    http:                       → check_warn (unexpected status)
    unreachable:                → check_warn (transient hint)
    config_error:               → check_warn (config raised)
    health_check_raised:        → check_warn (provider bug; RFC
                                   #42 says health_check MUST NOT
                                   raise, so this is a contract
                                   violation worth flagging)
    other                       → check_warn (unknown verbatim)

Doctor's Memory Provider section is now ~80 lines instead of
~200, and adding the 9th provider requires zero doctor changes.

Migration table (8 providers total):
  mem0       — PR #44  GET /v1/memory profile
  honcho     — PR #45  get_honcho_client handshake
  byterover  — PR #46  brv status (CLI + login)
  supermemory — PR #48  client.profile probe
  openviking — PR #49  /health endpoint probe
  retaindb   — PR #50  /v1/memory/profile GET
  hindsight  — PR #51  mode-dependent (local import / cloud /version)
  holographic — this PR  db_path parent writability

Tests:
  - 6 new tests for holographic health_check
  - Updated 3 doctor tests to assert the new unified dispatch
    (auth → _fail_and_issue; unreachable → check_warn; healthy
    → "<name> reachable") instead of the now-removed elif-block
    output strings.
  - 228 health_check + doctor tests pass total.

The #42 RFC is now fully implemented across all shipped memory
providers. Closing the issue with this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant