Skip to content

doctor: Mem0 reachability probe with auth/network classification (#34)#35

Merged
PowerCreek merged 1 commit into
mainfrom
doctor-mem0-probe-loudness
May 23, 2026
Merged

doctor: Mem0 reachability probe with auth/network classification (#34)#35
PowerCreek merged 1 commit into
mainfrom
doctor-mem0-probe-loudness

Conversation

@PowerCreek

Copy link
Copy Markdown

Summary

Closes #34. The Mem0 doctor block was asymmetric with Honcho: Honcho did a real connection probe via get_honcho_client(hcfg) and surfaced auth/network failures distinctly; Mem0 stopped at cfg["api_key"] presence. So MEM0_API_KEY=garbage got the same green ✓ as a working key, and the first production request 401'd unexpectedly.

Fix

Add a one-shot MemoryClient(api_key).get_all(user_id, limit=1) round-trip and classify the outcome:

Outcome Doctor row Action
Success check_ok("Mem0 connected", "user_id=… agent_id=…")
401/403/forbidden/unauthorized/invalid api key/authentication check_fail("Mem0 auth rejected", …) + issue → https://app.mem0.ai Rotate the key
mem0ai SDK ImportError check_fail("mem0ai not installed") + issue → pip install mem0ai Install SDK
Anything else check_warn("Mem0 probe failed", … "key itself may still be valid") Wait / retry

The auth-vs-network distinction matters: doctor's job is "what should the operator do", not just "something broke". An auth rejection means rotate the key; a network error means wait. Conflating them sends operators chasing the wrong fix.

Test plan

  • 7 new tests in tests/hermes_cli/test_doctor_mem0_probe.py:
    • successful probe → check_ok with user/agent ids
    • missing API key → _fail_and_issue (existing behavior preserved)
    • 401 UnauthorizedMem0 auth rejected fail + https://app.mem0.ai issue
    • 403 forbidden → same auth classification
    • invalid api key phrase → same auth classification
    • ConnectionErrorcheck_warn with "key itself may still be valid" hint
    • mem0ai SDK ImportError → check_fail("mem0ai not installed") + issue
  • pytest tests/hermes_cli/test_doctor_mem0_probe.py tests/hermes_cli/test_doctor.py → 71 passed (64 existing + 7 new).

Filed by hermes-maintainer (PowerCreek).

The Mem0 elif block stopped at `cfg["api_key"]` presence — a
garbage / expired / quota-locked key got the same green check as
a working one. Honcho already does a real connection probe via
get_honcho_client(); Mem0 was the asymmetric outlier.

Add a `MemoryClient(api_key).get_all(user_id, limit=1)` round-trip
that classifies the outcome:

  - Success            → check_ok("Mem0 connected", user/agent)
  - 401/403/forbidden/
    unauthorized/
    "invalid api key" → check_fail("Mem0 auth rejected") +
                        issue pointing at https://app.mem0.ai
  - mem0ai SDK missing → check_fail("mem0ai not installed") +
                        issue pointing at pip install
  - Other exception    → check_warn("Mem0 probe failed", ...
                        "key itself may still be valid")

The auth-vs-network distinction matters because doctor's job is
not just "did anything break" but "what action should the operator
take". An auth rejection means rotate the key; a network error
means wait. Conflating them sends operators chasing the wrong fix.

Closes #34.
@PowerCreek PowerCreek merged commit 01960cb into main May 23, 2026
@PowerCreek PowerCreek deleted the doctor-mem0-probe-loudness branch May 23, 2026 00:14
PowerCreek added a commit that referenced this pull request May 23, 2026
After auditing the non-Honcho / non-Mem0 memory providers
(openviking, supermemory, hindsight, byterover, retaindb,
holographic), every implementation's is_available() only checks
env-vars / CLI presence — none probe backend reachability.

The doctor message "X provider active" overstates that as a
connection-validated state. Wrong key, expired token, unreachable
host all silently produce a green ✓.

Tighten the wording to "X provider configured" + add a check_info
row that:
  - states is_available() only verified env-vars / CLI presence
  - calls out the asymmetry (Honcho + Mem0 do real probes via
    #15/#34/#35; others wait on a health_check() ABC method)
  - links #36 for the design discussion

No code-level probe added here — adding per-provider probes
requires deep SDK knowledge for 6 providers AND the right shape
is probably a new ABC method, not 6 elif blocks. Both belong in
the follow-up. This PR's job is just to stop the false-positive
green check from misleading operators.

Closes #36 (message-tightening scope).
PowerCreek added a commit that referenced this pull request May 23, 2026
…step 2a) (#44)

Step 2a of the RFC #42 migration plan. Mem0 already had a real
backend probe inside doctor's elif block (PR #35); move it INTO
the provider's health_check() so doctor's generic branch can
eventually use the unified ABC interface (step 3).

The implementation is functionally identical to PR #35's doctor
block, but returns the RFC-conventioned tuple instead of calling
check_ok / check_fail / check_warn directly:

  (True, "")                 — get_all(user_id, limit=1) succeeds.
  (False, "no_api_key")      — MEM0_API_KEY not set.
  (False, "sdk_missing")     — mem0ai not installed.
  (False, "auth: <msg>")     — 401/403/forbidden/unauthorized/
                                 invalid-api-key/authentication.
  (False, "unreachable:...")— any other exception during get_all.
  (False, "config_error:..")— _load_config() itself raised.

The "never raises" contract from the ABC default is preserved —
config_error catches the rare case where even loading the config
fails.

This PR DOES NOT collapse the dedicated mem0 elif block in
doctor.py yet — that comes in a later PR after Honcho also
migrates. Operators see no behavior change here; the new method
is just an alternative entry point with the same logic.

Tests:
  - success returns (True, "")
  - no_api_key when MEM0_API_KEY empty
  - sdk_missing when `mem0` not importable
  - 4 auth-classification cases (401, 403, "invalid api key",
    "authentication failed")
  - ConnectionError classified as unreachable:
  - long error messages truncated under 220 chars
  - never raises: even when _load_config itself throws,
    health_check returns (False, "config_error: ...")

Continues #42; does not close it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hermes doctor: Mem0 check stops at API-key presence — no reachability probe (vs Honcho which connects)

1 participant