doctor: Mem0 reachability probe with auth/network classification (#34)#35
Merged
Conversation
The Mem0 elif block stopped at `cfg["api_key"]` presence — a
garbage / expired / quota-locked key got the same green check as
a working one. Honcho already does a real connection probe via
get_honcho_client(); Mem0 was the asymmetric outlier.
Add a `MemoryClient(api_key).get_all(user_id, limit=1)` round-trip
that classifies the outcome:
- Success → check_ok("Mem0 connected", user/agent)
- 401/403/forbidden/
unauthorized/
"invalid api key" → check_fail("Mem0 auth rejected") +
issue pointing at https://app.mem0.ai
- mem0ai SDK missing → check_fail("mem0ai not installed") +
issue pointing at pip install
- Other exception → check_warn("Mem0 probe failed", ...
"key itself may still be valid")
The auth-vs-network distinction matters because doctor's job is
not just "did anything break" but "what action should the operator
take". An auth rejection means rotate the key; a network error
means wait. Conflating them sends operators chasing the wrong fix.
Closes #34.
PowerCreek
added a commit
that referenced
this pull request
May 23, 2026
After auditing the non-Honcho / non-Mem0 memory providers
(openviking, supermemory, hindsight, byterover, retaindb,
holographic), every implementation's is_available() only checks
env-vars / CLI presence — none probe backend reachability.
The doctor message "X provider active" overstates that as a
connection-validated state. Wrong key, expired token, unreachable
host all silently produce a green ✓.
Tighten the wording to "X provider configured" + add a check_info
row that:
- states is_available() only verified env-vars / CLI presence
- calls out the asymmetry (Honcho + Mem0 do real probes via
#15/#34/#35; others wait on a health_check() ABC method)
- links #36 for the design discussion
No code-level probe added here — adding per-provider probes
requires deep SDK knowledge for 6 providers AND the right shape
is probably a new ABC method, not 6 elif blocks. Both belong in
the follow-up. This PR's job is just to stop the false-positive
green check from misleading operators.
Closes #36 (message-tightening scope).
This was referenced May 23, 2026
PowerCreek
added a commit
that referenced
this pull request
May 23, 2026
…step 2a) (#44) Step 2a of the RFC #42 migration plan. Mem0 already had a real backend probe inside doctor's elif block (PR #35); move it INTO the provider's health_check() so doctor's generic branch can eventually use the unified ABC interface (step 3). The implementation is functionally identical to PR #35's doctor block, but returns the RFC-conventioned tuple instead of calling check_ok / check_fail / check_warn directly: (True, "") — get_all(user_id, limit=1) succeeds. (False, "no_api_key") — MEM0_API_KEY not set. (False, "sdk_missing") — mem0ai not installed. (False, "auth: <msg>") — 401/403/forbidden/unauthorized/ invalid-api-key/authentication. (False, "unreachable:...")— any other exception during get_all. (False, "config_error:..")— _load_config() itself raised. The "never raises" contract from the ABC default is preserved — config_error catches the rare case where even loading the config fails. This PR DOES NOT collapse the dedicated mem0 elif block in doctor.py yet — that comes in a later PR after Honcho also migrates. Operators see no behavior change here; the new method is just an alternative entry point with the same logic. Tests: - success returns (True, "") - no_api_key when MEM0_API_KEY empty - sdk_missing when `mem0` not importable - 4 auth-classification cases (401, 403, "invalid api key", "authentication failed") - ConnectionError classified as unreachable: - long error messages truncated under 220 chars - never raises: even when _load_config itself throws, health_check returns (False, "config_error: ...") Continues #42; does not close it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #34. The Mem0 doctor block was asymmetric with Honcho: Honcho did a real connection probe via
get_honcho_client(hcfg)and surfaced auth/network failures distinctly; Mem0 stopped atcfg["api_key"]presence. SoMEM0_API_KEY=garbagegot the same green ✓ as a working key, and the first production request 401'd unexpectedly.Fix
Add a one-shot
MemoryClient(api_key).get_all(user_id, limit=1)round-trip and classify the outcome:check_ok("Mem0 connected", "user_id=… agent_id=…")401/403/forbidden/unauthorized/invalid api key/authenticationcheck_fail("Mem0 auth rejected", …)+ issue →https://app.mem0.aimem0aiSDK ImportErrorcheck_fail("mem0ai not installed")+ issue →pip install mem0aicheck_warn("Mem0 probe failed", … "key itself may still be valid")The auth-vs-network distinction matters: doctor's job is "what should the operator do", not just "something broke". An auth rejection means rotate the key; a network error means wait. Conflating them sends operators chasing the wrong fix.
Test plan
tests/hermes_cli/test_doctor_mem0_probe.py:check_okwith user/agent ids_fail_and_issue(existing behavior preserved)401 Unauthorized→Mem0 auth rejectedfail +https://app.mem0.aiissue403 forbidden→ same auth classificationinvalid api keyphrase → same auth classificationConnectionError→check_warnwith "key itself may still be valid" hintmem0aiSDK ImportError →check_fail("mem0ai not installed")+ issuepytest tests/hermes_cli/test_doctor_mem0_probe.py tests/hermes_cli/test_doctor.py→ 71 passed (64 existing + 7 new).Filed by hermes-maintainer (PowerCreek).