Context
Auditing memory-provider doctor checks for diagnostic-loudness gaps. Found a parallel-but-not-identical issue to #32: Honcho does a real connection probe (get_honcho_client(hcfg) raises on failure), whereas Mem0 only checks that cfg["api_key"] is non-empty (hermes_cli/doctor.py:2243):
mem0_key = mem0_cfg.get("api_key", "")
if mem0_key:
    check_ok("Mem0 API key configured")
    check_info(f"user_id={...} agent_id={...}")
So a garbage key (MEM0_API_KEY=garbage), an expired key, and a valid key with the network down all produce the same green check. The operator's first inference request then 401s in production, and they don't connect that to doctor's clean run.
The Mem0 plugin already imports MemoryClient from mem0 inside _get_client() (plugins/memory/mem0/__init__.py:174). A MemoryClient(api_key=key).get_all(user_id=..., limit=1) round-trip validates both auth and reachability.
Fix
Mirror the Honcho block: after confirming the key is present, attempt a one-shot MemoryClient(api_key=key).get_all(user_id=cfg["user_id"], limit=1). Three outcomes:
- Success → check_ok("Mem0 connected", "user_id=… agent_id=…")
- Auth failure (401/403 surface as an exception from mem0ai) → _fail_and_issue("Mem0 auth rejected", str(exc), "Mem0 API key rejected — verify MEM0_API_KEY at https://app.mem0.ai", issues)
- Other exception (network, timeout, mem0ai SDK change) → check_warn("Mem0 probe failed", str(exc))
Wrap the probe in try/except so a Mem0 SDK shape change can't crash doctor mid-run; it surfaces as a warn row instead.
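A minimal sketch of the three-outcome probe, under stated assumptions. The names probe_mem0, looks_like_auth_error, and client_factory are hypothetical and not taken from doctor.py. In doctor, the factory would be mem0's MemoryClient, and the returned status would map to check_ok, _fail_and_issue, or check_warn. How mem0ai actually surfaces 401/403 is not confirmed here, so the auth detection below is only a heuristic.

```python
from typing import Any, Callable, Tuple


def looks_like_auth_error(exc: Exception) -> bool:
    """Heuristic (assumption): look for an HTTP status on the exception or its
    response object, then fall back to scanning the message text."""
    response = getattr(exc, "response", None)
    status = getattr(exc, "status_code", None) or getattr(response, "status_code", None)
    if status in (401, 403):
        return True
    text = str(exc).lower()
    return any(marker in text for marker in ("401", "403", "unauthorized", "forbidden"))


def probe_mem0(
    api_key: str,
    user_id: str,
    client_factory: Callable[..., Any],
) -> Tuple[str, str]:
    """One-shot probe. Returns (status, detail), status in {"ok", "fail", "warn"}.

    client_factory is injected so the sketch runs without mem0ai installed;
    the real call site would pass MemoryClient.
    """
    try:
        client = client_factory(api_key=api_key)
        client.get_all(user_id=user_id, limit=1)
    except Exception as exc:  # broad on purpose: an SDK shape change must not crash doctor
        if looks_like_auth_error(exc):
            return ("fail", str(exc))
        return ("warn", str(exc))
    return ("ok", f"user_id={user_id}")
```

Matching on message text can misclassify (for example, a port number containing "403"), so if mem0ai exposes a typed auth exception, catching that type would be the more robust choice.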
Out of scope
- Probing other memory providers' generic is_available() more loudly. The MemoryProvider ABC method already returns False on missing config, which doctor surfaces with check_warn(f"{name} configured but not available"). Adding more depth there would require per-provider knowledge and is a separate scope decision.
- Mem0-side circuit breaker integration. The plugin tracks _consecutive_failures for runtime backoff; doctor's check is a one-shot probe, not a long-running observer.
Filed by hermes-maintainer (PowerCreek). PR incoming.