OpenViking memory provider fails permanently if server is unreachable at agent startup; never recovers #5721

@SeeYangZhi

Problem

The OpenVikingMemoryProvider performs a one-time health check during initialize(). If the OpenViking server is temporarily down (e.g. stale PID lock, container restart, port conflict), the provider sets self._client = None and never attempts to reconnect. All subsequent viking_search, viking_browse, viking_remember, etc. calls return:

{"error": "OpenViking server not connected"}

Even after the server comes back online and /health returns 200, every Viking tool call in the running Hermes session keeps failing. The only way to recover is to start a brand new conversation.

Reproduction Steps

  1. Ensure OpenViking server is stopped or unreachable.
  2. Start a Hermes conversation (CLI or gateway).
  3. Fix the OpenViking server (e.g. docker compose up -d).
  4. Verify curl http://localhost:1933/health returns 200.
  5. In the same Hermes session, invoke any Viking tool (e.g. viking_browse or viking_search).
  6. Expected: Tool works. Actual: "OpenViking server not connected".

Root Cause

In plugins/memory/openviking/__init__.py:

def initialize(self, session_id: str, **kwargs) -> None:
    ...
    self._client = _VikingClient(self._endpoint, self._api_key)
    if not self._client.health():
        logger.warning("OpenViking server at %s is not reachable", self._endpoint)
        self._client = None   # <-- permanent disable

handle_tool_call then short-circuits on if not self._client: with no retry path:

def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
    if not self._client:
        return json.dumps({"error": "OpenViking server not connected"})

Suggested Fixes

Option A: Lazy reconnect on first tool use (minimal)

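Inside handle_tool_call, when self._client is None, re-create the _VikingClient and retry its health() check before returning the error. Apply the same retry when a request on an existing client raises a connection error.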
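A minimal sketch of Option A, not a patch against the actual plugin. LazyReconnectProvider, client_factory, retry_interval, clock and _ensure_client are hypothetical names introduced for illustration. client_factory stands in for the _VikingClient(self._endpoint, self._api_key) construction quoted above, and the retry throttle is an assumption added so that a down server does not cost a health probe on every tool call.

```python
import json
import logging
import time

logger = logging.getLogger(__name__)


class LazyReconnectProvider:
    """Hypothetical stand-in for the provider; only the reconnect logic is shown."""

    def __init__(self, client_factory, retry_interval=5.0, clock=time.monotonic):
        # client_factory() must return an object with a health() -> bool method.
        self._client_factory = client_factory
        self._retry_interval = retry_interval  # assumed throttle, not in the original
        self._clock = clock
        self._client = None
        self._last_attempt = None

    def _ensure_client(self):
        """Return a healthy client, or None if the server is still unreachable."""
        if self._client is not None:
            return self._client
        now = self._clock()
        # Skip the probe if the previous one happened less than retry_interval ago.
        if self._last_attempt is not None and now - self._last_attempt < self._retry_interval:
            return None
        self._last_attempt = now
        try:
            candidate = self._client_factory()
            if candidate.health():
                logger.info("OpenViking server is reachable again")
                self._client = candidate
        except Exception as exc:  # e.g. connection refused while probing
            logger.debug("OpenViking reconnect attempt failed: %s", exc)
        return self._client

    def handle_tool_call(self, tool_name, args, **kwargs):
        client = self._ensure_client()
        if client is None:
            return json.dumps({"error": "OpenViking server not connected"})
        # The real dispatch to viking_search, viking_browse, etc. would go here.
        return json.dumps({"ok": True, "tool": tool_name})
```

With this shape, initialize() no longer needs to decide the provider's fate for the whole session: a failed startup check just leaves _client as None, and the next tool call after the throttle window probes again.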

Option B: Background health-watch thread

Periodically ping /health and re-create _client when the server recovers.
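One possible shape for Option B, offered as a sketch under assumptions rather than the intended design. HealthWatcher, client_factory, interval and check_once are hypothetical names. The watcher owns the client reference and the provider would read it through the client property instead of holding self._client directly.

```python
import threading


class HealthWatcher:
    """Hypothetical helper: polls health() and publishes a client while the server is up."""

    def __init__(self, client_factory, interval=10.0):
        # client_factory() must return an object with a health() -> bool method.
        self._client_factory = client_factory
        self._interval = interval
        self._lock = threading.Lock()
        self._client = None
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, name="openviking-health", daemon=True
        )

    @property
    def client(self):
        with self._lock:
            return self._client

    def check_once(self):
        """Run one probe and update the shared client; return True if healthy."""
        try:
            candidate = self._client_factory()
            healthy = bool(candidate.health())
        except Exception:
            candidate, healthy = None, False
        with self._lock:
            if healthy and self._client is None:
                self._client = candidate  # server recovered: publish a fresh client
            elif not healthy:
                self._client = None       # server went away: withdraw the client
        return healthy

    def _run(self):
        # Event.wait acts as an interruptible sleep; it returns True once stop() is called.
        while not self._stop.wait(self._interval):
            self.check_once()

    def start(self):
        self.check_once()  # initial probe, equivalent to the check in initialize()
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Compared with Option A this adds a thread and a lock per provider, but tool calls never pay for a health probe and the provider also notices when a previously healthy server disappears.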

Option C: Expose a /reconnect slash command or memory-manager API

Allow users to force re-initialization of memory providers without dropping the conversation.
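A rough sketch of what Option C could look like, assuming the command handler has access to a name-to-provider mapping. StubProvider, its connected property and handle_reconnect_command are hypothetical; only the initialize(session_id, **kwargs) signature comes from the code quoted above, and how Hermes actually registers slash commands is not covered here.

```python
import json


class StubProvider:
    """Hypothetical provider exposing the initialize() signature quoted in the issue."""

    def __init__(self, health_check):
        self._health_check = health_check  # callable returning True when the server is up
        self._client = None

    def initialize(self, session_id, **kwargs):
        # Mirrors the issue's logic: keep a client only if the health check passes.
        self._client = object() if self._health_check() else None

    @property
    def connected(self):
        return self._client is not None


def handle_reconnect_command(providers, session_id):
    """Hypothetical /reconnect handler: re-run initialize() on every provider."""
    status = {}
    for name, provider in providers.items():
        try:
            provider.initialize(session_id)
            status[name] = "connected" if provider.connected else "unreachable"
        except Exception as exc:
            status[name] = f"error: {exc}"
    return json.dumps(status)
```

The point of the sketch is that re-initialization reuses the existing session_id, so the conversation is kept while the provider gets a second chance at its health check.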

Environment

  • Hermes version: latest (hermes-agent repo, plugins/memory/openviking/__init__.py)
  • OpenViking version: v0.3.3
  • Platform: CLI (also affects gateway/long-lived sessions)

Metadata

Labels

  • P3 (Low: cosmetic, nice to have)
  • comp/plugins (Plugin system and bundled plugins)
  • tool/memory (Memory tool and memory providers)
  • type/bug (Something isn't working)
