
Bug: mem0-oss gRPC thread causes CLI abort after successful output (FATAL: exception not rethrown) #27832

@DmitryPogodaev


Bug: mem0-oss leaves gRPC thread alive and CLI aborts after successful output (FATAL: exception not rethrown, exit 134)

Summary

When the mem0-oss memory plugin is enabled, Hermes CLI can print the correct final answer and then abort during process shutdown with:

FATAL: exception not rethrown
Fatal Python error: Aborted
...
Extension modules: ... google._upb._message, grpc._cython.cygrpc

The user-visible response is produced successfully, but the process exits with status 134 (128 + 6, i.e. terminated by SIGABRT). CLI and cron callers therefore see a failure even though the request succeeded, and background memory sync/prefetch cleanup may be interrupted.
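For automation that needs to tell this abort apart from an ordinary non-zero exit, the status can be decoded with the standard library. This is a minimal sketch, not part of Hermes; the helper name `describe_exit` is hypothetical. It relies on two conventions: shells report 128 + N for a process killed by signal N, and Python's `subprocess` reports -N.

```python
import signal


def describe_exit(code: int) -> str:
    """Describe an exit status as reported by a shell (128 + N) or by subprocess (-N)."""
    if code < 0:
        signum = -code          # subprocess.returncode convention
    elif code > 128:
        signum = code - 128     # shell "$?" convention
    else:
        return f"exited with status {code}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a known signal number on this platform
        return f"exited with status {code}"


print(describe_exit(128 + signal.SIGABRT))
print(describe_exit(-signal.SIGABRT))
```

A wrapper script could log this description alongside the captured output, so that a run that printed the answer and then aborted is not mistaken for a failed request.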

Environment

  • Host: Linux 6.8.0-100-generic
  • Hermes repo: NousResearch/hermes-agent
  • Hermes version observed: v0.14.0 (2026.5.16)
  • Profile memory provider: mem0-oss
  • mem0-oss backend: self-hosted Qdrant + Ollama
  • Active model/provider during Hermes repro: openai-codex, gpt-5.5

Reproduction

Full Hermes CLI repro

PYTHONFAULTHANDLER=1 hermes --profile ceo -z "Reply exactly: ok" --ignore-rules --ignore-user-config

Observed output:

ok
FATAL: exception not rethrown
Fatal Python error: Aborted

Thread 0x... (most recent call first):
  <no Python frame>

Extension modules: yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, websockets.speedups, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, google._upb._message, grpc._cython.cygrpc (total: 17)

Exit code: 134.

Minimal plugin-only repro

This isolates the crash from the model/provider path:

import time, importlib.util

p = "/home/agent/.hermes/plugins/mem0-oss/__init__.py"
spec = importlib.util.spec_from_file_location("mem0_oss_plugin", p)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

prov = mod.Mem0OSSMemoryProvider()
prov.initialize(session_id="debug", hermes_home="/home/agent/.hermes/profiles/ceo")
prov.queue_prefetch("debug query", session_id="debug")
print("queued")
time.sleep(0.5)
print("exit")

Run:

PYTHONFAULTHANDLER=1 python /tmp/mem0_prefetch_repro.py

Observed: 3 of 3 runs aborted with the same signature:

queued
exit
FATAL: exception not rethrown
Fatal Python error: Aborted
...
Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, google._upb._message, grpc._cython.cygrpc (total: 7)

Control: explicit shutdown avoids abort

Adding prov.shutdown() before process exit makes the same repro pass 3/3:

time.sleep(0.5)
prov.shutdown()
print("shutdown")

Observed:

queued
shutdown

Exit code: 0.
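Since the explicit `shutdown()` call is what makes the repro pass, one way to guarantee it on every exit path is a small context manager. The following is a sketch under stated assumptions: `StubProvider` is a hypothetical stand-in so the example runs without mem0 installed, and only the method names `queue_prefetch` and `shutdown` are taken from the repro above.

```python
import contextlib


class StubProvider:
    """Hypothetical stand-in for Mem0OSSMemoryProvider; records calls instead of doing work."""

    def __init__(self):
        self.events = []

    def queue_prefetch(self, query, session_id):
        self.events.append(("prefetch", query))

    def shutdown(self):
        self.events.append(("shutdown",))


@contextlib.contextmanager
def managed(provider):
    """Yield the provider and always call provider.shutdown(), even if the body raises."""
    try:
        yield provider
    finally:
        provider.shutdown()


prov = StubProvider()
with managed(prov) as p:
    p.queue_prefetch("debug query", session_id="debug")
print(prov.events)
```

With the real provider, the same wrapper would replace the manual `prov.shutdown()` line in the control script; whether that alone is sufficient in Hermes depends on `shutdown()` actually joining the prefetch thread.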

Control: direct OpenAI/Codex request does not abort

A direct OpenAI SDK call to the same Codex backend completed cleanly (exit=0), so this does not appear to be caused by the model provider path.

Likely cause

The mem0-oss plugin's queue_prefetch() starts a daemon thread that initializes mem0/Qdrant/gRPC. Daemon threads are not joined at interpreter exit, so if the Python interpreter exits while that gRPC-backed thread/client is still active, the process aborts in native code.

This matches the known C++/pthread/gRPC failure class: on glibc, pthread_exit and thread cancellation are implemented as a forced unwind, and if a broad C++ catch(...) block intercepts that unwind without rethrowing, glibc aborts the process with FATAL: exception not rethrown. A very similar explanation is documented for Apache Arrow Flight: active Python gRPC server/client work during interpreter shutdown can trigger this exact fatal message unless the server/client is explicitly shut down before exit.

Relevant external reference:

  • Apache Arrow issue apache/arrow#31952: Python FlightRPC active server may segfault / abort at interpreter shutdown because gRPC catches pthread forced-unwind; explicit shutdown fixes it.

Expected behavior

Hermes should exit with code 0 after a successful CLI one-shot response, even when mem0-oss auto-recall/auto-capture is enabled.

Actual behavior

Hermes prints the response, then aborts with exit code 134 in native gRPC code during interpreter shutdown.

Suggested fix direction

Do not leave mem0/Qdrant/gRPC work running as daemon-only background threads at interpreter shutdown.

Possible directions:

  1. Ensure Mem0OSSMemoryProvider.shutdown() is always called at CLI process exit and blocks until prefetch/sync threads are fully joined.
  2. Make mem0-oss prefetch/sync threads non-daemon or explicitly lifecycle-managed by the memory manager.
  3. Close the underlying Qdrant/gRPC client if the mem0/qdrant client exposes a close/shutdown method.
  4. Avoid initializing the gRPC-backed client in speculative prefetch if process lifetime is short, or make prefetch cancellable/drainable.
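Directions 1, 2 and 4 can be combined in a small lifecycle manager. The sketch below is illustrative only: `PrefetchManager` and `fake_prefetch` are hypothetical names, not Hermes or mem0 APIs, and the real worker would also need to close its Qdrant/gRPC client (direction 3) before returning.

```python
import atexit
import threading
import time


class PrefetchManager:
    """Hypothetical lifecycle manager: tracks worker threads and joins them on shutdown."""

    def __init__(self):
        self._threads = []
        self._lock = threading.Lock()
        self._stopping = threading.Event()
        self.results = []

    def queue(self, work, *args):
        """Run work(*args, stopping_event) on a tracked, non-daemon thread."""
        if self._stopping.is_set():
            return False  # refuse new work once shutdown has begun
        t = threading.Thread(target=self._run, args=(work, args), daemon=False)
        with self._lock:
            self._threads.append(t)
        t.start()
        return True

    def _run(self, work, args):
        self.results.append(work(*args, self._stopping))

    def shutdown(self, timeout=5.0):
        """Signal cancellation, then join every worker; True if all finished in time."""
        self._stopping.set()
        deadline = time.monotonic() + timeout
        with self._lock:
            threads = list(self._threads)
        for t in threads:
            t.join(max(0.0, deadline - time.monotonic()))
        return not any(t.is_alive() for t in threads)


def fake_prefetch(query, stopping):
    # Stand-in for the real mem0/Qdrant call: waits up to 0.2 s or until cancelled
    stopping.wait(0.2)
    return f"done:{query}"


manager = PrefetchManager()
atexit.register(manager.shutdown)  # safety net if the caller forgets to shut down
manager.queue(fake_prefetch, "debug query")
clean = manager.shutdown()
print(clean, manager.results)
```

The cancellation event lets a short-lived CLI run cut speculative prefetch short instead of waiting for it, while the bounded join keeps a stuck backend from hanging process exit indefinitely.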

Impact

  • Interactive users still receive the answer, so the abort usually goes unnoticed in interactive use.
  • Automation sees a failed process (134) after a successful response.
  • Background memory sync/prefetch may be cut off during shutdown.

Labels

  • P3 (Low: cosmetic, nice to have)
  • comp/plugins (Plugin system and bundled plugins)
  • tool/memory (Memory tool and memory providers)
  • type/bug (Something isn't working)
