Bug: mem0-oss gRPC thread causes CLI abort after successful output (FATAL: exception not rethrown)

# Bug: mem0-oss leaves gRPC thread alive and CLI aborts after successful output (`FATAL: exception not rethrown`, exit 134)

## Summary

When the `mem0-oss` memory plugin is enabled, Hermes CLI can print the correct final answer and then abort during process shutdown with:

```text
FATAL: exception not rethrown
Fatal Python error: Aborted
...
Extension modules: ... google._upb._message, grpc._cython.cygrpc
```

The user-visible response is produced successfully, but the process then exits with status `134` (128 + 6, i.e. terminated by `SIGABRT`). CLI and cron callers therefore see a false failure, and background memory sync/prefetch cleanup may be interrupted.
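
To illustrate how a caller sees this, here is a minimal sketch that uses a stand-in child process calling `os.abort()` (not Hermes itself). Python's `subprocess` reports a signal death as a negative return code, while shells report 128 plus the signal number. The helper names `run_child` and `shell_status` are made up for this example.

```python
import signal
import subprocess
import sys

# Stand-in child: prints a successful answer, then aborts the way the
# failing CLI does. This is an illustration, not the Hermes process.
CHILD = "import os, sys; print('ok'); sys.stdout.flush(); os.abort()"


def run_child():
    """Run the stand-in child and return (stdout, returncode)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip(), proc.returncode


def shell_status(returncode):
    """Map subprocess's negative signal code to the shell-style 128+N status."""
    return 128 - returncode if returncode < 0 else returncode


if __name__ == "__main__":
    out, rc = run_child()
    # The answer is intact even though the exit status signals failure.
    print(out, rc, shell_status(rc))
```

A cron wrapper that only checks the exit status would treat this run as failed even though the output is complete.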

## Environment

- Host: Linux 6.8.0-100-generic
- Hermes repo: `NousResearch/hermes-agent`
- Hermes version observed: `v0.14.0 (2026.5.16)`
- Profile memory provider: `mem0-oss`
- `mem0-oss` backend: self-hosted Qdrant + Ollama
- Active model/provider during Hermes repro: `openai-codex`, `gpt-5.5`

## Reproduction

### Full Hermes CLI repro

```bash
PYTHONFAULTHANDLER=1 hermes --profile ceo -z "Ответь ровно: ok" --ignore-rules --ignore-user-config
```

Observed output (the prompt is Russian for "Reply exactly: ok"):

```text
ok
FATAL: exception not rethrown
Fatal Python error: Aborted

Thread 0x... (most recent call first):
  <no Python frame>

Extension modules: yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, websockets.speedups, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, google._upb._message, grpc._cython.cygrpc (total: 17)
```

Exit code: `134`.

### Minimal plugin-only repro

This isolates the crash from the model/provider path:

```python
import importlib.util
import time

# Load the plugin module directly from disk, bypassing the Hermes CLI.
p = "/home/agent/.hermes/plugins/mem0-oss/__init__.py"
spec = importlib.util.spec_from_file_location("mem0_oss_plugin", p)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

prov = mod.Mem0OSSMemoryProvider()
prov.initialize(session_id="debug", hermes_home="/home/agent/.hermes/profiles/ceo")

# Queue a background prefetch, then let the process exit without shutdown().
prov.queue_prefetch("debug query", session_id="debug")
print("queued")
time.sleep(0.5)
print("exit")
```

Save the script as `/tmp/mem0_prefetch_repro.py` and run:

```bash
PYTHONFAULTHANDLER=1 python /tmp/mem0_prefetch_repro.py
```

All 3 of 3 runs aborted with the same signature:

```text
queued
exit
FATAL: exception not rethrown
Fatal Python error: Aborted
...
Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, google._upb._message, grpc._cython.cygrpc (total: 7)
```

### Control: explicit shutdown avoids abort

Calling `prov.shutdown()` before process exit (here in place of the final `print("exit")`) makes the same repro pass in 3 of 3 runs:

```python
time.sleep(0.5)
prov.shutdown()
print("shutdown")
```

Observed:

```text
queued
shutdown
```

Exit code: `0`.
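
The same control can be made robust against early exits by wrapping the provider so that `shutdown()` runs even if an exception is raised. The sketch below assumes only the method names used in the repro (`queue_prefetch`, `shutdown`). `FakeProvider` and `shutting_down` are hypothetical names for illustration, not part of Hermes or the plugin.

```python
from contextlib import contextmanager


@contextmanager
def shutting_down(provider):
    """Yield the provider and always call its shutdown() on the way out."""
    try:
        yield provider
    finally:
        provider.shutdown()


class FakeProvider:
    """Hypothetical stand-in with the same method names as the repro."""

    def __init__(self):
        self.events = []

    def queue_prefetch(self, query, session_id=""):
        self.events.append(("prefetch", query))

    def shutdown(self):
        self.events.append(("shutdown",))


def demo():
    """Show that shutdown() still runs when the body raises."""
    prov = FakeProvider()
    try:
        with shutting_down(prov):
            prov.queue_prefetch("debug query", session_id="debug")
            raise RuntimeError("simulated failure after queuing")
    except RuntimeError:
        pass
    return prov.events
```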

### Control: direct OpenAI/Codex request does not abort

A direct OpenAI SDK call to the same Codex backend completed cleanly (`exit=0`), so this does not appear to be caused by the model provider path.

## Likely cause

The `mem0-oss` plugin's `queue_prefetch()` starts a daemon thread that initializes mem0/Qdrant/gRPC. If the Python interpreter exits while that gRPC-backed thread/client is still active, the process aborts in native code.

This matches the known C++/pthread/gRPC failure class where `pthread_exit`/forced unwind is swallowed by a broad C++ `catch(...)`, leading to `FATAL: exception not rethrown`. A very similar explanation is documented in Apache Arrow Flight: active Python gRPC server/client work during interpreter shutdown can trigger this exact fatal message unless the server/client is explicitly shut down before exit.

Relevant external reference:

- Apache Arrow issue `apache/arrow#31952`: Python FlightRPC active server may segfault / abort at interpreter shutdown because gRPC catches pthread forced-unwind; explicit shutdown fixes it.
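
The precondition for this failure, a daemon thread still running when the interpreter exits, can be shown in pure Python. The sketch below does not reproduce the abort itself, which needs native gRPC code. It only shows that a daemon thread is abandoned mid-work at exit unless it is joined first. The `run` helper and the child script are illustrative.

```python
import subprocess
import sys

# Child script: a daemon thread doing "slow initialization" while the main
# thread heads for exit. {join} is filled in by run() below.
TEMPLATE = """
import threading, time

def work():
    time.sleep(0.3)  # stands in for slow client initialization
    print("worker finished", flush=True)

t = threading.Thread(target=work, daemon=True)
t.start()
{join}
print("main exiting", flush=True)
"""


def run(join_before_exit):
    """Run the child with or without joining the daemon thread; return stdout lines."""
    code = TEMPLATE.format(join="t.join()" if join_before_exit else "")
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.stdout.splitlines()
```

Without the join, the worker never reaches its `print`: the interpreter starts tearing down while the thread is still mid-call, which is the window in which the gRPC-backed thread would be unwound.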

## Expected behavior

Hermes should exit with code `0` after a successful CLI one-shot response, even when `mem0-oss` auto-recall/auto-capture is enabled.

## Actual behavior

Hermes prints the response, then aborts with exit code `134` because native gRPC code is still active when the interpreter shuts down.

## Suggested fix direction

Do not leave mem0/Qdrant/gRPC work running as daemon-only background threads at interpreter shutdown.

Possible directions:

1. Ensure `Mem0OSSMemoryProvider.shutdown()` is always called at CLI process exit and blocks until prefetch/sync threads are fully joined.
2. Make mem0-oss prefetch/sync threads non-daemon or explicitly lifecycle-managed by the memory manager.
3. Close the underlying Qdrant/gRPC client if the mem0/qdrant client exposes a close/shutdown method.
4. Avoid initializing the gRPC-backed client in speculative prefetch if process lifetime is short, or make prefetch cancellable/drainable.
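
Directions 1, 2 and 4 can be combined in a worker that is explicitly drained and joined. What follows is a rough sketch under stated assumptions, not the plugin's actual implementation: `DrainablePrefetcher` and its `handler` callback are hypothetical. The thread stays a daemon on purpose, since in CPython non-daemon threads are joined before `atexit` handlers run, so a non-daemon worker blocked on its queue would keep the `atexit` hook from ever being reached.

```python
import atexit
import queue
import threading


class DrainablePrefetcher:
    """Hypothetical prefetch runner whose worker can be drained and joined."""

    _STOP = object()  # sentinel placed after all pending jobs

    def __init__(self, handler):
        self._handler = handler
        self._jobs = queue.Queue()
        self._closed = False
        self._lock = threading.Lock()
        self._thread = threading.Thread(
            target=self._run, name="prefetch", daemon=True
        )
        self._thread.start()
        # Safety net: join the worker even if the caller forgets shutdown().
        atexit.register(self.shutdown)

    def queue_prefetch(self, query):
        """Enqueue a job; return False once shutdown has begun."""
        with self._lock:
            if self._closed:
                return False
            self._jobs.put(query)
            return True

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is self._STOP:
                return
            try:
                self._handler(job)
            except Exception:
                pass  # a failed prefetch must not kill the worker

    def shutdown(self, timeout=5.0):
        """Stop accepting jobs, drain the queue, and join the worker.

        Returns True if the worker exited within the timeout. Safe to call
        more than once.
        """
        with self._lock:
            if not self._closed:
                self._closed = True
                self._jobs.put(self._STOP)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

In a real provider, `handler` would wrap the mem0/Qdrant call, and `shutdown()` would additionally close the underlying client if it exposes a close method (direction 3).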

## Impact

- The user still receives the answer, so in interactive use the only visible symptom is the fatal message printed after it.
- Automation sees a failed process (`134`) after a successful response.
- Background memory sync/prefetch may be cut off during shutdown.

