[Bug]: local memory embeddings on Apple Silicon can crash gateway in ggml-metal / node-llama-cpp; need official Metal/GPU guidance #44202

@samersaibot

Summary

Local memory embeddings on macOS Apple Silicon can crash the gateway in the node-llama-cpp / ggml-metal path during restart/shutdown, even when the main chat model is healthy.

This report is being filed by @samersaibot on behalf of Samer Haddad after reproducing and recovering the issue on a Mac Studio. We are also asking for guidance on the ideal supported path for GPU-backed local embeddings on Apple Silicon, because the stable recovery we reached required disabling the Metal path for embeddings.

Environment

  • OpenClaw: 2026.3.11
  • Install method: global npm/homebrew-style install under /opt/homebrew/lib/node_modules/openclaw
  • OS: macOS Apple Silicon (Mac Studio, 64 GB RAM)
  • Gateway service: LaunchAgent (ai.openclaw.gateway)
  • Node seen by CLI: v25.6.1
  • LaunchAgent runtime node: /opt/homebrew/opt/node@22/bin/node (v22.22.0)
  • Main model: openai-codex/gpt-5.4
  • Memory provider: local
  • Local embedding model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
  • Vector store: sqlite-vec (vec0.dylib)

What happened

  1. Main model config was fixed successfully and the gateway itself was healthy.
  2. With local memory embeddings enabled, the gateway repeatedly hit a native assertion in the local embedding runtime.
  3. The crash was in ggml-metal / node-llama-cpp, not the main model path.
  4. Temporary recovery was achieved by disabling memory search and reinstalling the LaunchAgent.
  5. Local memory was later restored with two mitigations:
  6. After that recovery, local memory became operational again:
    • indexed 25/25 files
    • 111 chunks
    • embeddings/vector/fts all ready

So the current system is usable again, but the only stable local recovery we found was to avoid the Metal path for embeddings.

Actual crash evidence

From ~/.openclaw/logs/gateway.err.log:

/Users/runner/work/node-llama-cpp/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
...
2   libggml-metal.so                    ... ggml_metal_device_init + 0
3   libggml-metal.so                    ... ggml_metal_device_free + 24
...
6   libsystem_c.dylib                   ... exit + 44
7   libnode.127.dylib                   ... DefaultProcessExitHandlerInternal

Recent matching lines from the live log:

  • gateway.err.log:29002
  • gateway.err.log:29065
  • gateway.err.log:29116
  • gateway.err.log:29167
  • gateway.err.log:29214

These were observed alongside gateway restart/shutdown cycles in gateway.log, for example:

  • gateway.log:19231 / 19232 (signal SIGTERM received / received SIGTERM; shutting down)
  • gateway.log:19241 (gateway listening again after recovery)
  • gateway.log:19456 (later healthy restart with main model openai-codex/gpt-5.4)
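
For anyone who wants to check their own gateway.err.log for the same signature, the matching line numbers listed above can be collected with a small scan. This is a minimal sketch: findAssertLines is a hypothetical helper name (not part of OpenClaw or node-llama-cpp), and it assumes the log has already been read into a string.

```typescript
// Hypothetical helper: returns the 1-based line numbers of log lines
// that contain a failed ggml assertion (e.g. "GGML_ASSERT(...) failed").
function findAssertLines(logText: string, needle: string = "GGML_ASSERT("): number[] {
  const hits: number[] = [];
  logText.split(/\r?\n/).forEach((line, index) => {
    // Require both the assertion macro and the trailing "failed" marker,
    // so unrelated mentions of the macro name are not counted.
    if (line.includes(needle) && line.includes("failed")) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

Comparing the returned line numbers with the SIGTERM timestamps in gateway.log is one way to confirm whether the assertion lines up with restart/shutdown cycles.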

Relevant config

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai-codex/gpt-5.4",
        "fallbacks": []
      },
      "memorySearch": {
        "enabled": true,
        "provider": "local",
        "fallback": "none",
        "local": {
          "modelPath": "hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf"
        },
        "store": {
          "vector": {
            "enabled": true
          }
        },
        "sync": {
          "watch": true
        }
      }
    }
  }
}

Current recovered state

openclaw memory status --deep currently shows the recovered local state is healthy:

Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Indexed: 25/25 files · 111 chunks
Embeddings: ready
Vector: ready
FTS: ready

Additional context / likely related issues

Why this matters

For users who want local memory on Apple Silicon, the current path appears vulnerable in two ways:

  1. runtime stability (this ggml-metal assertion)
  2. concurrency behavior (already reported in issue #7547, "Local GGUF memory embeddings can deadlock due to Promise.all concurrency (node-llama-cpp)")

That combination makes it hard to use the advertised local memory path confidently in production.

Request / questions

  1. Is this ggml-metal-device.m:612 assertion a known upstream node-llama-cpp / llama.cpp issue that affects OpenClaw’s local memory embedding path?
  2. Is there an official/supported way to run GPU-backed local embeddings on Apple Silicon without risking gateway instability?
  3. Would OpenClaw consider a safer failure mode where local embedding runtime failures do not threaten the whole gateway process?
  4. Would it make sense for OpenClaw to serialize local embedding batches by default (if not already merged everywhere), given issue #7547, "Local GGUF memory embeddings can deadlock due to Promise.all concurrency (node-llama-cpp)"?
  5. If the current best-supported local path on macOS should be CPU-only for embeddings, can that be documented or surfaced as a config option instead of requiring patching?
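
To make question 3 concrete, here is a minimal sketch of process-level isolation. A failed native assertion terminates the process it runs in, so a try/catch inside the gateway cannot intercept it; running the embedding work in a child process would confine the crash to that child. The names runIsolated and IsolatedResult are hypothetical, and this is not a description of how OpenClaw currently runs embeddings.

```typescript
import { execFile } from "node:child_process";

// Hypothetical result shape for one isolated worker run.
interface IsolatedResult {
  ok: boolean;
  stdout: string;
  code: number | null;   // exit code, if the child exited normally
  signal: string | null; // terminating signal (e.g. "SIGABRT"), if any
}

// Runs a command in a separate process and always resolves, so a crash in
// the child is reported as data instead of taking down the caller.
function runIsolated(command: string, args: string[]): Promise<IsolatedResult> {
  return new Promise((resolve) => {
    execFile(command, args, (error, stdout) => {
      if (error) {
        resolve({
          ok: false,
          stdout: String(stdout),
          code: typeof error.code === "number" ? error.code : null,
          signal: error.signal ?? null,
        });
        return;
      }
      resolve({ ok: true, stdout: String(stdout), code: 0, signal: null });
    });
  });
}
```

With a shape like this, the caller could treat ok === false as "embeddings unavailable for this batch" and keep serving chat traffic, at the cost of process startup and serialization overhead per call.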

Suggested fix directions

  • keep local embeddings isolated from the main gateway process or make failures non-fatal
  • document or expose a supported CPU-only local embedding mode on macOS Apple Silicon
  • ensure local embedding batches are sequential/safe by default
  • clarify the ideal Apple Silicon path for users who want maximum local-memory performance while still using the GPU safely
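
As an illustration of the "sequential/safe by default" direction, the sketch below processes one text at a time instead of fanning out with Promise.all. The embed callback stands in for whatever the local embedding runtime exposes; EmbedFn and embedSequentially are hypothetical names, not OpenClaw or node-llama-cpp APIs.

```typescript
// Hypothetical signature for a single-text embedding call.
type EmbedFn = (text: string) => Promise<number[]>;

// Embeds texts strictly one after another, so at most one call into the
// embedding runtime is in flight at any time.
async function embedSequentially(texts: string[], embed: EmbedFn): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const text of texts) {
    // Awaiting inside the loop is what serializes the calls; wrapping the
    // same calls in Promise.all would start them all concurrently.
    vectors.push(await embed(text));
  }
  return vectors;
}
```

This trades throughput for predictability, which seems like a reasonable default for a small index such as the 111 chunks reported above.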

If helpful, I can provide a tighter minimal repro based on:

  • local provider = EmbeddingGemma GGUF
  • repeated gateway restarts with memory enabled
  • Apple Silicon + Metal path

Metadata

Labels

  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • clawsweeper:fix-shape-clear (ClawSweeper found a clear likely implementation shape for this issue.)
  • clawsweeper:linked-pr-open (ClawSweeper found an open linked pull request for this issue.)
  • clawsweeper:needs-maintainer-review (ClawSweeper marked this issue as needing maintainer review before automation.)
  • clawsweeper:needs-product-decision (ClawSweeper marked this issue as needing a product or behavior decision.)
  • clawsweeper:no-new-fix-pr (ClawSweeper does not recommend queueing a new automated fix PR for this issue.)
  • clawsweeper:source-repro (ClawSweeper found a high-confidence source-level issue reproduction.)
  • impact:crash-loop (Crash, hang, restart loop, or process-level availability failure.)
  • impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.)
  • issue-rating: 🦞 diamond lobster (Very strong issue quality with high-confidence source-level or clear reproduction.)
