[Bug]: local memory embeddings on Apple Silicon can crash gateway in ggml-metal / node-llama-cpp; need official Metal/GPU guidance

## Summary

Local memory embeddings on macOS Apple Silicon can crash the gateway in the `node-llama-cpp` / `ggml-metal` path during restart/shutdown, even when the main chat model is healthy.

This report is being filed by `@samersaibot` on behalf of Samer Haddad after reproducing and recovering the issue on a Mac Studio. We are also asking for guidance on the ideal supported path for **GPU-backed local embeddings on Apple Silicon**, because the stable recovery we reached required disabling the Metal path for embeddings.

## Environment

- OpenClaw: `2026.3.11`
- Install method: global npm/homebrew-style install under `/opt/homebrew/lib/node_modules/openclaw`
- OS: macOS Apple Silicon (Mac Studio, 64 GB RAM)
- Gateway service: LaunchAgent (`ai.openclaw.gateway`)
- Node seen by CLI: `v25.6.1`
- LaunchAgent runtime node: `/opt/homebrew/opt/node@22/bin/node` (`v22.22.0`)
- Main model: `openai-codex/gpt-5.4`
- Memory provider: `local`
- Local embedding model: `hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf`
- Vector store: sqlite-vec (`vec0.dylib`)

## What happened

1. Main model config was fixed successfully and the gateway itself was healthy.
2. With local memory embeddings enabled, the gateway repeatedly hit a native assertion in the local embedding runtime.
3. The crash was in `ggml-metal` / `node-llama-cpp`, not the main model path.
4. Temporary recovery was achieved by disabling memory search and reinstalling the LaunchAgent.
5. Local memory was later restored with two mitigations:
   - sequential local embed batch generation (to avoid known `Promise.all` deadlock risk, related to #7547)
   - forcing the local embedding runtime to CPU-only (disabling the Metal/GPU path for embeddings)
6. After that recovery, local memory became operational again:
   - indexed `25/25` files
   - `111` chunks
   - embeddings/vector/fts all `ready`
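The first mitigation in step 5 (sequential batches instead of `Promise.all`) can be sketched as follows. This is a minimal illustration, not the actual OpenClaw patch: `EmbedOne` and `embedSequentially` are hypothetical names, and `embedOne` stands in for whatever single-text embedding call the local runtime exposes.

```typescript
// Hypothetical signature for a single-text embedder (an assumption for
// illustration; not an OpenClaw or node-llama-cpp API).
type EmbedOne = (text: string) => Promise<number[]>;

// Embeds texts one at a time, so the native runtime never sees more than
// one in-flight request. Results keep the input order.
async function embedSequentially(
  texts: string[],
  embedOne: EmbedOne,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const text of texts) {
    // Awaiting inside the loop is the point: the next call does not start
    // until the previous one has settled.
    vectors.push(await embedOne(text));
  }
  return vectors;
}
```

The trade-off is throughput: a batch takes the sum of the individual embed times rather than the maximum, which was acceptable here for 111 chunks.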

So the current system is usable again, but **the only stable local recovery we found was to avoid the Metal path for embeddings**.

## Actual crash evidence

From `~/.openclaw/logs/gateway.err.log`:

```text
/Users/runner/work/node-llama-cpp/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
...
2   libggml-metal.so                    ... ggml_metal_device_init + 0
3   libggml-metal.so                    ... ggml_metal_device_free + 24
...
6   libsystem_c.dylib                   ... exit + 44
7   libnode.127.dylib                   ... DefaultProcessExitHandlerInternal
```

Recent matching lines from the live log:
- `gateway.err.log:29002`
- `gateway.err.log:29065`
- `gateway.err.log:29116`
- `gateway.err.log:29167`
- `gateway.err.log:29214`

These were observed alongside gateway restart/shutdown cycles in `gateway.log`, for example:
- `gateway.log:19231` / `19232` (`signal SIGTERM received` / `received SIGTERM; shutting down`)
- `gateway.log:19241` (gateway listening again after recovery)
- `gateway.log:19456` (later healthy restart with main model `openai-codex/gpt-5.4`)
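The trace shows the assertion firing underneath `exit`, during a SIGTERM-driven shutdown. One reading, which is an assumption on our side and not something we have confirmed, is that Metal-backed resources are still alive when the process tears down. If so, an explicit ordered cleanup before exit might help. The sketch below shows only the ordering idea; `ShutdownSequence` and `Disposer` are hypothetical names, not OpenClaw or `node-llama-cpp` APIs.

```typescript
// Hypothetical cleanup hook (an assumption for illustration).
type Disposer = () => Promise<void> | void;

class ShutdownSequence {
  private disposers: { name: string; dispose: Disposer }[] = [];

  // Register resources in the order they are created.
  register(name: string, dispose: Disposer): void {
    this.disposers.push({ name, dispose });
  }

  // Runs disposers one at a time in reverse registration order, so the
  // most recently created resource is released first. A failing disposer
  // is recorded and does not stop the rest from running.
  async run(): Promise<string[]> {
    const failed: string[] = [];
    for (const { name, dispose } of [...this.disposers].reverse()) {
      try {
        await dispose();
      } catch {
        failed.push(name);
      }
    }
    this.disposers = [];
    return failed;
  }
}
```

In a gateway, `run()` would be awaited in the SIGTERM handler before the process exits, with the embedding context and model registered after the core services. Whether this actually avoids the `rsets` assertion is untested.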

## Relevant config

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai-codex/gpt-5.4",
        "fallbacks": []
      },
      "memorySearch": {
        "enabled": true,
        "provider": "local",
        "fallback": "none",
        "local": {
          "modelPath": "hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf"
        },
        "store": {
          "vector": {
            "enabled": true
          }
        },
        "sync": {
          "watch": true
        }
      }
    }
  }
}
```

## Current recovered state

`openclaw memory status --deep` now reports a healthy local state after the recovery:

```text
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Indexed: 25/25 files · 111 chunks
Embeddings: ready
Vector: ready
FTS: ready
```

## Additional context / likely related issues

- #7547 — local GGUF embeddings can deadlock due to concurrent `Promise.all`; we also hit that code path and had to switch batch embedding to sequential locally.
- #29548 — `node-llama-cpp` install/runtime fragility on Apple Silicon
- #32025 / #41819 — `node-llama-cpp` install/build/update problems
- #29112 — local retrieval regressions on local provider

## Why this matters

For users who want local memory on Apple Silicon, the current path appears vulnerable in two ways:
1. **runtime stability** (this `ggml-metal` assertion)
2. **concurrency behavior** (already reported in #7547)

That combination makes it hard to use the advertised local memory path confidently in production.

## Request / questions

1. Is this `ggml-metal-device.m:612` assertion a known upstream `node-llama-cpp` / llama.cpp issue that affects OpenClaw’s local memory embedding path?
2. Is there an official/supported way to run **GPU-backed local embeddings on Apple Silicon** without risking gateway instability?
3. Would OpenClaw consider a safer failure mode where local embedding runtime failures do **not** threaten the whole gateway process?
4. Would it make sense for OpenClaw to serialize local embedding batches by default (if that change has not already landed in released builds), given #7547?
5. If the current best-supported local path on macOS should be CPU-only for embeddings, can that be documented or surfaced as a config option instead of requiring patching?

## Suggested fix directions

- keep local embeddings isolated from the main gateway process or make failures non-fatal
- document or expose a supported CPU-only local embedding mode on macOS Apple Silicon
- ensure local embedding batches are sequential/safe by default
- clarify the ideal Apple Silicon path for users who want maximum local-memory performance while still using the GPU safely
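The first direction above (isolating local embeddings from the gateway process) could look roughly like this. A native abort such as a failed `GGML_ASSERT` cannot be caught with `try`/`catch`, so the sketch runs the work in a child process and turns a crash into an ordinary failure value. `runIsolated` and `WorkerResult` are hypothetical names, and the inline script is a stand-in for a real embedding worker; only the Node.js `child_process` calls are real APIs.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result type (an assumption for illustration).
type WorkerResult =
  | { ok: true; output: string }
  | { ok: false; reason: string };

// Runs `script` in a separate Node process, feeding `input` on stdin and
// collecting stdout. If the child aborts or exits non-zero, the caller
// gets { ok: false } instead of crashing alongside it.
function runIsolated(script: string, input: string): Promise<WorkerResult> {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, ["-e", script]);
    let output = "";
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      output += chunk;
    });
    child.stderr.resume(); // drain stderr so the child never blocks on it
    child.stdin.on("error", () => {}); // child may die before reading stdin
    child.on("error", (err) => resolve({ ok: false, reason: err.message }));
    child.on("close", (code, signal) => {
      if (code === 0) {
        resolve({ ok: true, output });
      } else {
        const reason = signal ? `killed by ${signal}` : `exit code ${code}`;
        resolve({ ok: false, reason });
      }
    });
    child.stdin.end(input);
  });
}
```

On `{ ok: false }`, a gateway could log the reason, mark embeddings as unavailable, and keep serving with FTS only. A long-lived worker would amortize model load time better than one process per request; this version only shows the failure boundary.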

If helpful, I can provide a tighter minimal repro based on:
- local provider = EmbeddingGemma GGUF
- repeated gateway restarts with memory enabled
- Apple Silicon + Metal path

