# memorySearch: embedding reindex fails with 'TypeError: fetch failed' after indexing ~40K chunks (#56815)

## Description

Memory search embedding reindex consistently fails with `TypeError: fetch failed` after successfully indexing roughly 41K of an estimated 45K chunks. The `.tmp` file is deleted on failure (`runSafeReindex` rollback), so all progress is lost and the next attempt starts from scratch, which creates an endless failure loop.

## Environment

- **OpenClaw version:** v2026.3.24 (cff6dc9)
- **Node.js:** v25.8.2
- **OS:** Linux 6.8.0-88-generic (x64), 123GB RAM
- **Embedding provider:** SiliconFlow API (`Pro/BAAI/bge-m3`, 1024d, OpenAI-compatible endpoint at `https://api.siliconflow.cn/v1/`)
- **Files:** 272 `.md` files (~187MB) under workspace `memory/` directory
- **Config:** `memorySearch.remote.batch.concurrency: 2`, default retry settings (3 attempts, 500ms/8000ms backoff)
- **Control:** the main agent, using the **same** SiliconFlow config, successfully indexed 4 chunks, so the issue appears specific to large-scale reindexing
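
To make the default retry settings concrete, here is a minimal sketch of the wait schedule they would produce. It assumes (this is not confirmed from the OpenClaw source) that 500ms is the base delay, 8000ms is the cap, and the delay doubles on each retry. `backoffDelays` is a hypothetical helper, not an OpenClaw function.

```typescript
// Hypothetical helper: computes the waits between attempts under an
// ASSUMED capped exponential backoff (base doubles each retry, no jitter).
function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  // n attempts means at most n - 1 waits between them.
  for (let retry = 0; retry < attempts - 1; retry++) {
    delays.push(Math.min(maxMs, baseMs * 2 ** retry));
  }
  return delays;
}

const delays = backoffDelays(3, 500, 8000);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays, totalMs);
```

Under those assumptions, 3 attempts give two waits totalling about 1.5 seconds (ignoring any jitter), so a connection problem that lasts longer than that would exhaust the retries.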

## Reproduction Steps

1. Configure an agent with `memorySearch.enabled: true`
2. Place ~270 large `.md` files (100KB-1MB each) in the workspace `memory/` directory
3. Use a remote embedding provider (SiliconFlow, OpenAI-compatible)
4. Trigger `memory_search`, which initiates `runSafeReindex`
5. Observe that the `.tmp` file grows to ~2GB with ~41K chunks indexed
6. After ~1 hour, the run fails with `memory sync failed: TypeError: fetch failed`
7. The `.tmp` file is deleted and the SQLite database remains empty, so the next trigger restarts from scratch

## Error Log

```
{"subsystem":"memory","level":"warn","msg":"memory embeddings rate limited; retrying in 530ms"}  // once during indexing
{"subsystem":"memory","level":"warn","msg":"memory sync failed (session-start): TypeError: fetch failed"}
{"subsystem":"memory","level":"warn","msg":"memory sync failed (search): TypeError: fetch failed"}
```

No stack trace is included — `TypeError: fetch failed` is logged without the underlying cause (DNS, timeout, connection reset, etc.).
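
For reference, Node's built-in `fetch` attaches the underlying network error to the `TypeError` as its `cause` property, so the missing detail is usually recoverable at the logging site. The following is a minimal sketch of how the chain could be rendered. `describeErrorChain` is a hypothetical helper name, not an existing OpenClaw function.

```typescript
// Hypothetical helper: walks the `cause` chain of an error and renders
// each level as "Name: message [CODE]", joined into a single log line.
function describeErrorChain(err: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (!(current instanceof Error)) {
      // Non-Error causes (strings, plain objects) end the chain.
      parts.push(String(current));
      break;
    }
    const detailed = current as Error & { code?: unknown; cause?: unknown };
    // System errors carry a string `code` such as ECONNRESET or ENOTFOUND.
    const suffix = typeof detailed.code === "string" ? ` [${detailed.code}]` : "";
    parts.push(`${detailed.name}: ${detailed.message}${suffix}`);
    current = detailed.cause;
  }
  return parts.join(" <- caused by: ");
}
```

Logging `describeErrorChain(err)` in place of the bare error message would show whether the failure is a connection reset, a DNS failure, or a timeout.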

## Observations

1. **The embedding API itself is stable** — manual test with 10 concurrent requests to SiliconFlow: 0 failures, ~300-400ms each
2. **Not a 429/rate-limit issue** — only one rate-limit warning in the entire run
3. **Not an OOM issue** — 123GB RAM, no swap pressure
4. **Not concurrency-dependent** — fails with both concurrency=2 and concurrency=4
5. **Not specific to this provider** — same failure pattern occurred with Alibaba DashScope (text-embedding-v4) before switching to SiliconFlow
6. **Progress loss is the critical issue** — `runSafeReindex` deletes the `.tmp` on any failure, meaning ~1 hour of API calls is wasted every time
7. **No stack trace** makes it impossible to determine if the root cause is: undici connection pool reuse of dead connections, TLS session timeout, DNS resolution failure, or something else
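
The progress loss in observation 6 could be avoided if completed batches were committed as they finish and skipped on the next run. Below is a rough sketch of that idea. All names (`Batch`, `CheckpointStore`, `MemoryStore`, `reindexResumable`) are hypothetical and do not reflect how `runSafeReindex` is actually structured, and the in-memory store stands in for whatever durable storage a real implementation would use.

```typescript
// Hypothetical types for illustration; none of these names come from OpenClaw.
type Batch = { id: string; texts: string[] };
type EmbedFn = (texts: string[]) => Promise<number[][]>;

interface CheckpointStore {
  completed(): Set<string>;                       // ids of batches already saved
  commit(id: string, vectors: number[][]): void;  // save vectors and mark the batch done
}

// In-memory stand-in; a real store would write to durable storage.
class MemoryStore implements CheckpointStore {
  private done = new Map<string, number[][]>();
  completed(): Set<string> {
    return new Set(this.done.keys());
  }
  commit(id: string, vectors: number[][]): void {
    this.done.set(id, vectors);
  }
}

// Embeds only the batches that are not yet committed. If `embed` throws,
// the error propagates, but everything committed so far stays in the store.
async function reindexResumable(
  batches: Batch[],
  embed: EmbedFn,
  store: CheckpointStore,
): Promise<number> {
  const already = store.completed();
  let embedded = 0;
  for (const batch of batches) {
    if (already.has(batch.id)) continue; // finished in an earlier run
    const vectors = await embed(batch.texts);
    store.commit(batch.id, vectors);
    embedded++;
  }
  return embedded;
}
```

With this shape, a failure at batch N costs only batch N: the next call to `reindexResumable` with the same store resumes from the first uncommitted batch.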

## Suggested Improvements

1. **Include full stack trace** in the `TypeError: fetch failed` log so the root cause can be identified
2. **Partial progress preservation** — instead of deleting `.tmp` on failure, consider checkpointing or resuming from the last successful batch
3. **Connection health checks** — validate embedding API connectivity before starting a long reindex, or periodically during the process
4. **Graceful degradation** — if one batch fails, skip it and continue instead of aborting the entire reindex
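
As an illustration of suggestion 4, here is a minimal sketch of a batch loop that retries each batch a few times and then records the failure and moves on, instead of aborting the whole run. The names (`EmbedTexts`, `embedWithRetry`, `reindexTolerant`) and the doubling backoff are assumptions made for this example, not OpenClaw's actual retry logic.

```typescript
// Hypothetical types and helpers; not taken from the OpenClaw codebase.
type EmbedTexts = (texts: string[]) => Promise<number[][]>;

interface ReindexReport {
  vectors: Map<number, number[][]>;            // batch index -> embeddings
  failed: { index: number; reason: string }[]; // batches skipped after all retries
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `embed` up to `attempts` times, doubling the wait after each failure.
async function embedWithRetry(
  embed: EmbedTexts,
  texts: string[],
  attempts: number,
  baseDelayMs: number,
): Promise<number[][]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await embed(texts);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Processes every batch; a batch that still fails after its retries is
// recorded in `failed` and the loop continues with the next one.
async function reindexTolerant(
  batches: string[][],
  embed: EmbedTexts,
  attempts = 3,
  baseDelayMs = 500,
): Promise<ReindexReport> {
  const report: ReindexReport = { vectors: new Map(), failed: [] };
  for (let i = 0; i < batches.length; i++) {
    try {
      report.vectors.set(i, await embedWithRetry(embed, batches[i], attempts, baseDelayMs));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      report.failed.push({ index: i, reason });
    }
  }
  return report;
}
```

The `failed` list gives a later pass something concrete to retry, so a handful of bad batches does not invalidate tens of thousands of good ones.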
