Summary
Gateway became unresponsive (stopped sending messages to Discord) while the memory embedding subsystem was polling OpenAI's batch API.
Environment
- OpenClaw version: latest (npm)
- Node: v25.5.0
- OS: Linux 6.8.0-94-generic (x64)
- Channel: Discord
Reproduction
- Enable memory search with OpenAI batch embeddings (default)
- OpenAI batch API returns HTTP 503, or the connection fails with ECONNREFUSED
- System enters retry loop polling every 2s
- Log fills with lines like `openai batch batch_* in_progress; waiting 2000ms`
- Gateway stops processing outbound messages (Discord sends fail silently)
Logs
```
[2026-02-05T10:30:xx.xxxZ] openai batch batch_67a3b7d5dd788190ae31c9e1bb92bf87 in_progress; waiting 2000ms
[2026-02-05T10:30:xx.xxxZ] openai batch batch_67a3b7d5dd788190ae31c9e1bb92bf87 in_progress; waiting 2000ms
... (repeated for minutes)
```
Impact
- Messages queued for Discord never sent
- Gateway appeared healthy (no crash) but was functionally stalled
- Only recovered after OpenClaw fell back to non-batch mode
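The "appeared healthy but functionally stalled" state is something a liveness check could catch, since a plain "process is up" probe cannot. Below is a minimal TypeScript sketch, assuming the gateway calls hypothetical hooks (`batchPollStarted`, `messageQueued`, `queueDrained`, etc.) around batch polling and outbound sends. None of these names are OpenClaw APIs, and the thresholds are placeholders.

```typescript
// Sketch of a liveness check that reports "stalled" even while the process is up.
// All hook names are hypothetical; OpenClaw's real internals are not shown in this report.
interface HealthReport {
  healthy: boolean;
  reasons: string[];
}

class GatewayWatchdog {
  private pollStartedAt: number | null = null;
  private oldestPendingAt: number | null = null;
  private readonly maxPollMs: number;
  private readonly maxPendingMs: number;
  private readonly now: () => number;

  constructor(maxPollMs: number, maxPendingMs: number, now: () => number = Date.now) {
    this.maxPollMs = maxPollMs;
    this.maxPendingMs = maxPendingMs;
    this.now = now;
  }

  // Called around the batch polling loop.
  batchPollStarted(): void {
    this.pollStartedAt = this.now();
  }
  batchPollFinished(): void {
    this.pollStartedAt = null;
  }

  // Called when an outbound (e.g. Discord) message is queued, and when the queue empties.
  // Simplification: records when the queue went from empty to non-empty.
  messageQueued(): void {
    if (this.oldestPendingAt === null) this.oldestPendingAt = this.now();
  }
  queueDrained(): void {
    this.oldestPendingAt = null;
  }

  check(): HealthReport {
    const t = this.now();
    const reasons: string[] = [];
    if (this.pollStartedAt !== null && t - this.pollStartedAt > this.maxPollMs) {
      reasons.push(`batch polling running for ${t - this.pollStartedAt}ms`);
    }
    if (this.oldestPendingAt !== null && t - this.oldestPendingAt > this.maxPendingMs) {
      reasons.push(`outbound message pending for ${t - this.oldestPendingAt}ms`);
    }
    return { healthy: reasons.length === 0, reasons };
  }
}
```

Exposing `check()` on a health endpoint would have flagged this incident once messages sat unsent past the threshold, instead of reporting healthy.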
Suggested Fix
- Add timeout/circuit breaker to batch polling loop
- Avoid tight fixed-interval polling (a constant 2s interval may starve the event loop under load)
- Consider exponential backoff on 503s
- Add health check that detects "polling too long" state
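The timeout and backoff suggestions could be combined in the polling loop itself. Below is a minimal TypeScript sketch, assuming a hypothetical `getStatus` function that returns the batch status; it is not a real OpenClaw or OpenAI SDK function. Sleep and clock are injectable so the behavior can be tested without real waits.

```typescript
// Sketch: poll a batch job with a hard deadline and capped exponential backoff.
// `getStatus` is a hypothetical stand-in for the real batch status query.
type BatchStatus = "in_progress" | "completed" | "failed";

interface PollOptions {
  initialDelayMs: number; // first wait (the logs show a fixed 2000ms)
  maxDelayMs: number; // upper bound for a single wait
  deadlineMs: number; // total time budget before giving up
}

class PollTimeoutError extends Error {}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function pollBatch(
  getStatus: () => Promise<BatchStatus>,
  opts: PollOptions,
  sleep: (ms: number) => Promise<void> = realSleep,
  now: () => number = Date.now,
): Promise<BatchStatus> {
  const start = now();
  let delay = opts.initialDelayMs;
  for (;;) {
    let status: BatchStatus | undefined;
    try {
      status = await getStatus();
    } catch {
      // Every error (e.g. HTTP 503, ECONNREFUSED) is treated as transient here;
      // a real implementation should fail fast on non-retryable errors.
      status = undefined;
    }
    if (status === "completed" || status === "failed") return status;

    // Give up rather than sleeping past the deadline.
    const elapsed = now() - start;
    if (elapsed + delay > opts.deadlineMs) {
      throw new PollTimeoutError(`batch not finished after ${elapsed}ms`);
    }
    await sleep(delay);
    delay = Math.min(delay * 2, opts.maxDelayMs); // exponential backoff, capped
  }
}
```

With `initialDelayMs: 2000`, `maxDelayMs: 30000` and `deadlineMs: 60000`, the waits grow 2s, 4s, 8s, 16s, 30s and the loop then throws, so a stuck batch costs a bounded amount of time. The caller could then fall back to non-batch embeddings.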
Workaround
Disable batch embeddings in config:
```yaml
agents:
  defaults:
    memorySearch:
      remote:
        batch:
          enabled: false
```
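Rather than disabling batch embeddings globally, the fallback to non-batch mode described under Impact could be triggered automatically by a circuit breaker. Below is a minimal TypeScript sketch. The class, the threshold and cooldown values, and the `"direct"` mode name are hypothetical, not OpenClaw config or APIs.

```typescript
// Sketch: route embedding work away from the batch API after repeated failures,
// then retry batch mode after a cooldown. Names are hypothetical.
type EmbeddingMode = "batch" | "direct";

class BatchCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Which path the next embedding request should take.
  mode(): EmbeddingMode {
    if (this.openedAt === null) return "batch"; // closed: normal operation
    if (this.now() - this.openedAt >= this.cooldownMs) return "batch"; // half-open: allow a trial
    return "direct"; // open: skip the batch API entirely
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // Open (or re-open after a failed trial) once the threshold is reached.
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

This sketch does not limit concurrent trial requests in the half-open state; a production breaker would usually let only one request through.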
Related
User also observed a long-running CRON sub-agent around the same time; it is unclear whether the two are related.