
[Bug]: Gateway hard-crashes with 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN) on Windows during Mattermost streaming reply; auto-respawn frequently wedges #71699

@mathiticus3

Description


Bug

The gateway crashes hard on Windows with exit code 3221226505 (0xC0000409, STATUS_STACK_BUFFER_OVERRUN) during normal operation: incoming Mattermost channel events arrive while the embedded acpx runtime is mid-inference. The user-visible symptom is a Mattermost post the bot stopped editing mid-stream: the bot creates a post, edits it once or twice with partial content, then dies before sending the rest, leaving Mattermost holding the half-finished message.

This is distinct from #64253 (gateway alive but unresponsive) — here the process exits, with a memory-corruption status code. After the crash, the Windows Scheduled Task auto-respawns the gateway, but the respawned instance frequently fails to complete starting channels and sidecars… (CPU-pegged, no Mattermost connect log line, never replies to inbound). A hard kill + clean re-trigger is needed to recover, and stuck-session trajectories pile up under ~/.openclaw/agents/main/sessions/.

Symptoms

  1. Hard crash, exit 0xC0000409. Get-ScheduledTaskInfo -TaskName 'OpenClaw Gateway' reports LastTaskResult: 3221226505 after each crash.
  2. Half-finished Mattermost post. The final state has update_at ≈ create_at + a few seconds, never updated again, with content cut off mid-sentence. Example: a bot reply finalized as "The current year according to the provided" (42 chars, no closing punctuation, update_at - create_at = 2562 ms).
  3. No "post-mortem" log line. The runtime log just stops at the last gateway/ws RPC response or agent/embedded bootstrap warning. No stack trace, no error event in the runtime log.
  4. Post-restart wedging. The auto-respawned gateway often binds the port, logs ready (6 plugins…), then sits at "starting channels and sidecars…" with no Mattermost connect line. CPU stays >80% on a single core; node holds 4 ESTABLISHED connections to Ollama (127.0.0.1:11434) but zero to Mattermost. After ~3 min it sometimes does connect, but only after multiple slow RPCs (chat.history, models.list) report 30+ second durations in the WS log.

Pattern: 5–15 minutes between restart and next crash under steady-state Mattermost activity.
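
For triage scripting, note that the decimal LastTaskResult and the hex NTSTATUS are the same unsigned 32-bit value. A minimal helper (illustrative, not part of OpenClaw) makes the check explicit:

```python
# STATUS_STACK_BUFFER_OVERRUN as defined in the Windows NTSTATUS values;
# Get-ScheduledTaskInfo reports it as the unsigned decimal 3221226505.
STATUS_STACK_BUFFER_OVERRUN = 0xC0000409

def decode_last_task_result(last_task_result: int) -> str:
    """Render LastTaskResult in the hex form Windows documentation uses."""
    return f"0x{last_task_result & 0xFFFFFFFF:08X}"

def is_stack_buffer_overrun(last_task_result: int) -> bool:
    """True when the task exit code is STATUS_STACK_BUFFER_OVERRUN."""
    return (last_task_result & 0xFFFFFFFF) == STATUS_STACK_BUFFER_OVERRUN
```

This is handy in a monitoring script that polls the scheduled task and tallies how often the 0xC0000409 crash recurs.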

Environment

  • OpenClaw: 2026.4.23 (a979721) (npm install)
  • Node.js: v24.15.0
  • OS: Windows 11
  • Gateway service: Windows Scheduled Task OpenClaw Gateway running node …\openclaw\dist\index.js gateway --port 18789, bind lan, auth token
  • Channels enabled: Mattermost only (Mattermost Team Edition v11.6.1 over Tailscale, plain HTTP)
  • Agent model: ollama-local/llama3.1:8b (local Ollama on the same host)
  • MEMORY.md: 18,848 chars (truncated to 12,000 at every session bootstrap; the warning fires for every channel + DM session)
  • Plugins loaded: acpx, browser, device-pair, mattermost, phone-control, talk-voice (6)
  • Cron: 1 enabled job (pcs-redfin-sync-daily, fires at 5 AM ET; not active during the crashes I observed)

Mattermost config (channels.mattermost):

{
  "name": "lab-1",
  "enabled": true,
  "botToken": "<redacted>",
  "baseUrl": "http://<mm-host>.<tailnet>.ts.net:8065",
  "network": { "dangerouslyAllowPrivateNetwork": true },
  "dmPolicy": "open",
  "groupPolicy": "open"
}

The bot is a member of 5 channels.

Reproduction

  1. Configure Mattermost channel as above. Set dmPolicy: open + groupPolicy: open so both DMs and channel messages flow.
  2. Set agent model to ollama-local/llama3.1:8b (or any local Ollama backend that produces multi-second streamed responses).
  3. From a Mattermost user, send @openclaw <prompt that produces multi-line output> to a channel the bot is in. Repeat across 5–10 messages over 5–15 min.
  4. Observe at least one bot reply in Mattermost where the post was created, edited a couple of times, then frozen mid-sentence with no further update_at changes.
  5. Check Get-ScheduledTaskInfo -TaskName 'OpenClaw Gateway'; LastTaskResult will be 3221226505.
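
Step 4 can be automated against the Mattermost v4 REST API. The sketch below assumes the standard GET /api/v4/channels/{channel_id}/posts endpoint; BASE_URL, TOKEN, and the 10 s / 60 s windows are placeholders, and the heuristic simply encodes the symptom described above:

```python
import json
import urllib.request

BASE_URL = "http://mm-host:8065"   # placeholder for the Tailscale base URL
TOKEN = "<redacted>"               # bot token from channels.mattermost

def looks_stalled(post: dict, now_ms: int, min_idle_ms: int = 60_000) -> bool:
    """Heuristic for a half-finished bot reply: the post was edited
    within a few seconds of creation (streaming started), then never
    touched again for at least `min_idle_ms`."""
    edited_window = post["update_at"] - post["create_at"]
    idle = now_ms - post["update_at"]
    return 0 < edited_window < 10_000 and idle > min_idle_ms

def fetch_channel_posts(channel_id: str) -> list[dict]:
    """Pull recent posts for one channel via the Mattermost v4 API."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v4/channels/{channel_id}/posts",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return list(data["posts"].values())
```

Running looks_stalled over the bot's recent posts after each test burst flags the frozen replies without manually eyeballing update_at timestamps.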

Log slice (last entries before death)

Trimmed from ~/.openclaw/Local/Temp/openclaw/openclaw-2026-04-25.log. Note the sequence: ANSI-escape-laden gateway/ws RPC responses with 48-second durations on routine usage.cost / sessions.usage calls, followed by an agent/embedded bootstrap for a Mattermost session, then nothing.

2026-04-25T12:57:09.894-04:00 [INFO] gateway/ws res "channels.status" 1839ms
2026-04-25T12:57:18.380-04:00 [INFO] plugins   mattermost: registered slash command callback at /api/channels/mattermost/command
2026-04-25T12:57:27.941-04:00 [WARN] plugins   1 plugin(s) failed to initialize (validation: device-pair). Run 'openclaw plugins list' for details.
2026-04-25T12:57:55.546-04:00 [WARN] agent/embedded   workspace bootstrap file MEMORY.md is 18848 chars (limit 12000); truncating in injected context
                                       (sessionKey=agent:main:mattermost:channel:<channel-id>)
2026-04-25T12:57:57.264-04:00 [INFO] gateway/ws res "usage.cost" 48239ms
2026-04-25T12:57:57.346-04:00 [INFO] gateway/ws res "sessions.usage" 48328ms
2026-04-25T12:58:11.848-04:00 [INFO] gateway/ws res "node.list" 52ms
2026-04-25T12:58:11.905-04:00 [WARN] agent/embedded   workspace bootstrap file MEMORY.md is 18848 chars (limit 12000); truncating in injected context
                                       (sessionKey=agent:main:mattermost:direct:<user-id>)
<<< process exits 0xC0000409, no further log lines >>>

Stuck session trajectories left behind:

8889f05c-….trajectory.jsonl    2,495,462 bytes   last write 12:58:58
704ce0ef-….trajectory.jsonl      341,880 bytes   last write 12:59:57

Suggested investigation

  • Memory corruption / stack overrun likely originates in a native module or a large-buffer copy in the agent/embedded ↔ Ollama path. The repeated MEMORY.md truncation warning (running on every session bootstrap because the file exceeds the 12 KB injected limit) is a candidate hot path. Worth checking the truncation code for off-by-one / unsafe writes when input size > limit by ~50%.
  • The 48-second usage.cost / sessions.usage RPCs immediately before death suggest the event loop was stalled (likely on disk I/O or an Ollama HTTP call) while WS frames piled up. A blocked event loop combined with a corrupt Buffer write would line up with 0xC0000409.
  • Investigate device-pair plugin validation error (1 plugin(s) failed to initialize (validation: device-pair)) — appears in every restart even though device-pair is in the loaded list. Probably benign but adds noise.
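
To make the first bullet concrete, here is a reference sketch of the invariants a safe bootstrap truncation should hold: never exceed the limit, and never end on a split code unit. This is not OpenClaw's actual code; it only pins down the behavior the truncation path should be audited against:

```python
def truncate_bootstrap(text: str, limit: int = 12_000) -> str:
    """Reference behavior for injecting MEMORY.md into session context:
    return at most `limit` characters and never end on an unpaired high
    surrogate (relevant if the runtime counts UTF-16 code units, as JS
    string lengths do). Illustrative sketch, not the real implementation."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Drop a trailing lone high surrogate rather than emit half a pair.
    if cut and 0xD800 <= ord(cut[-1]) <= 0xDBFF:
        cut = cut[:-1]
    return cut
```

The audit question for the real code is whether the 18,848 → 12,000 path ever writes past the destination buffer or slices mid-code-unit; a property test against invariants like these would surface an off-by-one quickly.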

Workaround

External watchdog scheduled task that probes /health every 60 s, kills lingering node …openclaw\dist\index.js gateway processes, and re-triggers the gateway task after 2 consecutive failures with a 5-minute restart cooldown. Recovers from both this crash and the post-crash wedging in #64253.
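
A minimal sketch of such a watchdog, assuming the gateway exposes /health on the configured port. The names here are illustrative, and taskkill /IM node.exe is coarser than the real watchdog (which matches only the gateway's command line):

```python
import subprocess
import urllib.request

HEALTH_URL = "http://127.0.0.1:18789/health"  # assumes a /health endpoint
TASK_NAME = "OpenClaw Gateway"

def probe_health(timeout: float = 5.0) -> bool:
    """One health probe; any network/HTTP error counts as a failure."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def should_restart(consecutive_failures: int, last_restart: float,
                   now: float, threshold: int = 2,
                   cooldown: float = 300.0) -> bool:
    """Restart only after `threshold` consecutive failed probes and
    outside the 5-minute cooldown, matching the workaround above."""
    return consecutive_failures >= threshold and now - last_restart >= cooldown

def restart_gateway() -> None:
    """Kill lingering gateway processes, then re-trigger the task.
    Coarse: /IM node.exe kills every node process on the host."""
    subprocess.run(["taskkill", "/F", "/IM", "node.exe"], check=False)
    subprocess.run(["schtasks", "/Run", "/TN", TASK_NAME], check=False)
```

Wiring probe_health into a 60 s loop that feeds should_restart reproduces the described behavior: two consecutive failures trigger a kill + re-run, and the cooldown prevents a restart storm while the gateway is still wedging.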
